SemViQA: A Semantic Question Answering System for Vietnamese Information Fact-Checking

Abstract

The rise of misinformation, exacerbated by Large Language Models (LLMs) like GPT and Gemini, demands robust fact-checking solutions, especially for low-resource languages like Vietnamese. Existing methods struggle with semantic ambiguity, homonyms, and complex linguistic structures, often trading accuracy for efficiency. We introduce SemViQA, a novel Vietnamese fact-checking framework integrating Semantic-based Evidence Retrieval (SER) and Two-step Verdict Classification (TVC). Our approach balances precision and speed, achieving state-of-the-art results with 78.97% strict accuracy on ISE-DSC01 and 80.82% on ViWikiFC, securing 1st place in the UIT Data Science Challenge. Additionally, SemViQA Faster improves inference speed 7x while maintaining competitive accuracy. SemViQA sets a new benchmark for Vietnamese fact verification, advancing the fight against misinformation. The source code is available at: https://github.com/DAVID-NGUYEN-S16/SemViQA.
