SemViQA: A Semantic Question Answering System for Vietnamese Information Fact-Checking
March 2, 2025
Authors: Nam V. Nguyen, Dien X. Tran, Thanh T. Tran, Anh T. Hoang, Tai V. Duong, Di T. Le, Phuc-Lu Le
cs.AI
Abstract
The rise of misinformation, exacerbated by Large Language Models (LLMs) such as
GPT and Gemini, demands robust fact-checking solutions, especially for
low-resource languages such as Vietnamese. Existing methods struggle with semantic
ambiguity, homonyms, and complex linguistic structures, and often trade accuracy
for efficiency. We introduce SemViQA, a novel Vietnamese fact-checking
framework that integrates Semantic-based Evidence Retrieval (SER) and Two-step
Verdict Classification (TVC). Our approach balances precision and speed,
achieving state-of-the-art results with 78.97% strict accuracy on ISE-DSC01
and 80.82% on ViWikiFC, and securing 1st place in the UIT Data Science Challenge.
Additionally, SemViQA Faster speeds up inference by 7x while maintaining
competitive accuracy. SemViQA sets a new benchmark for Vietnamese fact
verification, advancing the fight against misinformation. The source code is
available at: https://github.com/DAVID-NGUYEN-S16/SemViQA.
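The headline numbers are reported as strict accuracy, which is harsher than plain verdict accuracy. As we understand the ISE-DSC01 evaluation (an assumption; the abstract itself does not define the metric), a claim counts as correct only when the predicted verdict matches the gold verdict and, for SUPPORTED or REFUTED claims, the retrieved evidence also matches the gold evidence. The following is a minimal Python sketch under that assumption. The label strings, the `Prediction` class, and the whitespace normalization of evidence are all hypothetical choices for illustration, not the official scorer or the SemViQA code.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Hypothetical label name for "not enough information" claims.
NEI = "NEI"


@dataclass(frozen=True)
class Prediction:
    verdict: str             # e.g. "SUPPORTED", "REFUTED", or NEI
    evidence: Optional[str]  # evidence sentence; None when verdict is NEI


def _normalize(text: Optional[str]) -> str:
    # Assumption: evidence strings are compared after collapsing whitespace.
    return " ".join(text.split()) if text else ""


def is_strictly_correct(pred: Prediction, gold: Prediction) -> bool:
    """Correct only if the verdict matches and, for non-NEI claims, the evidence matches."""
    if pred.verdict != gold.verdict:
        return False
    if gold.verdict == NEI:
        return True  # NEI claims carry no evidence to check
    return _normalize(pred.evidence) == _normalize(gold.evidence)


def _check_aligned(preds: Sequence[Prediction], golds: Sequence[Prediction]) -> None:
    if len(preds) != len(golds):
        raise ValueError("predictions and gold labels must have the same length")


def strict_accuracy(preds: Sequence[Prediction], golds: Sequence[Prediction]) -> float:
    _check_aligned(preds, golds)
    if not golds:
        return 0.0
    hits = sum(is_strictly_correct(p, g) for p, g in zip(preds, golds))
    return hits / len(golds)


def verdict_accuracy(preds: Sequence[Prediction], golds: Sequence[Prediction]) -> float:
    """Verdict-only accuracy that ignores evidence, for comparison."""
    _check_aligned(preds, golds)
    if not golds:
        return 0.0
    hits = sum(p.verdict == g.verdict for p, g in zip(preds, golds))
    return hits / len(golds)


# Toy example: every verdict is right, but one evidence sentence is wrong.
golds = [
    Prediction("SUPPORTED", "Evidence sentence A."),
    Prediction("REFUTED", "Evidence sentence B."),
    Prediction(NEI, None),
]
preds = [
    Prediction("SUPPORTED", "Evidence  sentence A."),  # extra space is normalized away
    Prediction("REFUTED", "Evidence sentence C."),     # right verdict, wrong evidence
    Prediction(NEI, None),
]

print(f"{verdict_accuracy(preds, golds):.3f}")  # 1.000
print(f"{strict_accuracy(preds, golds):.3f}")   # 0.667
```

In the toy example every verdict is correct, yet strict accuracy drops to two out of three because one claim is paired with the wrong evidence. This is why evidence retrieval quality (the role of SER) matters as much as verdict classification under this metric.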