SemViQA: A Semantic Question Answering System for Vietnamese Information Fact-Checking

March 2, 2025
Authors: Nam V. Nguyen, Dien X. Tran, Thanh T. Tran, Anh T. Hoang, Tai V. Duong, Di T. Le, Phuc-Lu Le
cs.AI

Abstract

The rise of misinformation, exacerbated by Large Language Models (LLMs) like GPT and Gemini, demands robust fact-checking solutions, especially for low-resource languages like Vietnamese. Existing methods struggle with semantic ambiguity, homonyms, and complex linguistic structures, often trading accuracy for efficiency. We introduce SemViQA, a novel Vietnamese fact-checking framework that integrates Semantic-based Evidence Retrieval (SER) and Two-step Verdict Classification (TVC). Our approach balances precision and speed, achieving state-of-the-art results with 78.97% strict accuracy on ISE-DSC01 and 80.82% on ViWikiFC, and securing 1st place in the UIT Data Science Challenge. Additionally, SemViQA Faster improves inference speed by 7x while maintaining competitive accuracy. SemViQA sets a new benchmark for Vietnamese fact verification, advancing the fight against misinformation. The source code is available at: https://github.com/DAVID-NGUYEN-S16/SemViQA.
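The abstract names the Two-step Verdict Classification stage and the "strict accuracy" metric without defining them. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes three verdict labels (SUPPORTED, REFUTED, NEI), reads "two-step" as an NEI-vs-rest decision followed by a SUPPORTED-vs-REFUTED decision, and assumes strict accuracy credits a sample only when both the verdict and the retrieved evidence are correct. The names `two_step_verdict`, `strict_accuracy`, `toy_is_nei`, and `toy_is_supported` are hypothetical, and the toy classifiers merely stand in for trained models.

```python
from typing import Callable, Sequence

# Verdict labels assumed for illustration (SUPPORTED / REFUTED / NEI).
SUPPORTED, REFUTED, NEI = "SUPPORTED", "REFUTED", "NEI"

Classifier = Callable[[str, str], bool]


def two_step_verdict(claim: str, evidence: str,
                     is_nei: Classifier, is_supported: Classifier) -> str:
    """Hypothetical two-step cascade: step 1 decides whether the evidence is
    enough to judge the claim (NEI or not); step 2 runs only for non-NEI cases
    and picks SUPPORTED vs. REFUTED."""
    if is_nei(claim, evidence):
        return NEI
    return SUPPORTED if is_supported(claim, evidence) else REFUTED


def strict_accuracy(pred_verdicts: Sequence[str], pred_evidence: Sequence[str],
                    gold_verdicts: Sequence[str], gold_evidence: Sequence[str]) -> float:
    """Assumed metric: a sample is correct only if the verdict matches and, for
    SUPPORTED/REFUTED, the predicted evidence matches the gold evidence exactly."""
    if not gold_verdicts:
        return 0.0
    correct = 0
    for pv, pe, gv, ge in zip(pred_verdicts, pred_evidence, gold_verdicts, gold_evidence):
        if pv == gv and (gv == NEI or pe.strip() == ge.strip()):
            correct += 1
    return correct / len(gold_verdicts)


# Toy stand-ins for trained classifiers (purely illustrative, not real models).
def toy_is_nei(claim: str, evidence: str) -> bool:
    return evidence == ""


def toy_is_supported(claim: str, evidence: str) -> bool:
    return "not" not in evidence


claims = ["A is B", "A is C", "A is D"]
retrieved = ["A is B.", "A is not C.", ""]  # evidence returned by some retriever
gold_v = [SUPPORTED, REFUTED, NEI]
gold_e = ["A is B.", "C is not A.", ""]

pred_v = [two_step_verdict(c, e, toy_is_nei, toy_is_supported)
          for c, e in zip(claims, retrieved)]
print(pred_v)  # ['SUPPORTED', 'REFUTED', 'NEI']
print(round(strict_accuracy(pred_v, retrieved, gold_v, gold_e), 4))  # 0.6667
```

Under this assumed definition, the second claim is counted wrong despite its correct verdict because its evidence does not match the gold sentence, so strict accuracy can never exceed plain verdict accuracy.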