SwiLTra-Bench: The Swiss Legal Translation Benchmark

March 3, 2025
Authors: Joel Niklaus, Jakob Merane, Luka Nenadic, Sina Ahmadi, Yingqiang Gao, Cyrill A. H. Chevalley, Claude Humbel, Christophe Gösken, Lorenzo Tanzi, Thomas Lüthi, Stefan Palombo, Spencer Poff, Boling Yang, Nan Wu, Matthew Guillod, Robin Mamié, Daniel Brunner, Julio Pereyra, Niko Grupen
cs.AI

Abstract

In Switzerland, legal translation is uniquely important because of the country's four official languages and its requirements for multilingual legal documentation. However, this process traditionally relies on professionals who must be both legal experts and skilled translators, which creates bottlenecks and hampers effective access to justice. To address this challenge, we introduce SwiLTra-Bench, a comprehensive multilingual benchmark of over 180K aligned Swiss legal translation pairs, comprising laws, headnotes, and press releases in all Swiss official languages plus English, designed to evaluate LLM-based translation systems. Our systematic evaluation reveals that frontier models achieve superior translation performance across all document types, while specialized translation systems excel on laws but underperform on headnotes. Through rigorous testing and human expert validation, we demonstrate that although fine-tuning open small language models (SLMs) significantly improves their translation quality, they still lag behind the best zero-shot-prompted frontier models, such as Claude-3.5-Sonnet. Additionally, we present SwiLTra-Judge, a specialized LLM-based evaluation system whose judgments align most closely with human expert assessments.


March 6, 2025