
CRUST-Bench: A Comprehensive Benchmark for C-to-safe-Rust Transpilation

April 21, 2025
Authors: Anirudh Khatry, Robert Zhang, Jia Pan, Ziteng Wang, Qiaochu Chen, Greg Durrett, Isil Dillig
cs.AI

Abstract

C-to-Rust transpilation is essential for modernizing legacy C code while improving safety and interoperability with the modern Rust ecosystem. However, no dataset currently exists for evaluating whether a system can transpile C into safe Rust that passes a set of test cases. We introduce CRUST-Bench, a dataset of 100 C repositories, each paired with manually written interfaces in safe Rust and with test cases that can be used to validate the correctness of the transpilation. By considering entire repositories rather than isolated functions, CRUST-Bench captures the challenges of translating complex projects with dependencies across multiple files. The Rust interfaces serve as explicit specifications that ensure adherence to idiomatic, memory-safe Rust patterns, while the accompanying test cases enforce functional correctness. We evaluate state-of-the-art large language models (LLMs) on this task and find that generating safe and idiomatic Rust remains challenging across a range of frontier methods and techniques. We also provide insights into the errors LLMs commonly make when transpiling C to safe Rust. The best-performing model, OpenAI o1, solves only 15 tasks in a single-shot setting. Progress on CRUST-Bench would lead to better transpilation systems that can reason about complex scenarios and help migrate legacy codebases from C to memory-safe languages such as Rust. The dataset and code are available at https://github.com/anirudhkhatry/CRUST-bench.
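To make the pairing of a safe Rust interface with test cases concrete, here is a minimal sketch of what one small piece of such a task might look like. The C declarations in the comment, the `IntStack` type, and the test are all hypothetical illustrations and are not taken from the CRUST-Bench dataset, whose actual interface format may differ.

```rust
// Hypothetical C original (illustrative only, not from CRUST-Bench):
//
//   typedef struct { int *items; size_t len; size_t cap; } IntStack;
//   int stack_push(IntStack *s, int value); /* -1 on allocation failure */
//   int stack_pop(IntStack *s, int *out);   /* -1 if the stack is empty */

/// Safe Rust counterpart: a `Vec` owns the storage, so there is no
/// manual allocation, raw pointer, or separate length/capacity field.
#[derive(Debug, Default)]
pub struct IntStack {
    items: Vec<i32>,
}

impl IntStack {
    /// Creates an empty stack.
    pub fn new() -> Self {
        Self { items: Vec::new() }
    }

    /// Pushes a value; growth is handled by `Vec`.
    pub fn push(&mut self, value: i32) {
        self.items.push(value);
    }

    /// Pops the top value. `Option` replaces the C out-parameter
    /// and error code: `None` means the stack was empty.
    pub fn pop(&mut self) -> Option<i32> {
        self.items.pop()
    }

    /// Number of stored values.
    pub fn len(&self) -> usize {
        self.items.len()
    }

    /// True when no values are stored.
    pub fn is_empty(&self) -> bool {
        self.items.is_empty()
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    // A test of this kind checks functional correctness of the translation.
    #[test]
    fn push_then_pop_is_lifo() {
        let mut stack = IntStack::new();
        stack.push(1);
        stack.push(2);
        assert_eq!(stack.pop(), Some(2));
        assert_eq!(stack.pop(), Some(1));
        assert_eq!(stack.pop(), None);
    }
}
```

In this sketch the signatures play the role of the specification (no `unsafe`, no raw pointers, idiomatic `Option` return), while the test module plays the role of the correctness check that a candidate translation would have to pass.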

