CRUST-Bench: A Comprehensive Benchmark for C-to-safe-Rust Transpilation
April 21, 2025
Authors: Anirudh Khatry, Robert Zhang, Jia Pan, Ziteng Wang, Qiaochu Chen, Greg Durrett, Isil Dillig
cs.AI
Abstract
C-to-Rust transpilation is essential for modernizing legacy C code while
enhancing safety and interoperability with modern Rust ecosystems. However, no
dataset currently exists for evaluating whether a system can transpile C into
safe Rust that passes a set of test cases. We introduce CRUST-Bench, a dataset
of 100 C repositories, each paired with manually written interfaces in safe
Rust as well as test cases that can be used to validate correctness of the
transpilation. By considering entire repositories rather than isolated
functions, CRUST-Bench captures the challenges of translating complex projects
with dependencies across multiple files. The Rust interfaces supply
explicit specifications that ensure adherence to idiomatic, memory-safe Rust
patterns, while the accompanying test cases enforce functional correctness. We
evaluate state-of-the-art large language models (LLMs) on this task and find
that safe and idiomatic Rust generation is still a challenging problem for
various state-of-the-art methods and techniques. We also provide insights into
the errors LLMs usually make in transpiling code from C to safe Rust. The best
performing model, OpenAI o1, is able to solve only 15 tasks in a single-shot
setting. Improvements on CRUST-Bench would lead to improved transpilation
systems that can reason about complex scenarios and help in migrating legacy
codebases from C into languages like Rust that ensure memory safety. You can
find the dataset and code at https://github.com/anirudhkhatry/CRUST-bench.
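
To make the idea of a safe Rust interface paired with test cases concrete, here is a minimal sketch. It is not taken from CRUST-Bench: the C API in the comment, the `BoundedStack` type, the `StackFull` error, and the test are all hypothetical names chosen for illustration. It assumes a small C stack module that signals failure through integer return codes and an out-pointer, and shows how an interface might express the same contract with `Result` and `Option` and no `unsafe` code.

```rust
// Hypothetical C API being replaced (illustrative only, not from the benchmark):
//   int stack_push(struct stack *s, int value);  /* returns -1 when full  */
//   int stack_pop(struct stack *s, int *out);    /* returns -1 when empty */

/// Error returned when pushing onto a full stack.
#[derive(Debug, PartialEq, Eq)]
pub struct StackFull;

/// Fixed-capacity stack that owns its storage, so no raw pointers are needed.
#[derive(Debug)]
pub struct BoundedStack {
    items: Vec<i32>,
    capacity: usize,
}

impl BoundedStack {
    /// Creates an empty stack that holds at most `capacity` values.
    pub fn new(capacity: usize) -> Self {
        BoundedStack {
            items: Vec::with_capacity(capacity),
            capacity,
        }
    }

    /// Replaces the C return code with a typed error.
    pub fn push(&mut self, value: i32) -> Result<(), StackFull> {
        if self.items.len() == self.capacity {
            return Err(StackFull);
        }
        self.items.push(value);
        Ok(())
    }

    /// Replaces the C out-pointer with an `Option`.
    pub fn pop(&mut self) -> Option<i32> {
        self.items.pop()
    }

    /// Returns true when the stack holds no values.
    pub fn is_empty(&self) -> bool {
        self.items.is_empty()
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    // A test in the spirit of the benchmark: it checks behavior only
    // through the public interface.
    #[test]
    fn push_then_pop_is_lifo() {
        let mut s = BoundedStack::new(2);
        assert_eq!(s.push(1), Ok(()));
        assert_eq!(s.push(2), Ok(()));
        assert_eq!(s.push(3), Err(StackFull));
        assert_eq!(s.pop(), Some(2));
        assert_eq!(s.pop(), Some(1));
        assert_eq!(s.pop(), None);
    }
}
```

In a setup like this, the interface fixes the signatures a transpiler must implement, and the tests decide whether the generated bodies behave correctly.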