From CISC to RISC: language-model guided assembly transpilation

November 25, 2024
Authors: Ahmed Heakl, Chaimaa Abi, Rania Hossam, Abdulrahman Mahmoud
cs.AI

Abstract

The transition from x86 to ARM architecture is becoming increasingly common across various domains, driven primarily by ARM's energy efficiency and its improved performance in traditional sectors. However, this ISA shift poses significant challenges, mainly because of the extensive legacy ecosystem of x86 software and the lack of portability across proprietary ecosystems and software stacks. This paper introduces CRT, a lightweight LLM-based transpiler that automatically converts x86 assembly to ARM assembly. Our approach bridges the fundamental architectural gap between x86's CISC-based and ARM's RISC-based computing paradigms while preserving program semantics and optimizing performance. We evaluate CRT on diverse real-world applications, achieving 79.25% translation accuracy from x86 to ARMv5 on our comprehensive test suite and 88.68% accuracy from x86 to RISC-V. In practical deployments on Apple M2 hardware (ARMv8), our transpiled code achieves a 1.73× speedup over Apple's Rosetta 2 virtualization engine, while delivering 2.41× better memory efficiency and a 1.47× improvement in energy consumption. Through testing and analysis, we show that CRT successfully navigates the CISC/RISC divide and generates correctly executable RISC code despite machine "language" barriers. We release our code, models, training datasets, and benchmarks at: https://ahmedheakl.github.io/asm2asm/.
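One concrete face of the CISC/RISC gap: x86 ALU instructions may read, modify, and write memory in a single instruction, whereas ARM is a load/store architecture in which only `ldr`/`str` touch memory. The Python sketch below is a hypothetical, hand-written lowering rule for one such x86 pattern; it is not CRT, which uses a language model rather than rules. The `lower_rmw` helper, the register mapping, and the use of `r12` as a scratch register are assumptions made purely for illustration.

```python
import re

# Hypothetical, hand-written lowering rule for illustration only; it is NOT
# CRT, which learns the x86 -> ARM mapping with a language model.
# Assumed register mapping (not from the paper).
REG_MAP = {"eax": "r0", "ebx": "r1", "ecx": "r2", "edx": "r3"}
# x86 mnemonic -> ARM mnemonic for a few two-operand ALU operations.
ALU_OPS = {"add": "add", "sub": "sub", "and": "and", "or": "orr", "xor": "eor"}

# Matches Intel-syntax read-modify-write forms such as "add dword ptr [ebx], ecx".
RMW_PATTERN = re.compile(r"(\w+)\s+dword ptr \[(\w+)\],\s*(\w+)$")


def lower_rmw(insn: str, scratch: str = "r12") -> list[str]:
    """Lower one x86 memory-destination ALU instruction to ARM load/op/store.

    x86 (CISC) can read, modify, and write memory in one instruction;
    ARM (RISC) only accesses memory through ldr/str, so the work is split
    into three instructions that go through a scratch register.
    """
    m = RMW_PATTERN.match(insn.strip())
    if m is None:
        raise ValueError(f"unsupported instruction: {insn!r}")
    op, base, src = m.groups()
    if op not in ALU_OPS or base not in REG_MAP:
        raise ValueError(f"unsupported opcode or base register: {insn!r}")
    if src in REG_MAP:
        operand = REG_MAP[src]
    elif src.isdigit():
        operand = f"#{src}"  # decimal immediate, assumed to fit ARM's encoding
    else:
        raise ValueError(f"unsupported source operand: {src!r}")
    base_reg = REG_MAP[base]
    return [
        f"ldr {scratch}, [{base_reg}]",                    # load from memory
        f"{ALU_OPS[op]} {scratch}, {scratch}, {operand}",  # compute in a register
        f"str {scratch}, [{base_reg}]",                    # write the result back
    ]


for line in lower_rmw("add dword ptr [ebx], ecx"):
    print(line)
# ldr r12, [r1]
# add r12, r12, r2
# str r12, [r1]
```

Even this single pattern hides details a real translator must handle: the x86 instruction also updates the condition flags, which the ARM sequence above does not (that would require the `s`-suffixed forms such as `adds`), and large immediates do not always fit ARM's immediate encoding.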
