From CISC to RISC: language-model guided assembly transpilation
November 25, 2024
Authors: Ahmed Heakl, Chaimaa Abi, Rania Hossam, Abdulrahman Mahmoud
cs.AI
Abstract
The transition from x86 to ARM architecture is becoming increasingly common
across various domains, primarily driven by ARM's energy efficiency and
improved performance across traditional sectors. However, this ISA shift poses
significant challenges, mainly due to the extensive legacy ecosystem of x86
software and lack of portability across proprietary ecosystems and software
stacks. This paper introduces CRT, a lightweight LLM-based transpiler that
automatically converts x86 assembly to ARM assembly. Our approach bridges the
fundamental architectural gap between x86's CISC-based and ARM's RISC-based
computing paradigms while preserving program semantics and optimizing
performance. We evaluate CRT on diverse real-world applications, achieving
79.25% translation accuracy from x86 to ARMv5 on our comprehensive test suite,
and 88.68% accuracy from x86 to RISC-V. In practical deployments on Apple M2
hardware (ARMv8), our transpiled code achieves a 1.73× speedup compared to
Apple's Rosetta 2 virtualization engine, while delivering 2.41× memory
efficiency and 1.47× better energy consumption. Through testing and
analysis, we show that CRT successfully navigates the CISC/RISC divide and
generates correctly executable RISC code despite machine "language" barriers.
We release our code, models, training datasets, and benchmarks at:
https://ahmedheakl.github.io/asm2asm/.