Energy Efficient Protein Language Models: Leveraging Small Language Models with LoRA for Controllable Protein Generation

November 8, 2024
Authors: Aayush Shah, Shankar Jayaratnam
cs.AI

Abstract

Large language models (LLMs) have demonstrated significant success in natural language processing (NLP) tasks and have shown promising results in other domains such as protein sequence generation. However, there remain salient differences between LLMs used for NLP, which effectively handle multiple tasks and are available in small sizes, and protein language models that are often specialized for specific tasks and only exist in larger sizes. In this work, we introduce two small protein language models, based on Llama-3-8B and Phi-3-mini, that are capable of both uncontrollable and controllable protein generation. For the uncontrollable generation task, our best model achieves an average pLDDT score of 69.75, demonstrating robust performance in generating viable protein structures. For the controllable generation task, in which the model generates proteins according to properties specified in the prompt, we achieve a remarkable average TM-Score of 0.84, indicating high structural similarity to target proteins. We chose 10 properties, including six classes of enzymes, to extend the capabilities of prior protein language models. Our approach utilizes the Low-Rank Adaptation (LoRA) technique, reducing trainable parameters to just 4% of the original model size, lowering computational requirements. By using a subset of the UniRef50 dataset and small models, we reduced the overall training time by 70% without compromising performance. Notably, Phi-3-mini reduced trainable parameters by 60%, decreasing training cost by 30% compared to Llama 3. Consequently, Phi-3 achieved a comparable TM-Score of 0.81, demonstrating that smaller models can match the performance of larger ones, like Llama 3. We also demonstrate the deployment of our models on the energy-efficient ET-SoC-1 chip, significantly improving the TPS/W by a factor of 3.
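
To make the LoRA setup concrete, here is a minimal sketch of how low-rank adapters can be attached to a small causal language model for property-conditioned sequence generation. It assumes the Hugging Face transformers and peft libraries; the base-model ID, LoRA rank and alpha, target module names, and the "[HYDROLASE]" property-tag prompt are illustrative assumptions, not the authors' released configuration.

```python
# Hypothetical sketch (not the authors' code): LoRA adapters on a small
# causal LM, with a property tag in the prompt for controllable generation.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

BASE_MODEL = "meta-llama/Meta-Llama-3-8B"  # Phi-3-mini uses different
                                           # attention-projection module names

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

# Attach low-rank adapters to the attention projections so that only a small
# fraction of the parameters (a few percent of the base model) is trainable.
lora_config = LoraConfig(
    r=16,               # assumed rank
    lora_alpha=32,      # assumed scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # reports trainable vs. total parameters

# Controllable generation: condition on a property tag in the prompt.
# "[HYDROLASE]" is a hypothetical tag; the paper's prompt format may differ.
prompt = "[HYDROLASE] "
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

In such a setup, only the adapter weights would be updated during fine-tuning (for example, on a UniRef50 subset with a standard causal-language-modeling objective), which is what keeps the trainable-parameter count and training cost low relative to full fine-tuning.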
