
Fine-Tuning Small Language Models for Domain-Specific AI: An Edge AI Perspective

March 3, 2025
作者: Rakshit Aralimatti, Syed Abdul Gaffar Shakhadri, Kruthika KR, Kartik Basavaraj Angadi
cs.AI

Abstract

Deploying large-scale language models on edge devices faces inherent challenges such as high computational demands, energy consumption, and potential data privacy risks. This paper introduces the Shakti Small Language Models (SLMs): Shakti-100M, Shakti-250M, and Shakti-500M, which target these constraints head-on. By combining efficient architectures, quantization techniques, and responsible AI principles, the Shakti series enables on-device intelligence for smartphones, smart appliances, IoT systems, and beyond. We provide comprehensive insights into their design philosophy, training pipelines, and benchmark performance on both general tasks (e.g., MMLU, Hellaswag) and specialized domains (healthcare, finance, and legal). Our findings illustrate that compact models, when carefully engineered and fine-tuned, can meet and often exceed expectations in real-world edge-AI scenarios.
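The abstract cites quantization as one of the techniques enabling on-device deployment. As illustration only (the abstract does not detail the Shakti models' actual quantization pipeline), a minimal sketch of symmetric per-tensor int8 post-training quantization, which shrinks weight storage roughly 4x relative to float32, might look like this:

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor int8 quantization: map floats into [-127, 127]."""
    scale = np.max(np.abs(weights)) / 127.0
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from int8 codes."""
    return q.astype(np.float32) * scale

# Example: quantize a small weight matrix and measure reconstruction error.
w = np.random.default_rng(0).normal(size=(4, 4)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
# Rounding error is bounded by half a quantization step (scale / 2).
max_err = float(np.abs(w - w_hat).max())
```

Real edge deployments typically use per-channel scales, calibration data, or quantization-aware training for better accuracy, but the storage/accuracy trade-off follows the same principle shown here.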
