Instruction-Guided Autoregressive Neural Network Parameter Generation
April 2, 2025
Authors: Soro Bedionita, Bruno Andreis, Song Chong, Sung Ju Hwang
cs.AI
Abstract
Learning to generate neural network parameters conditioned on task
descriptions and architecture specifications is pivotal for advancing model
adaptability and transfer learning. Existing methods, especially those based on
diffusion models, suffer from limited scalability to large architectures,
rigidity in handling varying network depths, and disjointed parameter
generation that undermines inter-layer coherence. In this work, we propose IGPG
(Instruction Guided Parameter Generation), an autoregressive framework that
unifies parameter synthesis across diverse tasks and architectures. IGPG
leverages a VQ-VAE and an autoregressive model to generate neural network
parameters, conditioned on task instructions, dataset, and architecture
details. By autoregressively generating tokens of neural network weights, IGPG
ensures inter-layer coherence and enables efficient adaptation across models
and datasets. Operating at the token level, IGPG effectively captures complex
parameter distributions aggregated from a broad spectrum of pretrained models.
Extensive experiments on multiple vision datasets demonstrate that IGPG
consolidates diverse pretrained models into a single, flexible generative
framework. The synthesized parameters achieve competitive or superior
performance relative to state-of-the-art methods, especially in terms of
scalability and efficiency when applied to large architectures. These results
underscore IGPG's potential as a powerful tool for pretrained weight retrieval,
model selection, and rapid task-specific fine-tuning.
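
To make "operating at the token level" concrete, the following is a minimal sketch of how a network's weights could be turned into a sequence of discrete tokens and back. It is not the authors' implementation: the abstract does not specify the chunking scheme or the codebook, so the fixed random codebook (standing in for a learned VQ-VAE codebook), the chunk size, and the helper names `tokenize_weights` and `detokenize` are all assumptions made for illustration.

```python
import numpy as np


def tokenize_weights(weights, codebook, chunk_size):
    """Flatten all layers, split into chunks, and map each chunk to the
    index of its nearest codebook vector (hypothetical helper)."""
    flat = np.concatenate([w.ravel() for w in weights])
    pad = (-len(flat)) % chunk_size  # zero-pad so the length divides evenly
    chunks = np.pad(flat, (0, pad)).reshape(-1, chunk_size)
    # Squared Euclidean distance between every chunk and every code.
    dists = ((chunks[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1), len(flat)


def detokenize(tokens, codebook, n_params, shapes):
    """Look up each token in the codebook and reshape back into layers
    (hypothetical helper)."""
    flat = codebook[tokens].ravel()[:n_params]  # drop the padding
    layers, offset = [], 0
    for shape in shapes:
        size = int(np.prod(shape))
        layers.append(flat[offset:offset + size].reshape(shape))
        offset += size
    return layers


rng = np.random.default_rng(0)
# A toy two-layer network: weight, bias, weight.
layers = [rng.normal(size=(4, 3)), rng.normal(size=(3,)), rng.normal(size=(3, 2))]
# Assumption: a random codebook in place of one learned by a VQ-VAE.
codebook = rng.normal(size=(16, 4))

tokens, n_params = tokenize_weights(layers, codebook, chunk_size=4)
restored = detokenize(tokens, codebook, n_params, [w.shape for w in layers])
```

In this sketch a single token sequence spans all layers. An autoregressive model trained on such sequences would predict each token conditioned on the preceding ones, which is one way to read the abstract's claim about inter-layer coherence.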