Data Valuation using Neural Networks for Efficient Instruction Fine-Tuning
February 14, 2025
Authors: Ishika Agarwal, Dilek Hakkani-Tür
cs.AI
Abstract
Influence functions provide crucial insights into model training, but
existing methods suffer from large computational costs and limited
generalization. In particular, recent works have proposed various metrics and
algorithms that use language models to calculate the influence of data, but
these methods do not scale well to large models and datasets. This is because of the expensive
forward and backward passes required for computation, substantial memory
requirements to store large models, and poor generalization of influence
estimates to new data. In this paper, we explore the use of small neural
networks -- which we refer to as the InfluenceNetwork -- to estimate influence
values, achieving up to 99% cost reduction. Our evaluation demonstrates that
influence values can be estimated with models just 0.0027% the size of full
language models (we use 7B and 8B versions). We apply our algorithm of
estimating influence values (called NN-CIFT: Neural Networks for effiCient
Instruction Fine-Tuning) to the downstream task of subset selection for general
instruction fine-tuning. In our study, we include four state-of-the-art
influence functions and show that NN-CIFT achieves large speedups with no
compromise in performance compared to the original influence functions. We
provide an in-depth hyperparameter analysis of NN-CIFT. The code for our
method can be found here: https://github.com/agarwalishika/NN-CIFT.
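To make the high-level idea in the abstract concrete, the sketch below shows one way a small network could be distilled from expensive influence scores and then used for subset selection. It is a minimal, hypothetical illustration, not the authors' implementation: the two-layer MLP, the pair-embedding input, the MSE regression objective, and all function names are assumptions (the actual code is in the repository linked above).

```python
# Minimal sketch (assumptions, not the paper's code): a small MLP regresses
# influence values from frozen embeddings of (training point, validation point)
# pairs, then ranks training data by estimated influence for subset selection.
import torch
import torch.nn as nn


class InfluenceNetwork(nn.Module):
    """Small regressor: concatenated pair embeddings -> scalar influence estimate."""

    def __init__(self, embed_dim: int, hidden_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * embed_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, train_emb: torch.Tensor, val_emb: torch.Tensor) -> torch.Tensor:
        # train_emb, val_emb: (batch, embed_dim) -> (batch,) influence estimates
        return self.net(torch.cat([train_emb, val_emb], dim=-1)).squeeze(-1)


def fit_influence_network(model, pair_embs, true_influence, epochs=50, lr=1e-3):
    """Distill expensive influence scores (computed with the full LM on a small
    seed set) into the small network via MSE regression.
    pair_embs: (n_pairs, 2, embed_dim), true_influence: (n_pairs,)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        pred = model(pair_embs[:, 0], pair_embs[:, 1])
        loss = loss_fn(pred, true_influence)
        loss.backward()
        opt.step()
    return model


def select_subset(model, train_embs, val_embs, k):
    """Score each training point by its mean estimated influence over the
    validation set and keep the top-k examples for instruction fine-tuning."""
    with torch.no_grad():
        scores = torch.stack([
            model(t.expand(val_embs.size(0), -1), val_embs).mean()
            for t in train_embs
        ])
    return torch.topk(scores, k).indices
```

Under these assumptions, the expensive influence function would only be evaluated on a small seed set to produce regression targets; the small network then scores all remaining pairs, which is where the cost reduction reported in the abstract would come from.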