CritiQ: Mining Data Quality Criteria from Human Preferences

Abstract

Language models depend heavily on high-quality data for optimal performance. Existing approaches rely on manually designed heuristics, the perplexity of existing models, trained classifiers, or careful prompt engineering, all of which require significant expert experience and human annotation effort while also introducing biases. We introduce CritiQ, a novel data selection method that automatically mines data quality criteria from human preferences using only about 30 human-annotated pairs and performs efficient data selection. The main component, CritiQ Flow, employs a manager agent to evolve quality criteria and worker agents to make pairwise judgments. We build a knowledge base that extracts quality criteria from previous work to boost CritiQ Flow. Compared to perplexity-based and classifier-based methods, verbal criteria are more interpretable and can be reused. After deriving the criteria, we train the CritiQ Scorer to assign quality scores and perform efficient data selection. We demonstrate the effectiveness of our method in the code, math, and logic domains, achieving high accuracy on human-annotated test sets. To validate the quality of the selected data, we continually train Llama 3.1 models and observe improved performance on downstream tasks compared to uniform sampling. Ablation studies validate the benefits of the knowledge base and the reflection process. We also analyze how the criteria evolve and the effectiveness of majority voting.
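
The abstract mentions worker agents making pairwise judgments, majority voting, and accuracy on human-annotated pairs, but it does not say how these pieces are implemented. The sketch below is only an illustration of that general idea under stated assumptions: each worker is a plain function standing in for an agent that applies one criterion, ties are broken in favor of "A", and the three toy criteria and the sample pairs are hypothetical, not taken from the paper.

```python
from collections import Counter
from typing import Callable, Sequence, Tuple

# Hypothetical worker type: given two texts, return "A" or "B" for the
# one it prefers under a single quality criterion.
Worker = Callable[[str, str], str]


def majority_vote(workers: Sequence[Worker], a: str, b: str) -> str:
    """Return the label chosen by most workers (ties go to "A", an assumption)."""
    votes = Counter(worker(a, b) for worker in workers)
    return "A" if votes["A"] >= votes["B"] else "B"


def accuracy(workers: Sequence[Worker],
             annotated_pairs: Sequence[Tuple[str, str, str]]) -> float:
    """Fraction of pairs where the majority vote matches the human label."""
    hits = sum(majority_vote(workers, a, b) == label
               for a, b, label in annotated_pairs)
    return hits / len(annotated_pairs)


# Toy stand-in criteria for code snippets (illustrative only).
def prefers_comments(a: str, b: str) -> str:
    return "A" if a.count("#") >= b.count("#") else "B"


def prefers_fewer_todos(a: str, b: str) -> str:
    return "A" if a.count("TODO") <= b.count("TODO") else "B"


def prefers_longer(a: str, b: str) -> str:
    return "A" if len(a) >= len(b) else "B"


WORKERS = [prefers_comments, prefers_fewer_todos, prefers_longer]

# Hypothetical human-annotated pairs: (text_a, text_b, preferred_label).
PAIRS = [
    ("x = 1  # set x\nprint(x)", "x=1 TODO", "A"),
    ("TODO TODO", "def f():\n    return 1  # constant", "B"),
]

if __name__ == "__main__":
    print(accuracy(WORKERS, PAIRS))
```

In this toy setup, a set of criteria could be compared against another simply by calling `accuracy` with a different worker list on the same annotated pairs; how CritiQ Flow actually evolves its criteria is not described at this level of detail in the abstract.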
