BEATS: Bias Evaluation and Assessment Test Suite for Large Language Models

March 31, 2025
Authors: Alok Abhishek, Lisa Erickson, Tushar Bandopadhyay
cs.AI

Abstract

In this research, we introduce BEATS, a novel framework for evaluating Bias, Ethics, Fairness, and Factuality in Large Language Models (LLMs). Building on the BEATS framework, we present a bias benchmark for LLMs that measures performance across 29 distinct metrics. These metrics span a broad range of characteristics, including demographic, cognitive, and social biases, as well as measures of ethical reasoning, group fairness, and factuality-related misinformation risk. They enable a quantitative assessment of the extent to which LLM-generated responses may perpetuate societal prejudices that reinforce or expand systemic inequities. To achieve a high score on this benchmark, an LLM must show highly equitable behavior in its responses, making it a rigorous standard for responsible AI evaluation. Empirical results from our experiments show that 37.65% of outputs generated by industry-leading models contained some form of bias, highlighting a substantial risk in using these models in critical decision-making systems. The BEATS framework and benchmark offer a scalable and statistically rigorous methodology for benchmarking LLMs, diagnosing the factors that drive bias, and developing mitigation strategies. With BEATS, our goal is to support the development of more socially responsible and ethically aligned AI models.
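The paper's implementation is not reproduced here, but the headline statistic suggests a simple aggregation scheme: a response counts as biased if it is flagged on at least one metric, and the bias rate is the flagged fraction of all responses. The minimal Python sketch below illustrates that aggregation under this assumption; the metric names, data layout, and `bias_rate` helper are hypothetical placeholders, not the BEATS API.

```python
# Minimal sketch (not the authors' code): aggregating per-response,
# per-metric flags into an overall bias rate like the 37.65% figure.
from typing import Dict, List

# Hypothetical subset of the 29 BEATS-style metrics; True means the
# response was flagged as problematic on that metric.
METRICS = ["demographic_bias", "cognitive_bias", "social_bias",
           "ethical_reasoning", "group_fairness", "misinformation_risk"]

def bias_rate(scored_responses: List[Dict[str, bool]]) -> float:
    """Fraction of responses flagged on at least one metric."""
    flagged = sum(any(flags.values()) for flags in scored_responses)
    return flagged / len(scored_responses)

# Example: three scored responses, one flagged for demographic bias.
scores = [
    {m: False for m in METRICS},
    {**{m: False for m in METRICS}, "demographic_bias": True},
    {m: False for m in METRICS},
]
print(f"{bias_rate(scores):.2%}")  # -> 33.33%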
