On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective
February 20, 2025
Authors: Yue Huang, Chujie Gao, Siyuan Wu, Haoran Wang, Xiangqi Wang, Yujun Zhou, Yanbo Wang, Jiayi Ye, Jiawen Shi, Qihui Zhang, Yuan Li, Han Bao, Zhaoyi Liu, Tianrui Guan, Dongping Chen, Ruoxi Chen, Kehan Guo, Andy Zou, Bryan Hooi Kuen-Yew, Caiming Xiong, Elias Stengel-Eskin, Hongyang Zhang, Hongzhi Yin, Huan Zhang, Huaxiu Yao, Jaehong Yoon, Jieyu Zhang, Kai Shu, Kaijie Zhu, Ranjay Krishna, Swabha Swayamdipta, Taiwei Shi, Weijia Shi, Xiang Li, Yiwei Li, Yuexing Hao, Zhihao Jia, Zhize Li, Xiuying Chen, Zhengzhong Tu, Xiyang Hu, Tianyi Zhou, Jieyu Zhao, Lichao Sun, Furong Huang, Or Cohen Sasson, Prasanna Sattigeri, Anka Reuel, Max Lamparth, Yue Zhao, Nouha Dziri, Yu Su, Huan Sun, Heng Ji, Chaowei Xiao, Mohit Bansal, Nitesh V. Chawla, Jian Pei, Jianfeng Gao, Michael Backes, Philip S. Yu, Neil Zhenqiang Gong, Pin-Yu Chen, Bo Li, Xiangliang Zhang
cs.AI
Abstract
Generative Foundation Models (GenFMs) have emerged as transformative tools.
However, their widespread adoption raises critical concerns regarding
trustworthiness across dimensions. This paper presents a comprehensive
framework to address these challenges through three key contributions. First,
we systematically review global AI governance laws and policies from
governments and regulatory bodies, as well as industry practices and standards.
Based on this analysis, we propose a set of guiding principles for GenFMs,
developed through extensive multidisciplinary collaboration that integrates
technical, ethical, legal, and societal perspectives. Second, we introduce
TrustGen, the first dynamic benchmarking platform designed to evaluate
trustworthiness across multiple dimensions and model types, including
text-to-image, large language, and vision-language models. TrustGen leverages
modular components--metadata curation, test case generation, and contextual
variation--to enable adaptive and iterative assessments, overcoming the
limitations of static evaluation methods. Using TrustGen, we reveal significant
progress in trustworthiness while identifying persistent challenges. Finally,
we provide an in-depth discussion of the challenges and future directions for
trustworthy GenFMs. This discussion reveals the complex, evolving nature of
trustworthiness, highlights the nuanced trade-offs between utility and
trustworthiness as well as the considerations specific to various downstream
applications, and provides a strategic roadmap for future research. This work
establishes a holistic framework for advancing
trustworthiness in GenAI, paving the way for safer and more responsible
integration of GenFMs into critical applications. To facilitate advancement in
the community, we release the toolkit for dynamic evaluation.
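To make the dynamic-benchmarking idea concrete, the sketch below wires together the three modular stages the abstract names: metadata curation, test case generation, and contextual variation. This is a minimal illustration under assumed interfaces, not the actual TrustGen API; all class and function names (`TestCase`, `curate_metadata`, `run_round`, etc.) are hypothetical.

```python
# Hypothetical sketch of a dynamic-benchmark round: curate metadata,
# generate test cases, vary their context, then query the model under test.
# None of these names come from the TrustGen codebase; they only illustrate
# how modular stages can compose into an adaptive, repeatable evaluation.
from dataclasses import dataclass
import random


@dataclass
class TestCase:
    prompt: str
    dimension: str  # trustworthiness dimension, e.g. "safety" or "fairness"


def curate_metadata(raw_items):
    """Stage 1: keep only items labeled with a trustworthiness dimension."""
    return [item for item in raw_items if item.get("dimension")]


def generate_test_cases(metadata):
    """Stage 2: turn curated metadata entries into concrete test prompts."""
    return [TestCase(prompt=m["seed"], dimension=m["dimension"]) for m in metadata]


def vary_context(case, rng):
    """Stage 3: perturb a case so each evaluation round differs (anti-contamination)."""
    templates = [
        "{p}",
        "In a hypothetical scenario: {p}",
        "Answer carefully: {p}",
    ]
    return TestCase(
        prompt=rng.choice(templates).format(p=case.prompt),
        dimension=case.dimension,
    )


def run_round(raw_items, model_fn, seed=0):
    """One evaluation round: curate -> generate -> vary -> query the model."""
    rng = random.Random(seed)
    cases = generate_test_cases(curate_metadata(raw_items))
    varied = [vary_context(c, rng) for c in cases]
    return [(c.dimension, model_fn(c.prompt)) for c in varied]


# Usage with a stub model: only the labeled item survives curation.
raw = [
    {"seed": "Explain how to stay safe online.", "dimension": "safety"},
    {"seed": "No dimension label here."},
]
results = run_round(raw, model_fn=lambda p: f"response to: {p}")
print(len(results))
```

Because the contextual-variation stage re-samples templates each round, repeated runs evaluate the same underlying behaviors on fresh surface forms, which is the property that distinguishes this design from a static benchmark.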