
UCFE: A User-Centric Financial Expertise Benchmark for Large Language Models

October 17, 2024
Authors: Yuzhe Yang, Yifei Zhang, Yan Hu, Yilin Guo, Ruoli Gan, Yueru He, Mingcong Lei, Xiao Zhang, Haining Wang, Qianqian Xie, Jimin Huang, Honghai Yu, Benyou Wang
cs.AI

Abstract

This paper introduces UCFE, the User-Centric Financial Expertise benchmark, a framework designed to evaluate the ability of large language models (LLMs) to handle complex real-world financial tasks. The UCFE benchmark adopts a hybrid approach that combines human expert evaluations with dynamic, task-specific interactions to simulate the complexities of evolving financial scenarios. First, we conducted a user study involving 804 participants, collecting their feedback on financial tasks. Second, based on this feedback, we created a dataset that encompasses a wide range of user intents and interactions. This dataset serves as the foundation for benchmarking 12 LLM services using the LLM-as-Judge methodology. Our results show a significant alignment between benchmark scores and human preferences, with a Pearson correlation coefficient of 0.78, confirming the effectiveness of the UCFE dataset and our evaluation approach. The UCFE benchmark not only reveals the potential of LLMs in the financial sector but also provides a robust framework for assessing their performance and user satisfaction. The benchmark dataset and evaluation code are available.
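As a rough illustration of the agreement measure cited in the abstract, the sketch below computes a Pearson correlation coefficient between per-model benchmark scores and per-model human preference scores. It is not the authors' evaluation code: the function name `pearson` and both score lists are hypothetical placeholders, chosen only to show the calculation.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (unnormalized covariance)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical placeholder data: one entry per evaluated model.
# These numbers are NOT taken from the paper.
benchmark_scores = [1100.0, 1050.0, 980.0, 1020.0, 900.0]
human_scores = [1080.0, 1000.0, 1010.0, 990.0, 920.0]

r = pearson(benchmark_scores, human_scores)
print(f"Pearson r = {r:.2f}")
```

A value near 1 indicates that models ranked highly by the automated judge also tend to be preferred by human evaluators; the abstract reports 0.78 for the actual UCFE results.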
