
FinMTEB: Finance Massive Text Embedding Benchmark

February 16, 2025
Authors: Yixuan Tang, Yi Yang
cs.AI

Abstract

Embedding models play a crucial role in representing and retrieving information across NLP applications, and recent advances in large language models (LLMs) have further improved their performance. These models are usually benchmarked on general-purpose datasets, but real-world applications demand domain-specific evaluation. In this work, we introduce the Finance Massive Text Embedding Benchmark (FinMTEB), a specialized counterpart to MTEB for the financial domain. FinMTEB comprises 64 financial domain-specific embedding datasets across 7 tasks, covering diverse text types in both Chinese and English, such as financial news articles, corporate annual reports, ESG reports, regulatory filings, and earnings call transcripts. We also develop a finance-adapted model, FinPersona-E5, trained on data produced by a persona-based data synthesis method so that the training set covers diverse financial embedding tasks. Through extensive evaluation of 15 embedding models, including FinPersona-E5, we report three key findings: (1) performance on general-purpose benchmarks shows limited correlation with performance on financial domain tasks; (2) domain-adapted models consistently outperform their general-purpose counterparts; and (3) surprisingly, a simple Bag-of-Words (BoW) approach outperforms sophisticated dense embeddings on financial Semantic Textual Similarity (STS) tasks, underscoring current limitations of dense embedding techniques. Our work establishes a robust evaluation framework for financial NLP applications and provides insights for developing domain-specific embedding models.
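
To make the third finding concrete, here is a minimal sketch of what a Bag-of-Words similarity baseline for an STS task could look like. The abstract does not describe the BoW configuration the authors used, so the tokenizer, the raw term counts, and the function names (`bow_vector`, `cosine`, `bow_similarity`) are all assumptions made for illustration, not the paper's implementation.

```python
import math
import re
from collections import Counter


def bow_vector(text: str) -> Counter:
    """Hypothetical tokenizer: lowercase, keep alphanumeric runs, count them."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    # Counter returns 0 for missing tokens, so only shared tokens contribute.
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty sentence has no direction to compare
    return dot / (norm_a * norm_b)


def bow_similarity(sentence_a: str, sentence_b: str) -> float:
    """Predicted similarity for one STS sentence pair (assumed scoring rule)."""
    return cosine(bow_vector(sentence_a), bow_vector(sentence_b))


if __name__ == "__main__":
    pair = (
        "Net income rose in the third quarter.",
        "Net income fell in the third quarter.",
    )
    print(round(bow_similarity(*pair), 3))
```

In an STS evaluation, scores like these would typically be compared against human similarity labels with a rank correlation such as Spearman's. A dense embedding model would slot into the same pipeline by replacing `bow_vector` with its encoder output.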
