Fino1: On the Transferability of Reasoning Enhanced LLMs to Finance
February 12, 2025
Authors: Lingfei Qian, Weipeng Zhou, Yan Wang, Xueqing Peng, Jimin Huang, Qianqian Xie
cs.AI
Abstract
Recent advancements in large language models (LLMs) have shown strong general
reasoning abilities, yet their effectiveness in financial reasoning remains
underexplored. In this study, we comprehensively evaluate 16 powerful reasoning
and general LLMs on three complex financial tasks involving financial text,
tabular data, and equations, assessing numerical reasoning, tabular
interpretation, financial terminology comprehension, long-context processing,
and equation-based problem solving. Our results show that while better datasets
and pretraining improve financial reasoning, general enhancements like CoT
fine-tuning do not always yield consistent gains. Moreover, all reasoning
strategies face challenges in improving performance on long-context and
multi-table tasks. To address these limitations, we develop a financial
reasoning-enhanced model based on Llama-3.1-8B-Instruct, using CoT fine-tuning
and reinforcement learning with domain-specific reasoning paths. Even with
simple fine-tuning on a single financial dataset, our model achieves a
consistent 10% performance improvement across tasks, surpassing all 8B models
and, on average, even Llama3-70B-Instruct and Llama3.1-70B-Instruct. Our results highlight
the need for domain-specific adaptations in financial tasks, emphasizing future
directions such as multi-table reasoning, long-context processing, and
financial terminology comprehension. All our datasets, models, and code are
publicly available. Furthermore, we introduce a leaderboard for benchmarking
future datasets and models.
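To make the CoT fine-tuning step concrete, the sketch below shows one supervised update on a single financial QA example with an explicit reasoning path. The example text, dataset fields, and hyperparameters are illustrative assumptions for exposition, not the paper's released training pipeline.

```python
# Minimal sketch of CoT supervised fine-tuning on a financial QA example.
# The training example and hyperparameters are illustrative, not the paper's recipe.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-3.1-8B-Instruct"  # base model named in the abstract
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

# One hypothetical training example: question + chain-of-thought + final answer.
example = {
    "question": "Revenue rose from $120M to $150M. What is the growth rate?",
    "cot": "Growth = (150 - 120) / 120 = 0.25, i.e. 25%.",
    "answer": "25%",
}

# Concatenate the prompt with the domain-specific reasoning path and train with
# the standard next-token (causal LM) loss over the full sequence.
text = (
    f"Question: {example['question']}\n"
    f"Reasoning: {example['cot']}\n"
    f"Answer: {example['answer']}"
)
batch = tokenizer(text, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

model.train()
outputs = model(**batch, labels=batch["input_ids"])  # loss over the CoT-augmented target
outputs.loss.backward()
optimizer.step()
```

In practice this step would run over a full financial CoT dataset (and be followed by a reinforcement-learning stage, which is not shown here).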