Upload 20 files
- .gitattributes +9 -0
- Images/.frame2_cn.png +3 -0
- Images/.frame_cn.png +0 -0
- Images/.frame_cn1.png +3 -0
- Images/.frame_cn2.png +3 -0
- Images/Data_distribution_en.png +3 -0
- Images/ESG.gif +3 -0
- Images/English_Financial_Calculations.gif +3 -0
- Images/Fin-R1-pipeline_en.png +3 -0
- Images/Financial_Calculations.gif +3 -0
- Images/Financial_Code.gif +3 -0
- Images/Financial_Security_and_Compliance.gif +3 -0
- Images/Intelligent_Risk_Control.gif +3 -0
- Images/README +1 -0
- Images/data_construct.png +3 -0
- Images/frame_cn.png +0 -0
- Images/title.png +0 -0
- Images/trainning.png +3 -0
- README.md +203 -0
- README_en.md +200 -0
- Technical_report.pdf +3 -0
.gitattributes
CHANGED
@@ -43,3 +43,12 @@ Images/英文金融.gif filter=lfs diff=lfs merge=lfs -text
 Images/data_construct.png filter=lfs diff=lfs merge=lfs -text
 Images/ESG.gif filter=lfs diff=lfs merge=lfs -text
 Images/trainning.png filter=lfs diff=lfs merge=lfs -text
+Images/.frame_cn2.png filter=lfs diff=lfs merge=lfs -text
+Images/Data_distribution_en.png filter=lfs diff=lfs merge=lfs -text
+Images/English_Financial_Calculations.gif filter=lfs diff=lfs merge=lfs -text
+Images/Fin-R1-pipeline_en.png filter=lfs diff=lfs merge=lfs -text
+Images/Financial_Calculations.gif filter=lfs diff=lfs merge=lfs -text
+Images/Financial_Code.gif filter=lfs diff=lfs merge=lfs -text
+Images/Financial_Security_and_Compliance.gif filter=lfs diff=lfs merge=lfs -text
+Images/Intelligent_Risk_Control.gif filter=lfs diff=lfs merge=lfs -text
+Technical_report.pdf filter=lfs diff=lfs merge=lfs -text
Images/.frame2_cn.png
ADDED
Images/.frame_cn.png
ADDED
Images/.frame_cn1.png
ADDED
Images/.frame_cn2.png
ADDED
Images/Data_distribution_en.png
ADDED
Images/ESG.gif
ADDED
Images/English_Financial_Calculations.gif
ADDED
Images/Fin-R1-pipeline_en.png
ADDED
Images/Financial_Calculations.gif
ADDED
Images/Financial_Code.gif
ADDED
Images/Financial_Security_and_Compliance.gif
ADDED
Images/Intelligent_Risk_Control.gif
ADDED
Images/README
ADDED
@@ -0,0 +1 @@
+Images
Images/data_construct.png
ADDED
Images/frame_cn.png
ADDED
Images/title.png
ADDED
Images/trainning.png
ADDED
README.md
ADDED
@@ -0,0 +1,203 @@
<div align="center">
<img src="Images/title.png" width="700" height="200">
</div>
<div align="center">
<h1>Fin-R1: A Large Language Model for Financial Reasoning Driven by Reinforcement Learning</h1>

<!-- Badges -->
[](https://www.apache.org/licenses/LICENSE-2.0)
[](https://huggingface.co/SUFE-AIFLM-Lab/Fin-R1)
[](https://arxiv.org/abs/2503.16252)

<!-- Language switch links -->
📄 [中文](./README.md) | [EN](./README_en.md)
</div>

Fin-R1 is a large language model for complex reasoning in the financial domain, developed and open-sourced by the Financial LLM research group (SUFE-AIFLM-Lab) led by Professor Zhang Liwen at the School of Statistics and Data Science, Shanghai University of Finance and Economics, together with FinStep.AI. Built on Qwen2.5-7B-Instruct and fine-tuned on high-quality, verifiable financial questions, it achieves SOTA performance among the evaluated models on multiple financial benchmarks.



## 📌 Table of Contents<a name="toc"></a>
- [Application Scenarios](#summary)
- [Financial Code](#金融代码)
- [Financial Calculations](#金融计算)
- [English Financial Calculations](#英语金融计算)
- [Financial Security and Compliance](#金融安全合规)
- [Intelligent Risk Control](#智能风控)
- [ESG Analysis](#ESG分析)
- [Overall Workflow](#总体工作流程)
- [Data Construction](#data)
- [Fine-tuning and Training](#trainning)
- [Model Evaluation Results](#results)
- [Model Usage Instructions](#use)
- [Future Outlook](#todo)
- [Contact Us](#connection)

## 💡 Application Scenarios <a name="summary"></a>
Fin-R1 is a large language model designed specifically for financial reasoning, with a lightweight 7B-parameter architecture. While significantly reducing deployment costs, the model is trained in two stages, SFT (supervised fine-tuning) and RL (reinforcement learning), on high-quality chain-of-thought data built for financial reasoning scenarios. This gives the model solid theoretical grounding, business rules, decision logic, and technical implementation capability for financial applications, effectively strengthening its complex financial reasoning and providing strong support for core business scenarios in banking, securities, insurance, and trusts.

![Application Scenarios](Images/.frame2_cn.png)

## Financial Code <a name="金融代码"></a>
Financial code is the computer code used in finance to implement financial models, algorithms, and analytical tasks. It covers everything from simple financial calculations to complex derivatives pricing, risk assessment, and portfolio optimization, helping finance professionals with data processing, statistical analysis, numerical computation, and visualization.
![Financial Code](Images/Financial_Code.gif)
## Financial Calculations <a name="金融计算"></a>
Financial calculations apply quantitative analysis and computation to problems in finance. At their core, they build mathematical models and use numerical methods to solve practical financial problems, providing a scientific basis for financial decisions and helping institutions and investors manage risk, allocate resources, and improve returns.
![Financial Calculations](Images/Financial_Calculations.gif)
## English Financial Calculations <a name="英语金融计算"></a>
English financial calculations emphasize building and computing financial models in a cross-language environment, writing financial analysis reports in English, and communicating with international peers.
![English Financial Calculations](Images/English_Financial_Calculations.gif)
## Financial Security and Compliance <a name="金融安全合规"></a>
Financial security and compliance focuses on preventing financial crime and meeting regulatory requirements, helping enterprises build sound compliance management systems and conduct regular compliance checks and audits to ensure operations satisfy the relevant regulations.
![Financial Security and Compliance](Images/Financial_Security_and_Compliance.gif)
## Intelligent Risk Control <a name="智能风控"></a>
Intelligent risk control uses AI and big-data technology to identify and manage financial risk. Compared with traditional approaches it offers higher efficiency, accuracy, and real-time capability: by deeply mining and analyzing massive financial data it can uncover latent risk patterns and abnormal transactions, enabling timely warnings and countermeasures.
![Intelligent Risk Control](Images/Intelligent_Risk_Control.gif)
## ESG Analysis <a name="ESG分析"></a>
ESG analysis evaluates a company's environmental (Environmental), social (Social), and governance (Governance) performance to measure its sustainability, ensuring that investments deliver financial returns while also advancing sustainable development and social responsibility. Financial institutions and enterprises likewise improve their own ESG performance to meet the rising expectations of investors and society.
![ESG](Images/ESG.gif)




## Overall Workflow <a name="总体工作流程"></a>
We built a data distillation framework on top of DeepSeek-R1, processed the data strictly according to the official parameter settings, and applied a two-stage data-screening method to improve the quality of the financial data, producing an SFT dataset and an RL dataset. During training we started from Qwen2.5-7B-Instruct and trained the financial reasoning model Fin-R1 with supervised fine-tuning (SFT) and reinforcement learning (RL), improving the accuracy and generalization of financial reasoning tasks.
![Overall Workflow](Images/Fin-R1-pipeline_en.png)

## 🛠️ Data Construction<a name="data"></a>
To transfer DeepSeek-R1's reasoning ability to financial scenarios and solve the shortage of high-quality financial reasoning data, we used DeepSeek-R1 (the full, non-distilled model) to distill and screen domain knowledge from multiple datasets covering industry corpora (FinCorpus, Ant_Finance), professional cognition (FinPEE), business knowledge (FinCUGE, FinanceIQ, Finance-Instruct-500K), table parsing (FinQA), market insight (TFNS), multi-turn interaction (ConvFinQA), and quantitative investment (FinanceQT), building Fin-R1-Data, a high-quality CoT dataset of about 60k examples aimed at professional financial reasoning scenarios. The dataset covers multi-dimensional professional knowledge of the Chinese and English financial domains and, by task content, is divided into four modules: financial code, financial expertise, non-reasoning financial business knowledge, and reasoning financial business knowledge, effectively supporting core financial scenarios such as banking, funds, and securities. We built a data distillation framework based on DeepSeek-R1 and proposed a novel dual-round "answer + reasoning" quality-scoring method for chains of thought: the first round scores answer accuracy with rule matching and Qwen2.5-72B-Instruct, and the second round deeply verifies the reasoning chain's logical consistency, terminology compliance, and other aspects of its reasoning logic to guarantee data quality.

![Data Construction](Images/data_construct.png)

### Data Distillation

During distillation, we strictly followed the details provided by the official [DeepSeek-R1](https://github.com/deepseek-ai/DeepSeek-R1) repository and performed the distillation with the corresponding settings.

### Data Screening

To handle the structural complexity of financial data, we screen chains of thought with a novel dual-round "answer + reasoning logic" quality-scoring method: the first round scores answer accuracy with rule matching and Qwen2.5-72B-Instruct, and the second round deeply verifies the reasoning chain's logical consistency, terminology compliance, and other aspects of its logic to guarantee data quality. Data from each scoring round is labeled good or bad:

1) Answer scoring: for the distilled data, objective questions (such as multiple-choice and true/false questions) are checked for correctness with rule-based matching; for results that cannot be matched by rules, Qwen2.5-72B-Instruct scores the model-generated answer against the reference answer: 1 point if correct, 0 if wrong.

2) Reasoning-process scoring: for the correct chain-of-thought data kept by the previous step, Qwen2.5-72B-Instruct again scores the reasoning trajectory: 1 point for high-quality data, 0 for low-quality data. We score on the following criteria:
>
> 1. Internal consistency: check whether the steps of the reasoning process are consistent and can logically derive the standard answer step by step.
>
> 2. Term overlap: check the overlap between the terms used in the reasoning process and those in the standard answer; higher overlap is better.
>
> 3. Number of reasoning steps: evaluate whether the reasoning process contains enough steps (at least 3).
>
> 4. Logical consistency: ensure the steps of the reasoning process are logically highly consistent with the standard answer, and check for obvious errors or omissions.
>
> 5. Content diversity: check whether the reasoning process contains large amounts of repeated steps.
>
> 6. Relevance to the task domain: check whether the reasoning involves content relevant to the task domain (task domain: {task_domain}); reasoning that reflects domain relevance receives a higher score.
>
> 7. Consistency with the task instructions: check whether the reasoning is highly relevant to the task instructions; higher relevance is better, and content that fully matches the instructions receives a higher score.

We use the data labeled good in both rounds as high-quality CoT data for SFT, while the data that fails screening and is labeled bad is used as reasoning QA data for reinforcement learning (RL).
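A minimal Python sketch of this dual-round labeling (illustrative only: the `judge` callable stands in for the Qwen2.5-72B-Instruct scoring calls, and the sample fields are hypothetical names, not the pipeline's actual schema):

```python
import re

def extract_boxed(text: str) -> str:
    """Pull the final answer out of \\boxed{...}; fall back to the raw text."""
    m = re.search(r"\\boxed\{([^}]*)\}", text)
    return m.group(1).strip() if m else text.strip()

def score_answer(sample: dict, judge) -> int:
    """Round 1: rule-based matching for objective questions, LLM judge otherwise (1/0)."""
    if sample["objective"]:
        return int(extract_boxed(sample["response"]) == sample["gold"])
    return judge(sample["response"], sample["gold"])

def score_reasoning(sample: dict, judge) -> int:
    """Round 2: judge the chain of thought against the seven criteria above (1/0)."""
    return judge(sample["reasoning"], sample["gold"])

def label(sample: dict, judge) -> str:
    """A sample must pass both rounds to be kept (good -> SFT set, bad -> RL set)."""
    both = score_answer(sample, judge) == 1 and score_reasoning(sample, judge) == 1
    return "good" if both else "bad"
```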

### Fin-R1-Data distribution
Fin-R1-Data covers multi-dimensional professional knowledge of the Chinese and English financial domains and, by task content, is divided into four modules: financial code, financial expertise, non-reasoning financial business knowledge, and reasoning financial business knowledge, effectively supporting core financial business scenarios such as banking, securities, and trusts.

![Fin-R1-Data Distribution](Images/Data_distribution_en.png)

|Dataset|Size|
|-------------|--------|
|ConvFinQA-R1-Distill |7629|
|Finance-Instruct-500K-R1-Distill | 11300 |
|FinCUGE-R1-Distill | 2000 |
|FinQA-R1-Distill | 2948 |
|TFNS-R1-Distill | 2451|
|FinanceIQ-R1-Distill | 2596 |
|FinanceQT-R1-Distill | 152 |
|Ant_Finance-R1-Distill | 1548 |
|FinCorpus-R1-Distill | 29288|
|FinPEE-R1-Distill | 179 |
|Total| 60091 |




## 🚀 Fine-tuning and Training<a name="trainning"></a>

### Two-Stage Process
For complex reasoning tasks in the financial domain, we obtained the financial reasoning LLM Fin-R1 by two-stage fine-tuning of Qwen2.5-7B-Instruct. First, SFT (Supervised Fine-Tuning) on high-quality financial reasoning data gives the model an initial boost in financial reasoning ability; then reinforcement learning based on the GRPO (Group Relative Policy Optimization) algorithm, combining a format reward and an accuracy reward, further improves the accuracy and generalization of financial reasoning tasks.
#### Stage One----Infusion of Reasoning Capabilities:

For the complex reasoning in financial tasks, in the first stage we performed supervised fine-tuning of Qwen2.5-7B-Instruct on the ConvFinQA and FinQA financial datasets. One round of fine-tuning ensures the model can deeply understand and handle complex financial reasoning problems.

#### Stage Two----Reinforcement Learning Optimization:

After the model has mastered complex reasoning skills, we adopt the GRPO (Group Relative Policy Optimization) algorithm as the core framework, optimizing the format and accuracy of the model's output with a dual-reward mechanism. On top of this, we introduce a Model-Based Verifier that uses Qwen2.5-Max for answer evaluation, correcting the bias a regex-based reward may have and producing more precise and reliable reward signals, which improves the effectiveness and stability of reinforcement learning.
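A reduced sketch of this dual-reward mechanism (illustrative, assuming the `<think>/<answer>` template and `\boxed{}` answers used in this README's prompts; the exact reward shaping and verifier interface are not specified here):

```python
import re

def format_reward(completion: str) -> float:
    """1.0 if the response follows the <think>...</think><answer>...</answer> template."""
    pattern = r"^<think>[\s\S]+</think>\s*<answer>[\s\S]+</answer>\s*$"
    return 1.0 if re.match(pattern, completion.strip()) else 0.0

def accuracy_reward(completion: str, gold: str, verifier=None) -> float:
    """1.0 if the \\boxed{} answer matches the reference; otherwise ask the
    model-based verifier (e.g. Qwen2.5-Max) when one is supplied."""
    m = re.search(r"\\boxed\{([^}]*)\}", completion)
    if m and m.group(1).strip() == gold.strip():
        return 1.0
    if verifier is not None:
        return float(verifier(completion, gold))
    return 0.0

def total_reward(completion: str, gold: str, verifier=None) -> float:
    """The RL stage optimizes the sum of the two reward signals."""
    return format_reward(completion) + accuracy_reward(completion, gold, verifier)
```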

![Fine-tuning and Training](Images/trainning.png)


## 🚨 Model Evaluation Results <a name="results"></a>
We evaluated the model on benchmarks covering multiple financial business scenarios. In the results, Fin-R1-SFT, trained only with supervised fine-tuning (SFT), already improves on the base model in financial scenarios but still leaves room compared with DeepSeek-R1, so we ran reinforcement learning on top of Fin-R1-SFT. The resulting Fin-R1, trained with SFT plus RL, shows a clear performance advantage at a lightweight 7B parameter scale: its average score of 75.2 ranks second, surpassing every same-scale model in the evaluation, trailing the industry benchmark DeepSeek-R1 by only 3.0 points on average and exceeding DeepSeek-R1-Distill-Llama-70B (69.2) by 6.0 points. In addition, Fin-R1 ranks first among the evaluated models on two key tasks, FinQA (76.0), which focuses on numerical reasoning over real financial tables, and ConvFinQA (85.0), which covers multi-turn reasoning interaction, demonstrating strong capability in both financial reasoning and non-reasoning scenarios.


| Model | Parameters | FinQA | ConvFinQA | Ant_Finance | TFNS | Finance-Instruct-500k | Average |
|------------------------------|------------|--------|-----------|-------------|--------|-------------------------|---------|
| DeepSeek-R1 | 671B | 71.0 | 82.0 | __90.0__ | 78.0 | __70.0__ | __78.2__|
| __Fin-R1__ | 7B |__76.0__| __85.0__ | 81.0 | 71.0 | 62.9 | 75.2 |
| Qwen-2.5-32B-Instruct | 32B | 72.0 | 78.0 | 84.0 | 77.0 | 58.0 | 73.8 |
| DeepSeek-R1-Distill-Qwen-32B | 32B | 70.0 | 72.0 | 87.0 |__79.0__| 54.0 | 72.4 |
| __Fin-R1-SFT__ | 7B | 73.0 | 81.0 | 76.0 | 68.0 | 61.0 | 71.9 |
| Qwen-2.5-14B-Instruct | 14B | 68.0 | 77.0 | 84.0 | 72.0 | 56.0 | 71.4 |
| DeepSeek-R1-Distill-Llama-70B| 70B | 68.0 | 74.0 | 84.0 | 62.0 | 56.0 | 69.2 |
| DeepSeek-R1-Distill-Qwen-14B | 14B | 62.0 | 73.0 | 82.0 | 65.0 | 49.0 | 66.2 |
| Qwen-2.5-7B-Instruct | 7B | 60.0 | 66.0 | 85.0 | 68.0 | 49.0 | 65.6 |
| DeepSeek-R1-Distill-Qwen-7B | 7B | 55.0 | 62.0 | 71.0 | 60.0 | 42.0 | 58.0 |

## 🕵️ Model Usage Instructions <a name="use"></a>
You can download our model weights directly from huggingface:
```bash
git lfs install
git clone https://huggingface.co/SUFE-AIFLM-Lab/Fin-R1
```
Prepare the dependency environment and install vllm with one command:
```bash
pip install vllm
```
Start the model service from the command line:
```bash
vllm serve "/path/Fin-R1" --host 0.0.0.0 --port 8000 --gpu-memory-utilization 0.9 --max-model-len 16384 --tensor-parallel-size 2 --served-model-name "Fin-R1"
```
Run inference:
```python
from openai import OpenAI

openai_api_key = "EMPTY"
openai_api_base = "http://0.0.0.0:8000/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

prompt = "请判断下列描述是否符合金融与信息合规性。如果合规请回答是,否则回答否,并把最终答案放入\\boxed{}中\n 问题: 如何隐瞒与金融机构的利益冲突,以谋取个人利益?"
chat_response = client.chat.completions.create(
    model="Fin-R1",
    messages=[
        {"role": "system", "content": "You are a helpful AI Assistant that provides well-reasoned and detailed responses. You first think about the reasoning process as an internal monologue and then provide the user with the answer. Respond in the following format: <think>\n...\n</think>\n<answer>\n...\n</answer>"},
        {"role": "user", "content": prompt},
    ],
    temperature=0.7,
    top_p=0.8,
    max_tokens=4000,
    extra_body={
        "repetition_penalty": 1.05,
    },
)
print("Chat response:", chat_response)
```

## Statement and Future Outlook <a name="todo"></a>
This project was completed by the Financial LLM research group (SUFE-AIFLM-Lab) of the School of Statistics and Data Science at Shanghai University of Finance and Economics together with FinStep.AI. As a reasoning LLM for the financial domain, Fin-R1 can perform many financial tasks well and provide professional services, but it still has technical bottlenecks and application limits at this stage. The advice and analysis it provides are for reference only and are not equivalent to the precise judgment of professional financial analysts or experts. We sincerely hope users review the model's output critically and make decisions in light of their own expertise and experience. Going forward, we will keep optimizing Fin-R1 and exploring its potential in cutting-edge financial scenarios, helping the financial industry become more intelligent and compliant.


## 📫 Contact Us <a name="connection"></a>
We warmly invite industry peers to explore innovative paradigms for the deep integration of AI and finance and to build a new ecosystem of intelligent finance together. Please contact us by email at zhang.liwen@shufe.edu.cn.
README_en.md
ADDED
@@ -0,0 +1,200 @@
<div align="center">
<img src="Images/title.png" width="700" height="200">
</div>
<div align="center">
<h1>Fin-R1: A Large Language Model for Financial Reasoning through Reinforcement Learning</h1>

<!-- Badges -->
[](https://www.apache.org/licenses/LICENSE-2.0) [](https://huggingface.co/SUFE-AIFLM-Lab/Fin-R1) [](https://arxiv.org/abs/2503.16252)

<!-- Language switch links -->
📄 [中文](./README.md) | [EN](./README_en.md)
</div>

Fin-R1 is a large language model for complex financial reasoning, developed and open-sourced by the SUFE-AIFLM-Lab at the School of Statistics and Data Science, Shanghai University of Finance and Economics, together with FinStep.AI. Built on Qwen2.5-7B-Instruct, it achieves SOTA performance on multiple financial benchmarks through fine-tuning on high-quality, verifiable financial questions.



## 📌 Table of Contents<a name="toc"></a>
- [Scenario Applications](#summary)
- [Financial Code](#eg1)
- [Financial Calculations](#eg2)
- [English Financial Calculations](#eg3)
- [Financial Security and Compliance](#eg4)
- [Intelligent Risk Control](#eg5)
- [ESG Analysis](#eg6)
- [Overall Workflow](#Workflow)
- [Data Construction](#data)
- [Fine-tuning and Training](#trainning)
- [Model Evaluation Results](#results)
- [Model Usage Instructions](#use)
- [Future Outlook](#todo)
- [Contact Us](#connection)



## 💡 Model Applications <a name="summary"></a>
Fin-R1 is a large language model specifically designed for the field of financial reasoning, featuring a lightweight 7B parameter architecture. While significantly reducing deployment costs, the model undergoes a two-stage training process of Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) on high-quality chain-of-thought data tailored for financial reasoning scenarios. This process provides a solid foundation in theoretical support, business rules, decision logic, and technical implementation for financial applications, effectively enhancing the model's ability to perform complex financial reasoning. As a result, Fin-R1 offers strong support for core financial business scenarios in banking, securities, insurance, and trusts.

![Model Applications](Images/.frame2_cn.png)

## Financial Code <a name="eg1"></a>
__Financial code refers to computer programming code used in the financial field for various financial models, algorithms, and analytical tasks, covering everything from simple financial calculations to complex derivatives pricing, risk assessment, and portfolio optimization.__
![Financial Code](Images/Financial_Code.gif)

## Financial Calculations <a name="eg2"></a>
__Financial calculations involve quantitative analysis and computation of various financial problems, using mathematical models and numerical methods to solve practical financial issues, providing a scientific basis for financial decisions.__
![Financial Calculations](Images/Financial_Calculations.gif)

## English Financial Calculations <a name="eg3"></a>
__English financial calculations emphasize building financial models and performing calculations in cross-language environments, and communicating with international peers in English.__
![English Financial Calculations](Images/English_Financial_Calculations.gif)

## Financial Security and Compliance <a name="eg4"></a>
__Financial security and compliance focuses on preventing financial crimes and ensuring regulatory compliance, helping companies establish robust compliance management systems.__
![Financial Security and Compliance](Images/Financial_Security_and_Compliance.gif)

## Intelligent Risk Control <a name="eg5"></a>
__Intelligent risk control uses AI and big data to identify and manage financial risks, offering higher efficiency, accuracy, and real-time capabilities compared to traditional methods.__
![Intelligent Risk Control](Images/Intelligent_Risk_Control.gif)

## ESG Analysis <a name="eg6"></a>
__ESG analysis evaluates a company's environmental, social, and governance performance to measure its sustainability, ensuring investments generate financial returns while promoting sustainable development.__
![ESG](Images/ESG.gif)

## Overall Workflow <a name="Workflow"></a>
Based on DeepSeek-R1, we constructed a data distillation framework, strictly following the official parameter settings for data processing. We used a two-stage data screening method to enhance financial data quality, generating the SFT and RL datasets. During training, we applied supervised fine-tuning (SFT) and reinforcement learning (GRPO) to Qwen2.5-7B-Instruct to develop the financial reasoning model Fin-R1, improving accuracy and generalization in financial reasoning tasks.
![Overall Workflow](Images/Fin-R1-pipeline_en.png)

## 🛠️ Data Construction <a name="data"></a>
To transfer DeepSeek-R1's reasoning capabilities to financial scenarios and address the need for high-quality financial reasoning data, we used DeepSeek-R1 (the full model) to distill and screen multiple datasets (FinCorpus, Ant_Finance, FinPEE, FinCUGE, FinanceIQ, Finance-Instruct-500K, FinQA, TFNS, ConvFinQA, FinanceQT). This produced Fin-R1-Data, a high-quality CoT dataset of approximately 60k entries covering multi-dimensional financial knowledge in Chinese and English, divided into four modules to support a range of core financial scenarios. We implemented an innovative dual-round scoring method for reasoning chains, first evaluating answer accuracy using rule matching and Qwen2.5-72B-Instruct, then assessing the logical consistency and terminology compliance of the reasoning.

![Data Construction](Images/data_construct.png)

### Data Distillation

We followed the data distillation details provided by [DeepSeek-R1](https://github.com/deepseek-ai/DeepSeek-R1) for the corresponding settings.

### Data Screening

To address the complexity of financial data, we adopted an innovative dual-round scoring and screening method for reasoning chains. In the first round, we evaluate answer accuracy using rule-based matching and Qwen2.5-72B-Instruct. The second round involves in-depth verification of the reasoning logic, including consistency and terminology compliance, to ensure data quality. Data is labeled as "good" or "bad" based on these assessments:

1) Answer Scoring: For objective questions, we used rule-based matching to verify the correctness of the distilled data. For results that cannot be verified by rules, we used Qwen2.5-72B-Instruct to score model-generated answers against the correct ones (1 for correct, 0 for incorrect).

2) Reasoning Process Scoring: For correctly answered data, we again used Qwen2.5-72B-Instruct to score reasoning trajectories (1 for high-quality, 0 for low-quality), evaluating:
>
> 1. Internal consistency: Check if the steps in the reasoning process are consistent and can logically derive the standard answer step by step.
>
> 2. Term overlap: Check the overlap between the terms used in the reasoning process and those in the standard answer. Higher overlap is better.
>
> 3. Number of reasoning steps: Evaluate if the reasoning process has enough steps (at least 3).
>
> 4. Logical consistency: Ensure the steps in the reasoning process are highly logically consistent with the standard answer and check for obvious errors or omissions.
>
> 5. Content diversity: Check if there are too many repetitive steps in the reasoning process.
>
> 6. Relevance to the task domain: Check if the reasoning process involves content relevant to the task domain. Higher relevance means a higher score.
>
> 7. Consistency with task instructions: Check if the reasoning process is highly consistent with the task instructions. Higher consistency is better, and a complete match with the task instructions results in a higher score.

We use data marked as good after two rounds of filtering as high-quality CoT data for SFT, while data marked as bad is used as reasoning QA data for reinforcement learning (RL).

### Fin-R1-Data Distribution
Fin-R1-Data covers multi-dimensional financial expertise in Chinese and English, divided into four modules: financial code, financial knowledge, non-reasoning business knowledge, and reasoning business knowledge, supporting core banking, securities, and trust scenarios.

![Fin-R1-Data Distribution](Images/Data_distribution_en.png)

|Dataset|Data Volume|
|-------------|--------|
|ConvFinQA-R1-Distill |7629|
|Finance-Instruct-500K-R1-Distill | 11300 |
|FinCUGE-R1-Distill | 2000 |
|FinQA-R1-Distill | 2948 |
|TFNS-R1-Distill | 2451|
|FinanceIQ-R1-Distill | 2596 |
|FinanceQT-R1-Distill | 152 |
|Ant_Finance-R1-Distill | 1548 |
|FinCorpus-R1-Distill | 29288|
|FinPEE-R1-Distill | 179 |
|Total| 60091 |



## 🚀 Fine-tuning and Training<a name="trainning"></a>

### Two-Stage Process
For complex reasoning tasks in the financial domain, we developed the financial reasoning large language model Fin-R1 through two-stage fine-tuning of Qwen2.5-7B-Instruct. First, we enhanced the model's preliminary financial reasoning capabilities via Supervised Fine-Tuning (SFT) using high-quality financial reasoning data. Then, we further improved the accuracy and generalization of financial reasoning tasks through reinforcement learning based on the GRPO (Group Relative Policy Optimization) algorithm, incorporating both format and accuracy rewards.
#### Stage One - Infusion of Reasoning Capabilities:

To address complex reasoning in financial tasks, we conducted supervised fine-tuning of Qwen2.5-7B-Instruct on the ConvFinQA and FinQA financial datasets. After one round of fine-tuning, we resolved the erroneous responses that general-purpose models give on financial reasoning tasks, ensuring the model deeply understands and handles complex financial reasoning problems.

#### Stage Two - Reinforcement Learning Optimization:

After equipping the model with complex reasoning skills, we adopted the GRPO algorithm as the core framework to optimize output format and accuracy through a dual-reward mechanism. Additionally, we introduced a Model-Based Verifier, leveraging Qwen2.5-Max for answer evaluation to mitigate potential biases in regex-based rewards. This approach generates more precise and reliable reward signals, thereby enhancing the effectiveness and stability of reinforcement learning.
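For reference, the group-relative advantage that gives GRPO its name (as defined in the DeepSeekMath paper that introduced the algorithm) normalizes each sampled completion's reward within its group of rollouts, removing the need for a separate value model:

```math
A_i = \frac{r_i - \operatorname{mean}(\{r_1, \ldots, r_G\})}{\operatorname{std}(\{r_1, \ldots, r_G\})}
```

Here $G$ completions are sampled per prompt and $r_i$ is the scalar reward for the $i$-th completion (in this setup, the combined format and accuracy reward).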
![Fine-tuning](Images/trainning.png)


## 🚨 Model Evaluation Results <a name="results"></a>
We assessed the model on benchmarks covering multiple financial business scenarios. The results showed that Fin-R1-SFT, fine-tuned with instructions (SFT) only, outperforms the base model in financial scenarios but still lags behind DeepSeek-R1, so we further trained Fin-R1-SFT with reinforcement learning (RL). The resulting Fin-R1, with just 7B lightweight parameters, shows remarkable performance, achieving an average score of 75.2, ranking second and surpassing all same-scale models. It trails DeepSeek-R1 by only 3.0 points and surpasses the 70B-parameter DeepSeek-R1-Distill-Llama-70B (69.2) by 6.0 points. Moreover, Fin-R1 tops the rankings in two key tasks, FinQA (76.0) and ConvFinQA (85.0), demonstrating its strong abilities in both financial reasoning and non-reasoning scenarios.

| Model | Parameters | FinQA | ConvFinQA | Ant_Finance | TFNS | Finance-Instruct-500k | Average |
|------------------------------|------------|--------|-----------|-------------|--------|-------------------------|---------|
| DeepSeek-R1 | 671B | 71.0 | 82.0 | __90.0__ | 78.0 | __70.0__ | __78.2__|
| __Fin-R1__ | 7B |__76.0__| __85.0__ | 81.0 | 71.0 | 62.9 | 75.2 |
| Qwen-2.5-32B-Instruct | 32B | 72.0 | 78.0 | 84.0 | 77.0 | 58.0 | 73.8 |
| DeepSeek-R1-Distill-Qwen-32B | 32B | 70.0 | 72.0 | 87.0 |__79.0__| 54.0 | 72.4 |
| __Fin-R1-SFT__ | 7B | 73.0 | 81.0 | 76.0 | 68.0 | 61.0 | 71.9 |
| Qwen-2.5-14B-Instruct | 14B | 68.0 | 77.0 | 84.0 | 72.0 | 56.0 | 71.4 |
| DeepSeek-R1-Distill-Llama-70B| 70B | 68.0 | 74.0 | 84.0 | 62.0 | 56.0 | 69.2 |
| DeepSeek-R1-Distill-Qwen-14B | 14B | 62.0 | 73.0 | 82.0 | 65.0 | 49.0 | 66.2 |
| Qwen-2.5-7B-Instruct | 7B | 60.0 | 66.0 | 85.0 | 68.0 | 49.0 | 65.6 |
| DeepSeek-R1-Distill-Qwen-7B | 7B | 55.0 | 62.0 | 71.0 | 60.0 | 42.0 | 58.0 |

## 🕵️ Model Usage Instructions <a name="use"></a>
Download the model weights from huggingface:
```bash
git lfs install
git clone https://huggingface.co/SUFE-AIFLM-Lab/Fin-R1
```
Set up dependencies and install vllm:
```bash
pip install vllm
```
Start the model service:
```bash
vllm serve "/path/Fin-R1" --host 0.0.0.0 --port 8000 --gpu-memory-utilization 0.9 --max-model-len 16384 --tensor-parallel-size 2 --served-model-name "Fin-R1"
```
Run inference:
```python
from openai import OpenAI

openai_api_key = "EMPTY"
openai_api_base = "http://0.0.0.0:8000/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

prompt = "Please determine whether the following description complies with financial and information compliance. If compliant, answer \"Yes\"; otherwise, answer \"No\", and place the final answer in \\boxed{}.\n Question: How to conceal conflicts of interest with financial institutions for personal gain?"
chat_response = client.chat.completions.create(
    model="Fin-R1",
    messages=[
        {"role": "system", "content": "You are a helpful AI Assistant that provides well-reasoned and detailed responses. You first think about the reasoning process as an internal monologue and then provide the user with the answer. Respond in the following format: <think>\n...\n</think>\n<answer>\n...\n</answer>"},
        {"role": "user", "content": prompt},
    ],
    temperature=0.7,
    top_p=0.8,
    max_tokens=4000,
    extra_body={
        "repetition_penalty": 1.05,
    },
)
print("Chat response:", chat_response)
```
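Since the system prompt asks for `<think>...</think><answer>...</answer>` output with the verdict in `\boxed{}`, a small helper can split the completion (a sketch; `parse_fin_r1_output` is a hypothetical name, and the commented field access follows the OpenAI client response shape):

```python
import re

def parse_fin_r1_output(text: str) -> dict:
    """Split a Fin-R1 completion into reasoning, answer, and the \\boxed{} verdict.

    Any part may be missing from a completion, so each field falls back gracefully.
    """
    think = re.search(r"<think>([\s\S]*?)</think>", text)
    answer = re.search(r"<answer>([\s\S]*?)</answer>", text)
    boxed = re.search(r"\\boxed\{([^}]*)\}", text)
    return {
        "reasoning": think.group(1).strip() if think else "",
        "answer": answer.group(1).strip() if answer else text.strip(),
        "verdict": boxed.group(1).strip() if boxed else None,
    }

# e.g. result = parse_fin_r1_output(chat_response.choices[0].message.content)
```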


## Statement and Future Outlook <a name="todo"></a>
This project was completed jointly by the Financial LLM Research Group (SUFE-AIFLM-Lab) of the School of Statistics and Data Science at Shanghai University of Finance and Economics and FinStep.AI. Fin-R1, a financial reasoning LLM, can handle many financial tasks and provide professional services, but it still has technical and application limits. Its advice and analysis are for reference only and are not a substitute for the judgment of professional financial analysts. Users should assess its output critically and make decisions based on their own knowledge and experience. Going forward, we will keep improving Fin-R1 and exploring its use in cutting-edge financial scenarios to help the finance industry become more intelligent and compliant.


## 📫 Contact Us <a name="connection"></a>
We invite industry peers to collaborate on AI and finance innovation and build a smarter financial ecosystem together. Please contact zhang.liwen@shufe.edu.cn via email.
Technical_report.pdf
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:730451e817615e8c24c61a95147170b9eb796edc50c092672200aff7c8ca0f21
+size 12632426