TY - GEN
T1 - FinDABench
T2 - 31st International Conference on Computational Linguistics, COLING 2025
AU - Liu, Shu
AU - Zhao, Shangqing
AU - Jia, Chenghao
AU - Zhuang, Xinlin
AU - Long, Zhao Guang
AU - Zhou, Jie
AU - Zhou, Aimin
AU - Lan, Man
AU - Chong, Yang
N1 - Publisher Copyright:
© 2025 Association for Computational Linguistics.
PY - 2025
Y1 - 2025
N2 - Large Language Models (LLMs) have demonstrated impressive capabilities across a wide range of tasks. However, their proficiency and reliability in the specialized domain of financial data analysis, particularly focusing on data-driven thinking, remain uncertain. To bridge this gap, we introduce FinDABench, a comprehensive benchmark designed to evaluate the financial data analysis capabilities of LLMs within this context. The benchmark comprises 15,200 training instances and 8,900 test instances, all meticulously crafted by human experts. FinDABench assesses LLMs across three dimensions: 1) Core Ability, evaluating the models' ability to perform financial indicator calculation and corporate sentiment risk assessment; 2) Analytical Ability, determining the models' ability to quickly comprehend textual information and analyze abnormal financial reports; and 3) Technical Ability, examining the models' use of technical knowledge to address real-world data analysis challenges involving analysis generation and chart visualization from multiple perspectives. We will release FinDABench and the evaluation scripts at https://github.com/cubenlp/FinDABench. FinDABench aims to provide a measure for in-depth analysis of LLM abilities and foster the advancement of LLMs in the field of financial data analysis.
AB - Large Language Models (LLMs) have demonstrated impressive capabilities across a wide range of tasks. However, their proficiency and reliability in the specialized domain of financial data analysis, particularly focusing on data-driven thinking, remain uncertain. To bridge this gap, we introduce FinDABench, a comprehensive benchmark designed to evaluate the financial data analysis capabilities of LLMs within this context. The benchmark comprises 15,200 training instances and 8,900 test instances, all meticulously crafted by human experts. FinDABench assesses LLMs across three dimensions: 1) Core Ability, evaluating the models' ability to perform financial indicator calculation and corporate sentiment risk assessment; 2) Analytical Ability, determining the models' ability to quickly comprehend textual information and analyze abnormal financial reports; and 3) Technical Ability, examining the models' use of technical knowledge to address real-world data analysis challenges involving analysis generation and chart visualization from multiple perspectives. We will release FinDABench and the evaluation scripts at https://github.com/cubenlp/FinDABench. FinDABench aims to provide a measure for in-depth analysis of LLM abilities and foster the advancement of LLMs in the field of financial data analysis.
UR - https://www.scopus.com/pages/publications/85218492337
M3 - Conference contribution
AN - SCOPUS:85218492337
T3 - Proceedings - International Conference on Computational Linguistics, COLING
SP - 710
EP - 725
BT - Main Conference
A2 - Rambow, Owen
A2 - Wanner, Leo
A2 - Apidianaki, Marianna
A2 - Al-Khalifa, Hend
A2 - Di Eugenio, Barbara
A2 - Schockaert, Steven
PB - Association for Computational Linguistics (ACL)
Y2 - 19 January 2025 through 24 January 2025
ER -