CMM-Math: A Chinese Multimodal Math Dataset To Evaluate and Enhance the Mathematics Reasoning of Large Multimodal Models

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citation

Abstract

Large language models (LLMs) have obtained promising results in mathematical reasoning, a foundational skill of human intelligence. Most previous studies focus on improving or measuring the performance of LLMs on textual math datasets (e.g., MATH, GSM8K). In this paper, we release a Chinese multimodal math (CMM-Math) dataset, comprising benchmark and training parts, to evaluate and enhance the mathematical reasoning of large multimodal models (LMMs). CMM-Math contains over 28,000 high-quality samples, featuring a variety of problem types (e.g., choice, fill-in-the-blank, analysis) with detailed solutions across 12 grade levels from elementary to high school in China. A problem may contain multiple images, and the visual context may appear in the question or in the options, which makes the dataset more challenging. Our comprehensive analysis reveals that state-of-the-art LMMs struggle on CMM-Math, emphasizing the need for further improvements in LMM development. We also propose a Multimodal Mathematical LMM (Math-LMM) to handle problems with mixed input of multiple images and text segments. Math-LMM is trained in three stages: foundational pre-training, foundational fine-tuning, and mathematical fine-tuning. Extensive experiments on three multimodal mathematical datasets indicate that our model effectively improves math reasoning performance compared with SOTA LMMs. We release the datasets on GitHub (https://github.com/ECNU-ICALK/EduChat-Math) and Huggingface (https://huggingface.co/datasets/ecnu-icalk/cmm-math).

Original language: English
Title of host publication: MM 2025 - Proceedings of the 33rd ACM International Conference on Multimedia, Co-Located with MM 2025
Publisher: Association for Computing Machinery, Inc
Pages: 12585-12591
Number of pages: 7
ISBN (Electronic): 9798400720352
DOIs
State: Published - 27 Oct 2025
Event: 33rd ACM International Conference on Multimedia, MM 2025 - Dublin, Ireland
Duration: 27 Oct 2025 - 31 Oct 2025

Publication series

Name: MM 2025 - Proceedings of the 33rd ACM International Conference on Multimedia, Co-Located with MM 2025

Conference

Conference: 33rd ACM International Conference on Multimedia, MM 2025
Country/Territory: Ireland
City: Dublin
Period: 27/10/25 - 31/10/25

Keywords

  • benchmark
  • chinese
  • large multimodal models
  • mathematical reasoning
