CSMD: Curated Multimodal Dataset for Chinese Stock Analysis

  • Yu Liu
  • Zhuoying Li
  • Ruifeng Yang
  • Fengran Mo
  • Cen Chen*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

The stock market is a complex and dynamic system in which uncovering underlying patterns and forecasting stock movements is non-trivial for researchers and practitioners. Existing studies of stock market analysis leverage various types of information to extract useful factors, so their effectiveness depends heavily on the quality of the data used. However, currently available resources are mainly based on the U.S. stock market and written in English, which makes them difficult to adapt to other countries. To address these issues, we propose CSMD, a multimodal dataset curated specifically for analyzing the Chinese stock market and meticulously processed to ensure data quality. In addition, we develop LightQuant, a lightweight and user-friendly framework for researchers and practitioners with expertise in the financial domain. Experimental results with various backbone models on our datasets and framework demonstrate their effectiveness compared with existing datasets. The datasets and code are publicly available at https://github.com/ECNU-CILAB/LightQuant.

Original language: English
Title of host publication: CIKM 2025 - Proceedings of the 34th ACM International Conference on Information and Knowledge Management
Publisher: Association for Computing Machinery, Inc
Pages: 6471-6475
Number of pages: 5
ISBN (Electronic): 9798400720406
State: Published - 10 Nov 2025
Event: 34th ACM International Conference on Information and Knowledge Management, CIKM 2025 - Seoul, Korea, Republic of
Duration: 10 Nov 2025 - 14 Nov 2025

Publication series

Name: CIKM 2025 - Proceedings of the 34th ACM International Conference on Information and Knowledge Management

Conference

Conference: 34th ACM International Conference on Information and Knowledge Management, CIKM 2025
Country/Territory: Korea, Republic of
City: Seoul
Period: 10/11/25 - 14/11/25

Keywords

  • chinese datasets
  • multimodal datasets
  • stock movement prediction
