MM-Path: Multi-modal, Multi-granularity Path Representation Learning

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

7 Scopus citations

Abstract

Developing effective path representations has become increasingly essential across various fields within intelligent transportation. Although pre-trained path representation learning models have shown improved performance, they predominantly focus on the topological structures from single modality data, i.e., road networks, overlooking the geometric and contextual features associated with path-related images, e.g., remote sensing images. Similar to human understanding, integrating information from multiple modalities can provide a more comprehensive view, enhancing both representation accuracy and generalization. However, variations in information granularity impede the semantic alignment of road network-based paths (road paths) and image-based paths (image paths), while the heterogeneity of multi-modal data poses substantial challenges for effective fusion and utilization. In this paper, we propose a novel Multi-modal, Multi-granularity Path Representation Learning Framework (MM-Path), which can learn a generic path representation by integrating modalities from both road paths and image paths. To enhance the alignment of multi-modal data, we develop a multi-granularity alignment strategy that systematically associates nodes, road sub-paths, and road paths with their corresponding image patches, ensuring the synchronization of both detailed local information and broader global contexts. To address the heterogeneity of multi-modal data effectively, we introduce a graph-based cross-modal residual fusion component designed to comprehensively fuse information across different modalities and granularities. Finally, we conduct extensive experiments on two large-scale real-world datasets under two downstream tasks, validating the effectiveness of the proposed MM-Path.

Original languageEnglish
Title of host publicationKDD 2025 - Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining
PublisherAssociation for Computing Machinery
Pages1703-1714
Number of pages12
ISBN (Electronic)9798400712456
DOIs
StatePublished - 20 Jul 2025
Event31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2025 - Toronto, Canada
Duration: 3 Aug 20257 Aug 2025

Publication series

NameProceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Volume1
ISSN (Print)2154-817X

Conference

Conference31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2025
Country/TerritoryCanada
CityToronto
Period3/08/257/08/25

Keywords

  • multi-modal learning
  • path representation learning
  • self-supervised learning

Fingerprint

Dive into the research topics of 'MM-Path: Multi-modal, Multi-granularity Path Representation Learning'. Together they form a unique fingerprint.

Cite this