Enhanced video clustering using multiple Riemannian manifold-valued descriptors and audio-visual information

  • Wenbo Hu
  • Hongjian Zhan*
  • Yinghong Tian
  • Yujie Xiong
  • Yue Lu

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

3 Scopus citations

Abstract

Videos inherently blend multiple modalities in real-world scenarios, primarily visual and auditory cues. When combined, these cues yield richer data representations. Standard clustering techniques, designed for vectorial data in Euclidean spaces, struggle with multidimensional data that has a nonlinear manifold structure, such as video or image sets. While recent subspace clustering methods using Riemannian manifold representations tackle this issue, they often sideline auditory information, overlooking the complementarity between the visual and auditory modalities. This paper presents an approach that constructs multiple Riemannian manifold-valued descriptors to bridge this gap, encapsulating multimodal video information in a unified structure. We build a single-modality Riemannian subspace clustering model for individual modal data and extend it to a multi-modality framework that exploits the interplay of audio-visual data. A detailed optimization scheme and convergence analysis are also provided. The proposed approach significantly outperforms existing state-of-the-art methods, improving accuracy by 4%, 1%, and 2% on the UCF-101, UCF-sport, and AVE datasets, respectively.
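The abstract does not specify which manifold-valued descriptors are used, but a common choice in this line of work is the symmetric positive definite (SPD) covariance matrix of frame-level features, compared under the log-Euclidean metric. A minimal sketch of that idea (function names and the ridge term are illustrative, not taken from the paper):

```python
import numpy as np

def covariance_descriptor(frame_features, eps=1e-6):
    """Map a video's frame features (T x d array) to a d x d SPD covariance
    matrix. SPD matrices form a Riemannian manifold; the small ridge `eps`
    keeps the matrix strictly positive definite."""
    X = frame_features - frame_features.mean(axis=0, keepdims=True)
    C = X.T @ X / max(len(X) - 1, 1)
    return C + eps * np.eye(C.shape[1])

def log_euclidean_distance(C1, C2):
    """Distance under the log-Euclidean metric:
    d(C1, C2) = ||logm(C1) - logm(C2)||_F, where logm is the matrix
    logarithm computed via eigendecomposition of the SPD matrix."""
    def logm_spd(C):
        w, V = np.linalg.eigh(C)          # real eigenvalues, all > 0 for SPD
        return (V * np.log(w)) @ V.T      # V diag(log w) V^T
    return np.linalg.norm(logm_spd(C1) - logm_spd(C2), "fro")

# Illustrative use: videos with similar frame statistics end up close
# on the manifold, dissimilar ones far apart.
rng = np.random.default_rng(0)
vid_a = rng.normal(size=(50, 4))                      # hypothetical features
vid_b = vid_a + 0.01 * rng.normal(size=(50, 4))       # near-duplicate video
vid_c = rng.normal(size=(50, 4)) @ np.diag([3.0, 1.0, 0.5, 2.0])
Ca, Cb, Cc = map(covariance_descriptor, (vid_a, vid_b, vid_c))
print(log_euclidean_distance(Ca, Cb) < log_euclidean_distance(Ca, Cc))
```

The resulting pairwise distance matrix can then feed any affinity-based clustering method; the paper's contribution is a subspace clustering formulation built directly on such manifold-valued representations, jointly over audio and visual streams.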

Original language: English
Article number: 123099
Journal: Expert Systems with Applications
Volume: 246
DOIs
State: Published - 15 Jul 2024

Keywords

  • Audio-visual
  • Riemannian manifolds
  • Subspace clustering
