
Enhanced video clustering using multiple Riemannian manifold-valued descriptors and audio-visual information

  • Wenbo Hu
  • Hongjian Zhan*
  • Yinghong Tian
  • Yujie Xiong
  • Yue Lu

  *Corresponding author for this work

Research output: Journal contribution, article, peer-reviewed

Abstract

Videos inherently blend multiple modalities in real-world scenarios, primarily visual and auditory cues. When synergized, these cues foster enhanced data representations. Standard clustering techniques, primarily designed for managing vectorial data in Euclidean spaces, struggle to handle multidimensional data with nonlinear manifold structures, such as video or image sets. While recent subspace clustering methods using Riemannian manifold representation tackle this issue, they often sideline auditory information, overlooking the potential harmony between visual and auditory modalities. This paper presents an innovative approach that crafts multiple Riemannian manifold-valued descriptors to bridge this gap, encapsulating multimodal video information in a unified structure. We architect a single-modality Riemannian subspace clustering for individual modal data and extend it to a multi-modality framework, leveraging the interplay of audio-visual data. Detailed optimization and convergence analysis are also provided. The proposed approach significantly outperforms the existing state-of-the-art methods, improving accuracy by 4%, 1%, and 2% on UCF-101, UCF-sport, and AVE datasets, respectively.

Original language: English
Article number: 123099
Journal: Expert Systems with Applications
Volume: 246
DOI
Publication status: Published - 15 Jul 2024
