Unsupervised temporal action segmentation with sample discrimination training and alignment-based boundary refinement

  • Feng Huang
  • Xiao Diao Chen*
  • Hongyu Chen
  • Haichuan Song

*Corresponding author for this work

  • Hangzhou Dianzi University
  • Zhejiang Province Taizhou Technician College

Research output: Contribution to journal, Article, peer-reviewed

Abstract

Unsupervised temporal action segmentation (UTAS) addresses the task of partitioning untrimmed videos into coherent action segments without manual annotations. While boundary-detection-based approaches have demonstrated superior performance, they exhibit two critical limitations. First, these methods often treat all frames uniformly during training, resulting in over-segmentation and suboptimal performance. Second, they rely primarily on intra-video features and neglect potentially valuable inter-video correlations within the dataset. To address these challenges, we present a comprehensive UTAS framework with three key innovations: (1) a discriminative training mechanism that differentiates between boundary and non-boundary frames in the temporal domain and between motion and background pixels in the spatial domain, employing weighted training strategies alongside multiple temporal-scale modeling; (2) a self-validation mechanism for cross-verifying predictions across different input sequences; and (3) a boundary refinement approach based on video alignment, which constructs reference video sets according to feature distributions and establishes inter-video correspondences to improve boundary localization. Extensive evaluations on three benchmark datasets, namely Breakfast, 50Salads, and YouTube Instructions, demonstrate that our approach achieves state-of-the-art performance and improves significantly over existing methods.
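
The abstract does not say how inter-video correspondences are computed. As a rough illustration of the general idea of transferring boundaries through an alignment, the sketch below uses plain dynamic time warping (DTW) on one-dimensional per-frame features. DTW, the scalar features, and the function names `dtw_path` and `transfer_boundaries` are all assumptions made for this example and are not taken from the paper.

```python
from math import inf


def dtw_path(target, reference):
    """Return a DTW warping path as (target_index, reference_index) pairs.

    Assumption: each frame is summarized by a single float; a real system
    would compare learned feature vectors instead.
    """
    n, m = len(target), len(reference)
    # cost[i][j] = minimal cumulative cost aligning the first i target
    # frames with the first j reference frames.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(target[i - 1] - reference[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])

    # Backtrack from the last cell to recover one optimal alignment.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(steps)
    path.reverse()
    return path


def transfer_boundaries(ref_boundaries, path):
    """Map each reference boundary frame to the earliest aligned target frame."""
    mapped = []
    for b in ref_boundaries:
        aligned = [ti for (ti, rj) in path if rj == b]
        mapped.append(min(aligned))
    return mapped


# Toy example: three "actions" with feature values 0, 1 and 2.
target = [0, 0, 0, 1, 1, 1, 1, 2, 2]
reference = [0, 0, 1, 1, 2, 2]
# Boundaries are given as the first frame index of each new segment.
print(transfer_boundaries([2, 4], dtw_path(target, reference)))  # [3, 7]
```

In this toy case the reference boundaries at frames 2 and 4 map to target frames 3 and 7, which are the frames where the target's feature value changes. How the paper selects reference videos and combines the transferred boundaries with its own predictions is not described in the abstract.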

Original language: English
Article number: 131636
Journal: Neurocomputing
Volume: 658
DOI
Publication status: Published - 28 Dec 2025
