
Self-supervised Compressed Video Action Recognition via Temporal-Consistent Sampling

  • Shanghai Jiao Tong University
  • East China Normal University

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-reviewed

Abstract

Compressed video action recognition aims to classify actions directly in compressed video rather than in decoded (standard) video. It benefits from fast training and inference because it reduces the use of redundant information. However, off-the-shelf methods still rely on costly labels for training. In this paper, we propose a self-supervised compressed video action recognition method based on Momentum Contrast (MoCo) and temporal-consistent sampling. We incorporate temporal-consistent sampling into MoCo to improve the feature representation of each input modality of compressed video. Modality-oriented fine-tuning is then introduced for the downstream compressed video action recognition task. Extensive experiments demonstrate the effectiveness of our method on different datasets with different backbones. Compared with state-of-the-art (SOTA) self-supervised learning methods for decoded videos on the HMDB51 dataset, our method achieves the highest accuracy, 57.8%.
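
The abstract names MoCo and temporal-consistent sampling without giving details, so the following is only a minimal sketch of the general idea and not the authors' implementation. The momentum update and InfoNCE loss follow the standard MoCo formulation. The function `sample_temporally_consistent_pair` is hypothetical: it assumes that "temporal-consistent" means both views of a positive pair cover the same temporal window, which the abstract does not confirm. The values for `m` and `temperature` are common MoCo defaults, not values reported for this paper.

```python
import math
import random


def momentum_update(query_params, key_params, m=0.999):
    """MoCo-style EMA update: key <- m * key + (1 - m) * query."""
    return [m * k + (1.0 - m) * q for q, k in zip(query_params, key_params)]


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce_loss(q, k_pos, queue, temperature=0.07):
    """InfoNCE loss for one query: one positive key against queued negatives."""
    q = l2_normalize(q)
    keys = [l2_normalize(k_pos)] + [l2_normalize(k) for k in queue]
    logits = [sum(a * b for a, b in zip(q, k)) / temperature for k in keys]
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
    return log_sum - logits[0]  # cross-entropy with the positive at index 0


def sample_temporally_consistent_pair(num_frames, clip_len, rng):
    """ASSUMPTION (not from the paper): both views share one temporal window.

    Returns two identical lists of frame indices; in practice each view
    would then receive its own spatial augmentation.
    """
    start = rng.randrange(0, num_frames - clip_len + 1)
    window = list(range(start, start + clip_len))
    return window, list(window)
```

In a setup like this, one encoder pair would presumably be pre-trained per compressed-video input modality, with the key encoder's parameters refreshed through `momentum_update` after each query-encoder step.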

Original language: English
Title of host publication: Neural Information Processing - 28th International Conference, ICONIP 2021, Proceedings
Editors: Teddy Mantoro, Minho Lee, Media Anugerah Ayu, Kok Wai Wong, Achmad Nizar Hidayanto
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 237-249
Number of pages: 13
ISBN (Print): 9783030922726
DOI
Publication status: Published - 2021
Event: 28th International Conference on Neural Information Processing, ICONIP 2021 - Virtual, Online
Duration: 8 Dec 2021 to 12 Dec 2021

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13111 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 28th International Conference on Neural Information Processing, ICONIP 2021
Location: Virtual, Online
Period: 8/12/21 to 12/12/21
