Exploring inter-feature and inter-class relationships with deep neural networks for video classification

  • Zuxuan Wu
  • Yu-Gang Jiang
  • Jun Wang
  • Jian Pu
  • Xiangyang Xue

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

131 Scopus citations

Abstract

Videos contain very rich semantics and are intrinsically multimodal. In this paper, we study the challenging task of classifying videos according to their high-level semantics, such as human actions or complex events. Although extensive efforts have been devoted to this problem, most existing works combine multiple features using simple fusion strategies and neglect the exploration of inter-class semantic relationships. In this paper, we propose a novel unified framework that jointly learns feature relationships and exploits class relationships for improved video classification performance. Specifically, these two types of relationships are learned and utilized by rigorously imposing regularizations in a deep neural network (DNN). Such a regularized DNN can be trained efficiently using a GPU implementation at an affordable cost. By arming the DNN with a better capability of exploring both the inter-feature and the inter-class relationships, the proposed regularized DNN is more suitable for identifying video semantics. Through extensive experimental evaluations, we demonstrate that the proposed framework exhibits superior performance over several state-of-the-art approaches. On the well-known Hollywood2 and Columbia Consumer Video benchmarks, we obtain the best results reported to date: 65.7% and 70.6%, respectively, in terms of mean average precision.
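The abstract describes a DNN objective augmented with two regularization terms, one capturing inter-feature structure and one capturing inter-class structure, but does not spell out their form. As an illustrative sketch only, the toy loss below assumes a trace (nuclear) norm on the output weights for the feature-relationship term and a graph-Laplacian-style penalty over class weight columns for the class-relationship term; all names, shapes, and the class-relationship matrix `S` are hypothetical, not the paper's actual formulation.

```python
import numpy as np

# Hypothetical setup: a mini-batch of fused video features and class labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 16))          # 8 videos, 16-dim fused features
y = rng.integers(0, 4, size=8)        # ground-truth labels over 4 classes
W = rng.normal(size=(16, 4)) * 0.1    # output-layer weights

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def regularized_loss(X, y, W, lam_feat=1e-3, lam_cls=1e-3):
    # Standard cross-entropy term on the softmax outputs.
    p = softmax(X @ W)
    ce = -np.log(p[np.arange(len(y)), y] + 1e-12).mean()

    # Inter-feature term (assumed): a trace norm encouraging shared
    # low-rank structure across the weight matrix.
    feat_reg = np.linalg.norm(W, ord="nuc")

    # Inter-class term (assumed): a Laplacian penalty trace(W L W^T) that
    # pulls together weight columns of classes marked related in S.
    S = np.eye(4)                     # placeholder class-relationship matrix
    L = np.eye(4) - S                 # graph-Laplacian-style matrix
    cls_reg = np.trace(W @ L @ W.T)

    return ce + lam_feat * feat_reg + lam_cls * cls_reg

loss = regularized_loss(X, y, W)
```

With `S` set to the identity the class term vanishes; a non-trivial `S` (e.g. learned class correlations) would couple related classes' weights, which is the spirit of the inter-class regularization the abstract describes.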

Original language: English
Title of host publication: MM 2014 - Proceedings of the 2014 ACM Conference on Multimedia
Publisher: Association for Computing Machinery
Pages: 167-176
Number of pages: 10
ISBN (Electronic): 9781450330633
DOIs
State: Published - 3 Nov 2014
Externally published: Yes
Event: 2014 ACM Conference on Multimedia, MM 2014 - Orlando, United States
Duration: 3 Nov 2014 - 7 Nov 2014

Publication series

Name: MM 2014 - Proceedings of the 2014 ACM Conference on Multimedia

Conference

Conference: 2014 ACM Conference on Multimedia, MM 2014
Country/Territory: United States
City: Orlando
Period: 3/11/14 - 7/11/14

Keywords

  • Action and event recognition
  • Class relationships
  • Deep neural networks
  • Multimodal features
