Reducing unknown unknowns with guidance in image caption

  • Mengjun Ni*
  • Jing Yang
  • Xin Lin
  • Liang He
  • *Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Deep recurrent models applied to image captioning, a task that links computer vision and natural language processing, have achieved excellent results, enabling the automatic generation of natural sentences describing an image. However, the mismatch in sample distribution between the training data and the open world may lead to a large number of hidden Unknown Unknowns (UUs), and such errors may greatly harm the correctness of the generated captions. In this paper, we present a framework targeting UU reduction and model optimization, based on recurrently training the model with small amounts of external data detected with the assistance of crowd commonsense. We demonstrate and analyze our method with a current state-of-the-art image-to-text model. Aiming to reduce the number of UUs in generated captions, we obtain a reduction in UUs of over 12% and reinforce the model's cognition of these scenes.

Original language: English
Title of host publication: Artificial Neural Networks and Machine Learning – ICANN 2017 - 26th International Conference on Artificial Neural Networks, Proceedings
Editors: Alessandra Lintas, Alessandro E. Villa, Stefano Rovetta, Paul F. Verschure
Publisher: Springer Verlag
Pages: 547-555
Number of pages: 9
ISBN (Print): 9783319686110
State: Published - 2017
Event: 26th International Conference on Artificial Neural Networks, ICANN 2017 - Alghero, Italy
Duration: 11 Sep 2017 to 14 Sep 2017

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10614 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 26th International Conference on Artificial Neural Networks, ICANN 2017
Country/Territory: Italy
City: Alghero
Period: 11/09/17 to 14/09/17

Keywords

  • Commonsense
  • Crowdsourcing
  • Image caption
  • Recurrent neural network
