Document image classification: Progress over two decades

  • Li Liu*
  • Zhiyu Wang
  • Taorong Qiu
  • Qiu Chen
  • Yue Lu
  • Ching Y. Suen
  • *Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

32 Scopus citations

Abstract

Document image classification plays a vital role in document image processing systems. A clear understanding of the state of the art in this field, especially in the deep learning era, is therefore important for developing effective document image processing systems. In this paper, we provide a comprehensive survey of the progress made in document image classification over the past two decades. We categorize document images into non-mobile images and mobile images according to the way they are acquired. We review the existing classification methods for these two types of images, grouping them into textual-based, structural-based, visual-based, and hybrid methods. We further compare the performance of different classification methods on several public benchmark datasets. Finally, we highlight some open issues and recommend promising directions for future research.

Original language: English
Pages (from-to): 223-240
Number of pages: 18
Journal: Neurocomputing
Volume: 453
DOIs
State: Published - 17 Sep 2021

Keywords

  • Document image classification
  • Mobile document images
  • Non-mobile document images
  • Survey
