Abstract
Document image classification plays a vital role in document image processing systems. A clear understanding of the state of the art in this field is therefore important, especially in the deep learning era, because it facilitates the development of effective document image processing systems. In this paper, we provide a comprehensive survey of the progress made in document image classification over the past two decades. We categorize document images into non-mobile images and mobile images according to the way they are acquired. We review the existing classification methods for these two types of images and group them into textual-based methods, structural-based methods, visual-based methods and hybrid methods. We further compare the performance of different classification methods on several public benchmark datasets. Finally, we highlight some open issues and recommend promising directions for future research.
| Original language | English |
|---|---|
| Pages (from-to) | 223-240 |
| Number of pages | 18 |
| Journal | Neurocomputing |
| Volume | 453 |
| State | Published - 17 Sep 2021 |
Keywords
- Document image classification
- Mobile document images
- Non-mobile document images
- Survey