TY - JOUR
T1 - DRFN: A unified framework for complex document layout analysis
AU - Wu, Xingjiao
AU - Ma, Tianlong
AU - Du, Xiangcheng
AU - Hu, Ziling
AU - Yang, Jing
AU - He, Liang
N1 - Publisher Copyright: © 2023 Elsevier Ltd
PY - 2023/5
Y1 - 2023/5
AB - Document layout analysis (DLA) plays a vital role in information processing and management. Currently, the processing of non-Manhattan layout documents is the bottleneck in implementing a universal document layout analysis framework. To address this challenge, we propose a Complex Document Semantic Structure Extraction non-Manhattan document layout dataset (CDSSE). Furthermore, we design a Dynamic Residual Feature fusion Network (DRFN) to integrate the feature differences between non-Manhattan layouts and Manhattan layouts. During the fusion process, the DRFN makes full use of low-dimensional information and maintains the integrity of high-level semantic information through a Dynamic Residual Fusion Block (DRF). To overcome model overfitting caused by data scarcity, we propose a novel Dynamic Selection Mechanism (DSM). We show that the DRFN can achieve comparable results on all benchmark datasets. For Manhattan layout documents, F1 reached 89.5% on DSSE-200 and 95.1% on CS-150. For non-Manhattan layout documents, F1 reached 86.8% on CDSSE. In addition, we verified the effectiveness of the model structure. On all datasets, the performance of the model using DRF was significantly improved (DSSE-200: 76.6% vs. 80.3%, CS-150: 91.7% vs. 93.1%, CDSSE: 62.6% vs. 71.8%). Using the DSM also significantly improved performance (DSSE-200: 89.0% vs. 89.5%, CS-150: 94.3% vs. 95.1%, CDSSE: 84.8% vs. 86.8%).
KW - Deep learning
KW - Dynamic residual feature fusion
KW - Information extraction
KW - Information understanding
KW - Document layout analysis
UR - https://www.scopus.com/pages/publications/85149889655
U2 - 10.1016/j.ipm.2023.103339
DO - 10.1016/j.ipm.2023.103339
M3 - Article
AN - SCOPUS:85149889655
SN - 0306-4573
VL - 60
JO - Information Processing and Management
JF - Information Processing and Management
IS - 3
M1 - 103339
ER -