TY - JOUR
T1 - Cross-domain document layout analysis using document style guide
AU - Wu, Xingjiao
AU - Xiao, Luwei
AU - Du, Xiangcheng
AU - Zheng, Yingbin
AU - Li, Xin
AU - Ma, Tianlong
AU - Jin, Cheng
AU - He, Liang
N1 - Publisher Copyright: © 2023 Elsevier Ltd
PY - 2024/7/1
Y1 - 2024/7/1
N2 - Document layout analysis (DLA) is a crucial computer vision task that involves partitioning document images into high-level semantic regions such as figures, tables, backgrounds, and texts. Deep learning models for DLA typically require a large amount of labeled data, which can be expensive. Though some researchers use generated data for training, a substantial style gap exists between the generated and target data. Moreover, it is necessary to improve the quality of the generated samples to achieve better control. To address these challenges, we propose a cross-domain DLA framework called DL-DSG, which leverages document-style guidance. DL-DSG comprises three components: the document layout generator (DLG) responsible for generating document element locations, the document element decorator (DED) for filling the elements, and the document style discriminator (DSD) for style guidance. In addition to generating controlled documents, we also focus on bridging the gap between the generated and target samples. To this end, we introduce a novel strategy that transforms document style judgment into the document cross-domain style guidance component. We evaluate the effectiveness of DL-DSG on popular DLA datasets, including PubLayNet, DSSE-200, CS-150, and CDSSE, and demonstrate its superior performance.
KW - Data generation
KW - Deep learning
KW - Document cross-domain analysis
KW - Document layout analysis
UR - https://www.scopus.com/pages/publications/85182913415
U2 - 10.1016/j.eswa.2023.123039
DO - 10.1016/j.eswa.2023.123039
M3 - Article
AN - SCOPUS:85182913415
SN - 0957-4174
VL - 245
JO - Expert Systems with Applications
JF - Expert Systems with Applications
M1 - 123039
ER -