Data service generation framework from heterogeneous printed forms using semantic link discovery

  • Han Yu
  • Hongming Cai*
  • Jun Zhou
  • Lihong Jiang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

4 Scopus citations

Abstract

Printed forms contain rich information about business processes and daily life. However, the large number of heterogeneous printed forms that contain the same categories of information is difficult to manage and share, so much of the data in printed forms goes to waste. Automatically integrating and sharing these data can remarkably improve the efficiency of enterprises; the key problem is how to extract heterogeneous data from printed forms and integrate them for quick use. To solve this problem, we propose a framework that discovers semantic links in printed forms and generates data services for easy data management and rapid data sharing in enterprise systems. First, a multiple-OCR-based form recognition approach is proposed to make forms computer-readable. Next, forms are modeled as semi-structured data using structure-based semantic link discovery and refinement with massive data. Then, a linked data model is built by table matching to align data. Finally, data services are generated based on the linked data model. A series of experiments on printed resumes is conducted, and the results show that our framework performs well in recognition rate, link discovery accuracy, data compression ratio and data resource accuracy. A prototype system is presented to illustrate the feasibility of the proposed framework.
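
As a rough illustration of the first step only, the sketch below shows one simple way that readings from several OCR engines could be combined: a majority vote per form cell. This is an assumption made for illustration, not the recognition approach described in the article; the function names `vote_ocr_outputs` and `merge_form_readings`, the cell identifiers, and the sample strings are all hypothetical.

```python
from collections import Counter


def vote_ocr_outputs(candidates):
    """Return the most frequent non-empty reading among OCR engine outputs.

    Hypothetical helper: ties go to the earliest candidate in the list.
    """
    cleaned = [c.strip() for c in candidates if c and c.strip()]
    if not cleaned:
        return ""
    counts = Counter(cleaned)
    best = max(counts.values())
    # Walk the candidates in their original order so ties are deterministic.
    for text in cleaned:
        if counts[text] == best:
            return text


def merge_form_readings(readings):
    """Merge per-engine {cell_id: text} dictionaries by voting on each cell."""
    cell_ids = sorted({cell for reading in readings for cell in reading})
    return {
        cell: vote_ocr_outputs([reading.get(cell, "") for reading in readings])
        for cell in cell_ids
    }


# Illustrative data: three engines read the same two cells of a resume form.
readings = [
    {"name": "Alice Lee", "phone": "12345"},
    {"name": "Alice Lee", "phone": "I2345"},
    {"name": "Alice Lea", "phone": "12345"},
]
merged = merge_form_readings(readings)
```

A per-cell vote is only one possible combination rule; a real system might instead weight engines by confidence scores or vote at the character level.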

Original language: English
Pages (from-to): 514-527
Number of pages: 14
Journal: Future Generation Computer Systems
Volume: 79
State: Published - Feb 2018
Externally published: Yes

Keywords

  • Data service generation
  • Form recognition
  • Heterogeneous data integration
  • Semantic data model
  • Table matching
