Using multiple sequence alignment and statistical language model to integrate multiple Chinese address recognition outputs

Shengchang Chen, Shujing Lu, Ying Wen, Yue Lu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review


Abstract

Different recognizers may make different mistakes when recognizing the same Chinese address. In this paper, we present a method that combines the outputs of multiple Chinese address recognizers to improve recognition accuracy. The method first applies multiple sequence alignment to the outputs of several different recognizers to build a lattice of candidate hypotheses, and then uses a statistical language model to select the maximum-likelihood candidate sequence, which is taken as the final decision. Our method performs better than both the individual recognizers and Miyao's method. Experiments on address images from real envelopes show that it raises the character recognition accuracy from 95.80% to 98.38%, a 61.30% reduction in errors. Furthermore, the correct sorting rate of an automatic mail sorting system increases from 84.11% to 93.72%.
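The abstract does not specify the alignment procedure, the lattice structure, or the order of the language model, so the following Python code is only a minimal sketch of the general align-then-rescore idea under stated assumptions. It uses star alignment to the first recognizer's output (a common simplification of multiple sequence alignment, not necessarily the paper's algorithm), unit-cost minimum edit distance, a character bigram model with add-one smoothing, and Viterbi search over the lattice. The names `align_pair`, `build_lattice`, `BigramLM` and `decode`, the toy training corpus, and the example recognizer outputs are all hypothetical, and Miyao's method is not shown.

```python
import math
from collections import Counter
from typing import List, Optional, Sequence, Tuple

GAP = None  # placeholder for "no character" on one side of an alignment


def align_pair(ref: str, hyp: str) -> List[Tuple[Optional[str], Optional[str]]]:
    """Minimum-edit-distance alignment (unit insert/delete/substitute costs)."""
    n, m = len(ref), len(hyp)
    # First row/column hold the cost of aligning against an empty prefix.
    d = [[i + j if i == 0 or j == 0 else 0 for j in range(m + 1)] for i in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    # Trace back from the bottom-right cell to recover aligned character pairs.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            pairs.append((ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            pairs.append((ref[i - 1], GAP))  # character missing from hyp
            i -= 1
        else:
            pairs.append((GAP, hyp[j - 1]))  # extra character in hyp
            j -= 1
    return pairs[::-1]


def build_lattice(outputs: Sequence[str]) -> List[List[str]]:
    """Star alignment: column k holds, per recognizer, the substring aligned to
    pivot character k ("" for a deletion, several characters after insertions)."""
    pivot = outputs[0]
    columns = [set() for _ in pivot]
    for hyp in outputs:
        segs = [""] * len(pivot)
        k, pending = -1, ""  # pending: insertions before the first pivot character
        for r, h in align_pair(pivot, hyp):
            if r is not GAP:
                k += 1
                segs[k] = pending + (h or "")
                pending = ""
            elif k >= 0:
                segs[k] += h
            else:
                pending += h
        for idx, seg in enumerate(segs):
            columns[idx].add(seg)
    return [sorted(col) for col in columns]


class BigramLM:
    """Character bigram model with add-one smoothing; "^" and "$" mark start/end."""

    def __init__(self, corpus: Sequence[str]):
        self.pairs, self.ctx = Counter(), Counter()
        vocab = {"$"}
        for line in corpus:
            vocab.update(line)
            chars = ["^", *line, "$"]
            for a, b in zip(chars, chars[1:]):
                self.pairs[(a, b)] += 1
                self.ctx[a] += 1
        self.v = len(vocab)

    def logp(self, prev: str, cur: str) -> float:
        return math.log((self.pairs[(prev, cur)] + 1) / (self.ctx[prev] + self.v))


def decode(columns: List[List[str]], lm: BigramLM) -> str:
    """Viterbi search for the most probable path through the lattice. Under a
    bigram model the future score depends only on the last character, so
    keeping the best partial path per last character is exact."""
    best = {"^": (0.0, "")}
    for cands in columns:
        nxt = {}
        for last, (score, text) in best.items():
            for cand in cands:
                s, prev = score, last
                for ch in cand:
                    s += lm.logp(prev, ch)
                    prev = ch
                if prev not in nxt or s > nxt[prev][0]:
                    nxt[prev] = (s, text + cand)
        best = nxt
    # Close every path with the end-of-string transition and keep the best one.
    return max((s + lm.logp(last, "$"), t) for last, (s, t) in best.items())[1]


if __name__ == "__main__":
    lm = BigramLM(["北京市海淀区", "上海市浦东新区", "北京市朝阳区"])
    # Three hypothetical recognizers, each making a different single error.
    outputs = ["北京市海定区", "北京币海淀区", "北凉市海淀区"]
    print(decode(build_lattice(outputs), lm))  # prints 北京市海淀区
```

In this toy run every recognizer output contains an error, but because the errors fall at different positions, the lattice still contains the correct character in each column, and the bigram model prefers the path that matches the training addresses. Star alignment keeps the sketch short; a true progressive or iterative multiple sequence alignment would avoid biasing the lattice toward the pivot output.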

Original language: English
Title of host publication: 13th IAPR International Conference on Document Analysis and Recognition, ICDAR 2015 - Conference Proceedings
Publisher: IEEE Computer Society
Pages: 151-155
Number of pages: 5
ISBN (Electronic): 9781479918058
State: Published - 20 Nov 2015
Event: 13th International Conference on Document Analysis and Recognition, ICDAR 2015 - Nancy, France
Duration: 23 Aug 2015 to 26 Aug 2015

Publication series

Name: Proceedings of the International Conference on Document Analysis and Recognition, ICDAR
Volume: 2015-November
ISSN (Print): 1520-5363

Conference

Conference: 13th International Conference on Document Analysis and Recognition, ICDAR 2015
Country/Territory: France
City: Nancy
Period: 23/08/15 to 26/08/15

Keywords

  • minimum edit distance
  • multiple Chinese address recognition outputs
  • multiple sequence alignment
  • statistical language model

