SPMM: A soft piecewise mapping model for bilingual lexicon induction

  • Yan Fan
  • Chengyu Wang
  • Boxing Chen
  • Zhongkai Hu
  • Xiaofeng He*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

2 Scopus citations

Abstract

Bilingual Lexicon Induction (BLI) aims to induce word translations between two distinct languages. The bilingual dictionaries generated via BLI are essential for cross-lingual NLP applications. Most existing methods assume that a single mapping matrix can be learned to project the embedding of a word in the source language onto that of a word in the target language that shares the same meaning. However, because linguistic regularities are complex, a single matrix may not provide a sufficiently large parameter space or adapt to the semantics of words across different domains and topics. In this paper, we propose a Soft Piecewise Mapping Model (SPMM). It generates word alignments between two languages by learning multiple mapping matrices under orthogonality constraints. Each matrix encodes the embedding translation knowledge over a distribution of latent topics in the embedding spaces. This learning problem can be formulated as an extended version of Wahba’s problem, for which a closed-form solution is derived. To address the limited size of training data for low-resource languages and emerging domains, an iterative boosting method based on SPMM is used to augment the training dictionaries. Experiments conducted on both general and domain-specific corpora show that SPMM is effective and outperforms previous methods.
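
The abstract does not give the model's equations, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that each mapping matrix is fitted as a weighted orthogonal Procrustes problem (the form usually associated with Wahba’s problem), solved in closed form with an SVD, and that the per-word weights come from a soft assignment matrix `R` over latent topics. The names `weighted_orthogonal_map`, `fit_piecewise`, `apply_piecewise`, and `R` are hypothetical and chosen for illustration.

```python
import numpy as np


def weighted_orthogonal_map(X, Y, w):
    """Closed-form solution of min_W sum_i w_i * ||W x_i - y_i||^2 with W orthogonal.

    X, Y: (n, d) arrays of source and target embeddings (rows are word pairs).
    w:    (n,) array of non-negative weights (assumed: soft topic memberships).
    """
    # Weighted cross-covariance B = sum_i w_i * y_i x_i^T, shape (d, d).
    B = (Y * w[:, None]).T @ X
    U, _, Vt = np.linalg.svd(B)
    # The orthogonal factor of B maximizes trace(W^T B).
    return U @ Vt


def fit_piecewise(X, Y, R):
    """Fit one orthogonal map per latent topic.

    R: (n, K) soft assignments; rows are assumed to sum to 1 (hypothetical input).
    """
    return [weighted_orthogonal_map(X, Y, R[:, k]) for k in range(R.shape[1])]


def apply_piecewise(X, R, maps):
    """Project source embeddings as a soft mixture of the per-topic maps."""
    return sum(R[:, [k]] * (X @ W.T) for k, W in enumerate(maps))
```

With a single topic (`R` a column of ones) this reduces to the standard single-matrix orthogonal mapping that the abstract describes as the common baseline. How SPMM actually estimates the topic distribution and how it extends Wahba’s problem are not stated here and would need to be taken from the paper itself.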

Original language: English
Title of host publication: SIAM International Conference on Data Mining, SDM 2019
Publisher: Society for Industrial and Applied Mathematics Publications
Pages: 244-252
Number of pages: 9
ISBN (Electronic): 9781611975673
State: Published - 2019
Event: 19th SIAM International Conference on Data Mining, SDM 2019 - Calgary, Canada
Duration: 2 May 2019 to 4 May 2019

Publication series

Name: SIAM International Conference on Data Mining, SDM 2019

Conference

Conference: 19th SIAM International Conference on Data Mining, SDM 2019
Country/Territory: Canada
City: Calgary
Period: 2/05/19 to 4/05/19

Keywords

  • Bilingual lexicon induction
  • Iterative boosting
  • Soft piecewise mapping
  • Wahba’s problem
