A product normalization method for E-commerce

Li Wang, Rong Zhang, Chao Feng Sha, Xiao Ling Wang, Ao Ying Zhou

Research output: Contribution to journal › Article › peer-review

7 Scopus citations

Abstract

The boom in E-commerce, in terms of both product variety and quantity, brings new challenges to data management, one of which is product normalization: determining whether product records refer to the same underlying entity. Product normalization is a fundamental data-management task in E-commerce, especially under the C2C (Customer-to-Customer) model, where it can improve search functionality and users' shopping experience. However, product normalization in E-markets is difficult because the data are noisy and lack a uniform schema, which makes existing normalization methods ineffective. In this paper, we propose a hybrid framework that combines product normalization with schema integration and data cleaning. First, we propose a graph-based method to integrate the schema. Second, we fill in missing data and repair incorrect data using evidence extracted from the information surrounding each product, such as its title and textual description. Third, we distinguish products by clustering on a product similarity matrix learned with a linear logistic regression model. Finally, we conduct experiments on a real-world dataset, and the results confirm the effectiveness of our design in comparison with existing methods.
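The third step (a similarity matrix learned by logistic regression, then clustering) can be illustrated with a minimal sketch. The abstract does not specify the pair features or the clustering algorithm, so everything below is an assumption for illustration, not the authors' method: the features (`pair_features`: a bias term, title-token Jaccard overlap, and brand agreement), the toy listings, plain batch gradient descent for training, and clustering as connected components of the pairs whose predicted match probability exceeds a hypothetical `threshold` of 0.5.

```python
import math
from itertools import combinations


def jaccard(a, b):
    """Token-set Jaccard similarity between two strings (case-insensitive)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def pair_features(p, q):
    """Hypothetical pair features: bias, title-token overlap, brand agreement."""
    brand_match = 1.0 if p.get("brand") and p.get("brand") == q.get("brand") else 0.0
    return [1.0, jaccard(p["title"], q["title"]), brand_match]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit linear weights by batch gradient descent on the mean cross-entropy loss."""
    w = [0.0] * len(X[0])
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def similarity_matrix(products, w):
    """Symmetric matrix of predicted match probabilities for every product pair."""
    n = len(products)
    S = [[1.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        x = pair_features(products[i], products[j])
        S[i][j] = S[j][i] = sigmoid(sum(wk * xk for wk, xk in zip(w, x)))
    return S


def cluster(S, threshold=0.5):
    """Connected components of the graph whose edges are pairs with S > threshold."""
    n = len(S)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if S[i][j] > threshold:
            parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy data (hypothetical): two listings per real-world product.
products = [
    {"title": "Apple iPhone 5 16GB white", "brand": "Apple"},
    {"title": "iPhone 5 16GB white Apple", "brand": "Apple"},
    {"title": "Samsung Galaxy S4 black", "brand": "Samsung"},
    {"title": "Galaxy S4 black Samsung phone", "brand": "Samsung"},
]
matches = {(0, 1), (2, 3)}  # labeled same-entity pairs
pairs = list(combinations(range(len(products)), 2))
X = [pair_features(products[i], products[j]) for i, j in pairs]
y = [1 if (i, j) in matches else 0 for i, j in pairs]

w = train_logistic(X, y)
clusters = cluster(similarity_matrix(products, w))
print(clusters)  # [[0, 1], [2, 3]]
```

For brevity the toy example trains and clusters on the same pairs; in practice the model would be fit on held-out labeled pairs. Thresholded connected components is only one simple way to cluster a similarity matrix, and it can chain dissimilar records through intermediate matches, which is one reason a real system might prefer a different clustering method.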

Original language: English
Pages (from-to): 312-325
Number of pages: 14
Journal: Jisuanji Xuebao/Chinese Journal of Computers
Volume: 37
Issue number: 2
State: Published, Feb 2014

Keywords

  • Clustering
  • Data cleaning
  • E-commerce
  • Entity resolution
  • Logistic regression
  • Schema integration

