TY - GEN
T1 - A hybrid framework for product normalization in online shopping
AU - Wang, Li
AU - Zhang, Rong
AU - Sha, Chaofeng
AU - He, Xiaofeng
AU - Zhou, Aoying
PY - 2013
Y1 - 2013
N2 - The explosive growth of products in both variety and quantity is clear evidence of the boom in C2C (Customer-to-Customer) E-commerce. Product normalization, which determines whether products refer to the same underlying entity, is a fundamental data management task in the C2C market. However, product normalization in the C2C market is challenging because the data is noisy and lacks a uniform schema. In this paper, we propose a hybrid framework that achieves product normalization through schema integration and data cleaning. In the framework, a graph-based method integrates the schema. Missing data is filled in and incorrect data is repaired using evidence extracted from surrounding information, such as the title and textual description. We distinguish products by clustering on a product similarity matrix learned through logistic regression. We conduct experiments on real-world data, and the results confirm the effectiveness of our design in comparison with existing methods.
AB - The explosive growth of products in both variety and quantity is clear evidence of the boom in C2C (Customer-to-Customer) E-commerce. Product normalization, which determines whether products refer to the same underlying entity, is a fundamental data management task in the C2C market. However, product normalization in the C2C market is challenging because the data is noisy and lacks a uniform schema. In this paper, we propose a hybrid framework that achieves product normalization through schema integration and data cleaning. In the framework, a graph-based method integrates the schema. Missing data is filled in and incorrect data is repaired using evidence extracted from surrounding information, such as the title and textual description. We distinguish products by clustering on a product similarity matrix learned through logistic regression. We conduct experiments on real-world data, and the results confirm the effectiveness of our design in comparison with existing methods.
UR - https://www.scopus.com/pages/publications/84892885813
U2 - 10.1007/978-3-642-37450-0_28
DO - 10.1007/978-3-642-37450-0_28
M3 - Conference contribution
AN - SCOPUS:84892885813
SN - 9783642374494
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 370
EP - 384
BT - Database Systems for Advanced Applications - 18th International Conference, DASFAA 2013, Proceedings
T2 - 18th International Conference on Database Systems for Advanced Applications, DASFAA 2013
Y2 - 22 April 2013 through 25 April 2013
ER -