
From categorical to numerical: Multiple transitive distance learning and embedding

  • Kai Zhang
  • Qiaojun Wang
  • Zhengzhang Chen
  • Ivan Marsic
  • Vipin Kumar
  • Guofei Jiang
  • Jie Zhang
  • Southwest Petroleum University China
  • Alibaba Group Holding Ltd.
  • Rutgers - The State University of New Jersey, New Brunswick
  • NEC Corporation
  • Northwestern University
  • University of Minnesota Twin Cities
  • Fudan University

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

Abstract

Categorical data are ubiquitous in real-world databases. However, due to the lack of an intrinsic proximity measure, many powerful algorithms for numerical data analysis may not work well on their categorical counterparts, which creates a bottleneck in practical applications. In this paper, we propose a novel method to transform categorical data to numerical representations, so that abundant numerical learning methods can be exploited in categorical data mining. Our key idea is to learn a pairwise dissimilarity among categorical symbols, and hence a continuous embedding, which can then be used for subsequent numerical treatment. There are two important criteria for learning the dissimilarities. First, they should capture the important "transitivity", which has been shown to be particularly useful in measuring the proximity relation in categorical data. Second, the pairwise sample geometry arising from the learned symbol distances should be maximally consistent with prior knowledge (e.g., class labels) to obtain good generalization performance. We achieve both through multiple transitive distance learning and embedding. Encouraging results are observed on a number of benchmark classification tasks against state-of-the-art methods.
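The notion of transitivity mentioned in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: it assumes, purely for illustration, that a transitive distance between two symbols is the cheapest path through intermediate symbols (a shortest-path closure computed with Floyd-Warshall). The function name `transitive_closure`, the colour symbols, and the dissimilarity values are all hypothetical.

```python
from itertools import product


def transitive_closure(symbols, direct):
    """Shortest-path closure of a symmetric symbol dissimilarity table.

    `direct` maps unordered symbol pairs to a direct dissimilarity;
    pairs that are missing are treated as infinitely far apart.
    Assumption: transitivity is modelled as a shortest path, which is
    an illustrative choice and not taken from the paper.
    """
    inf = float("inf")
    d = {}
    for a, b in product(symbols, repeat=2):
        if a == b:
            d[a, b] = 0.0
        else:
            d[a, b] = direct.get((a, b), direct.get((b, a), inf))
    # Floyd-Warshall: product() varies the last element fastest, so the
    # intermediate symbol k is the outermost loop, as the algorithm needs.
    for k, i, j in product(symbols, repeat=3):
        via_k = d[i, k] + d[k, j]
        if via_k < d[i, j]:
            d[i, j] = via_k
    return d


# Hypothetical direct dissimilarities between three colour symbols.
symbols = ["red", "crimson", "maroon"]
direct = {
    ("red", "crimson"): 1.0,
    ("crimson", "maroon"): 1.0,
    ("red", "maroon"): 5.0,
}
closed = transitive_closure(symbols, direct)
print(closed["red", "maroon"])
```

Here "red" and "maroon" are far apart when compared directly, but both are close to "crimson", so the closure pulls them together. A table like `closed` could then be passed to any embedding routine that accepts a distance matrix in order to obtain numerical coordinates for the symbols.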

Original language: English
Title of host publication: SIAM International Conference on Data Mining 2015, SDM 2015
Editors: Suresh Venkatasubramanian, Jieping Ye
Publisher: Society for Industrial and Applied Mathematics Publications
Pages: 46-54
Number of pages: 9
ISBN (Electronic): 9781510811522
DOI
Publication status: Published - 2015
Externally published
Event: SIAM International Conference on Data Mining 2015, SDM 2015 - Vancouver, Canada
Duration: 30 Apr 2015 to 2 May 2015

Publication series

Name: SIAM International Conference on Data Mining 2015, SDM 2015

Conference

Conference: SIAM International Conference on Data Mining 2015, SDM 2015
Country/Territory: Canada
City: Vancouver
Period: 30/04/15 to 2/05/15
