An efficient semi-supervised clustering algorithm with sequential constraints

  • Jinfeng Yi
  • Lijun Zhang
  • Tianbao Yang
  • Wei Liu
  • Jun Wang
  • IBM
  • Nanjing University
  • University of Iowa
  • Alibaba Group Holding Ltd.

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

Abstract

Semi-supervised clustering leverages side information such as pairwise constraints to guide clustering procedures. Despite promising progress, existing semi-supervised clustering approaches overlook the case in which side information is generated sequentially, a natural setting in numerous real-world applications such as social network and e-commerce system analysis. When new constraints emerge, classical semi-supervised clustering algorithms need to re-optimize their objectives over all available data samples and constraints, which prevents them from efficiently updating the obtained data partitions. To address this challenge, we propose an efficient dynamic semi-supervised clustering framework that casts the clustering problem into a search problem over a feasible convex set, i.e., a convex hull whose extreme points are an ensemble of m data partitions. According to the principle of ensemble clustering, the optimal partition lies in the convex hull, and can thus be uniquely represented by an m-dimensional probability simplex vector. As such, the dynamic semi-supervised clustering problem is simplified to the problem of updating a probability simplex vector subject to the newly received pairwise constraints. We then develop a computationally efficient procedure that updates the probability simplex vector in O(m^2) time, irrespective of the data size n. Our empirical studies on several real-world benchmark datasets show that the proposed algorithm outperforms state-of-the-art semi-supervised clustering algorithms with a visible performance gain and significantly reduced running time.
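To make the idea of "updating a probability simplex vector subject to newly received pairwise constraints" concrete, the following is a minimal illustrative sketch, not the authors' procedure. It assumes a squared-error loss on a single must-link or cannot-link constraint and a projected-gradient step; the function names (`project_to_simplex`, `update_weights`), the step size, and the toy partitions are all hypothetical. The point it shows is that the update touches only the m ensemble weights and never iterates over the n data points.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto the probability simplex (sort-based method)."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for k, u_k in enumerate(u, start=1):
        cumulative += u_k
        t = (cumulative - 1.0) / k
        if u_k - t > 0:  # holds for a prefix of the sorted values
            theta = t
    return [max(x - theta, 0.0) for x in v]


def update_weights(gamma, partitions, constraint, step=0.5):
    """One projected-gradient step on the ensemble weights for one constraint.

    gamma      -- current simplex vector, one weight per ensemble partition
    partitions -- list of m label lists (the ensemble), each of length n
    constraint -- (i, j, must_link): hypothetical encoding of a pairwise constraint
    """
    i, j, must_link = constraint
    # 1.0 where a partition places points i and j in the same cluster, else 0.0.
    agree = [1.0 if labels[i] == labels[j] else 0.0 for labels in partitions]
    # Weighted co-membership of (i, j) under the current convex combination.
    s = sum(g * a for g, a in zip(gamma, agree))
    target = 1.0 if must_link else 0.0
    # Gradient of the assumed loss (s - target)^2 with respect to gamma.
    grad = [2.0 * (s - target) * a for a in agree]
    return project_to_simplex([g - step * d for g, d in zip(gamma, grad)])


# Toy ensemble of m = 3 partitions over n = 4 points (made-up labels).
partitions = [[0, 0, 1, 1], [0, 1, 0, 1], [0, 0, 0, 1]]
gamma = [1.0 / 3.0] * 3
# A new must-link constraint between points 0 and 1 arrives.
new_gamma = update_weights(gamma, partitions, (0, 1, True))
```

In this toy run, the second partition separates points 0 and 1, so its weight shrinks while the other two grow, and the result stays on the simplex. The cost per constraint here is dominated by the sort over m weights; the paper's own O(m^2) update is a different procedure that this sketch does not reproduce.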

Original language: English
Title of host publication: KDD 2015 - Proceedings of the 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining
Publisher: Association for Computing Machinery
Pages: 1405-1414
Number of pages: 10
ISBN (electronic): 9781450336642
DOI
Publication status: Published - 10 Aug 2015
Externally published
Event: 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2015 - Sydney, Australia
Duration: 10 Aug 2015 to 13 Aug 2015

Publication series

Name: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Volume: 2015-August

Conference

Conference: 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2015
Country/Territory: Australia
City: Sydney
Period: 10/08/15 to 13/08/15
