Skip to main navigation Skip to search Skip to main content

An efficient semi-supervised clustering algorithm with sequential constraints

  • Jinfeng Yi
  • , Lijun Zhang
  • , Tianbao Yang
  • , Wei Liu
  • , Jun Wang
  • IBM
  • Nanjing University
  • University of Iowa
  • Alibaba Group Holding Ltd.

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

Semi-supervised clustering leverages side information such as pairwise constraints to guide clustering procedures. Despite promising progress, existing semi-supervised clustering approaches overlook the condition of side information being generated sequentially, which is a natural setting arising in numerous real-world applications such as social network and e-commerce system analysis. Given emerged new constraints, classical semi-supervised clustering algorithms need to re-optimize their objectives over all data samples and constraints in availability, which prevents them from efficiently updating the obtained data partitions. To address this challenge, we propose an efficient dynamic semi-supervised clustering framework that casts the clustering problem into a search problem over a feasible convex set, i.e., a convex hull with its extreme points being an ensemble of m data partitions. According to the principle of ensemble clustering, the optimal partition lies in the convex hull, and can thus be uniquely represented by an m-dimensional probability simplex vector. As such, the dynamic semi-supervised clustering problem is simplified to the problem of updating a probability simplex vector subject to the newly received pairwise constraints. We then develop a computationally efficient updating procedure to update the probability simplex vector in O(m2) time, irrespective of the data size n. Our empirical studies on several real-world benchmark datasets show that the proposed algorithm outperforms the state-of-the-art semi-supervised clustering algorithms with visible performance gain and significantly reduced running time.

Original languageEnglish
Title of host publicationKDD 2015 - Proceedings of the 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining
PublisherAssociation for Computing Machinery
Pages1405-1414
Number of pages10
ISBN (Electronic)9781450336642
DOIs
StatePublished - 10 Aug 2015
Externally publishedYes
Event21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2015 - Sydney, Australia
Duration: 10 Aug 201513 Aug 2015

Publication series

NameProceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Volume2015-August

Conference

Conference21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2015
Country/TerritoryAustralia
CitySydney
Period10/08/1513/08/15

Keywords

  • Convex hull
  • Probability simplex
  • Semi-supervised clustering
  • Sequential constraints

Fingerprint

Dive into the research topics of 'An efficient semi-supervised clustering algorithm with sequential constraints'. Together they form a unique fingerprint.

Cite this