Cluster structure of K-means clustering via principal component analysis

Chris Ding, Xiaofeng He

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

72 Scopus citations

Abstract

K-means clustering is a popular data clustering algorithm. Principal component analysis (PCA) is a widely used statistical technique for dimension reduction. Here we prove that principal components are the continuous solutions to the discrete cluster membership indicators for K-means clustering, with a clear simplex cluster structure. Our results prove that PCA-based dimension reductions are particularly effective for K-means clustering. New lower bounds for the K-means objective function are derived, given by the total variance minus the eigenvalues of the data covariance matrix.
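
The lower bound mentioned in the abstract can be checked numerically. The sketch below is an illustration, not the authors' code. It assumes the bound takes the form "total variance minus the K-1 largest eigenvalues of the unnormalized covariance (scatter) matrix of the centered data"; the abstract itself only says "the eigenvalues". The helper names (`kmeans_objective`, `pca_lower_bound`, `lloyd`) and the synthetic two-blob data are hypothetical. For K = 2 it also uses the sign of the first principal component as a relaxed cluster indicator.

```python
import numpy as np


def kmeans_objective(Y, labels):
    """Sum of squared distances from each point to its own cluster centroid."""
    total = 0.0
    for c in np.unique(labels):
        members = Y[labels == c]
        total += ((members - members.mean(axis=0)) ** 2).sum()
    return total


def pca_lower_bound(Y, k):
    """Assumed form of the bound: total variance minus the k-1 largest
    eigenvalues of the scatter matrix of the centered data."""
    Yc = Y - Y.mean(axis=0)
    scatter = Yc.T @ Yc                    # unnormalized covariance, d x d
    eigvals = np.linalg.eigvalsh(scatter)  # returned in ascending order
    return np.trace(scatter) - eigvals[::-1][: k - 1].sum()


def lloyd(Y, k, iters=50, seed=0):
    """Plain Lloyd iterations (hypothetical helper), returns cluster labels."""
    rng = np.random.default_rng(seed)
    centroids = Y[rng.choice(len(Y), size=k, replace=False)]
    for _ in range(iters):
        dists = ((Y[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):        # skip clusters that became empty
                centroids[c] = Y[labels == c].mean(axis=0)
    return labels


# Synthetic data: two Gaussian blobs in 5 dimensions (illustrative only).
rng = np.random.default_rng(42)
Y = np.vstack([
    rng.normal(-3.0, 1.0, size=(100, 5)),
    rng.normal(3.0, 1.0, size=(100, 5)),
])
k = 2

labels = lloyd(Y, k)
J = kmeans_objective(Y, labels)
bound = pca_lower_bound(Y, k)
total_var = ((Y - Y.mean(axis=0)) ** 2).sum()

# For K = 2, threshold the first principal component at zero to obtain
# a discrete clustering from the continuous (relaxed) indicator.
Yc = Y - Y.mean(axis=0)
_, _, Vt = np.linalg.svd(Yc, full_matrices=False)
pc1 = Yc @ Vt[0]
pca_labels = (pc1 > 0).astype(int)
J_pca = kmeans_objective(Y, pca_labels)

print(f"lower bound       : {bound:.3f}")
print(f"K-means objective : {J:.3f}")
print(f"PCA-sign objective: {J_pca:.3f}")
print(f"total variance    : {total_var:.3f}")
```

Under these assumptions, every partition of the data should satisfy `bound <= objective <= total_var`, so the printed values give a quick sanity check on how tight the bound is for a given data set.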

Original language: English
Title of host publication: Advances in Knowledge Discovery and Data Mining - 8th Pacific-Asia Conference, PAKDD 2004, Proceedings
Editors: Honghua Dai, Ramakrishnan Srikant, Chengqi Zhang
Publisher: Springer Verlag
Pages: 414-418
Number of pages: 5
ISBN (Print): 354022064X, 9783540220640
State: Published - 2004
Externally published: Yes
Event: 8th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2004 - Sydney, Australia
Duration: 26 May 2004 to 28 May 2004

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 3056
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 8th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2004
Country/Territory: Australia
City: Sydney
Period: 26/05/04 to 28/05/04
