Principal component analysis and effective K-means clustering

  • Chris Ding*
  • , Xiaofeng He
  • *Corresponding author for this work

Research output: Contribution to conferencePaperpeer-review

23 Scopus citations

Abstract

The widely adopted K-means clustering algorithm uses a sum of squared error objective function. A detailed analysis shows the close relationship between K-means clustering and principal component analysis (PCA) which is extensively utilized in unsupervised dimension reduction. We prove that the continuous solutions of the discrete K-means clustering membership indicators are the data projections on the principal directions (principal eigenvectors of the covariance matrix). New lower bounds for K-means objective function are derived, which relate directly to the eigenvalues of the covariance matrix. Experiments on Internet newsgroups indicate that the new bounds are within 0.5-1.5% of the optimal values, and that PCA provides an effective solution for the K-means clustering.

Original languageEnglish
Pages497-501
Number of pages5
DOIs
StatePublished - 2004
Externally publishedYes
EventProceedings of the Fourth SIAM International Conference on Data Mining - Lake Buena Vista, FL, United States
Duration: 22 Apr 200424 Apr 2004

Conference

ConferenceProceedings of the Fourth SIAM International Conference on Data Mining
Country/TerritoryUnited States
CityLake Buena Vista, FL
Period22/04/0424/04/04

Fingerprint

Dive into the research topics of 'Principal component analysis and effective K-means clustering'. Together they form a unique fingerprint.

Cite this