Abstract
The widely adopted K-means clustering algorithm minimizes a sum-of-squared-error objective function. A detailed analysis shows the close relationship between K-means clustering and principal component analysis (PCA), which is extensively used in unsupervised dimension reduction. We prove that the continuous solutions of the discrete K-means cluster membership indicators are the data projections on the principal directions (the principal eigenvectors of the covariance matrix). New lower bounds for the K-means objective function are derived, which relate directly to the eigenvalues of the covariance matrix. Experiments on Internet newsgroup data indicate that the new bounds are within 0.5-1.5% of the optimal values, and that PCA provides an effective solution for K-means clustering.
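The abstract's lower-bound claim can be illustrated numerically: if T is the total squared deviation of the centered data and λ₁ ≥ λ₂ ≥ … are the eigenvalues of the centered Gram matrix, the optimal K-means objective is bounded below by T − Σ_{k=1}^{K−1} λ_k. The sketch below checks this on synthetic two-blob data with a plain Lloyd's-iteration K-means; all variable names and the toy data are illustrative, not the paper's experiments, which used newsgroup text.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two well-separated Gaussian blobs as a synthetic stand-in for real data.
X = np.vstack([rng.normal(-3, 1, size=(50, 4)),
               rng.normal(+3, 1, size=(50, 4))])
Xc = X - X.mean(axis=0)                      # center the data

K = 2
# Plain Lloyd iterations for K-means (illustrative implementation).
centers = X[rng.choice(len(X), K, replace=False)]
for _ in range(50):
    labels = np.argmin(((X[:, None, :] - centers) ** 2).sum(-1), axis=1)
    centers = np.array([X[labels == k].mean(axis=0) for k in range(K)])
J = ((X - centers[labels]) ** 2).sum()       # K-means sum-of-squared-error objective

T = (Xc ** 2).sum()                          # total squared deviation from the mean
# Eigenvalues of the centered Gram matrix (n times the covariance eigenvalues).
lams = np.sort(np.linalg.eigvalsh(Xc @ Xc.T))[::-1]
lower = T - lams[: K - 1].sum()              # PCA-based lower bound for K clusters

# For K = 2, the paper's result says the sign of the projection on the first
# principal direction recovers the relaxed 2-way cluster indicator.
pc1 = np.linalg.eigh(Xc.T @ Xc)[1][:, -1]    # top eigenvector of X̃ᵀX̃
split = (Xc @ pc1 > 0).astype(int)

print(lower <= J <= T)                       # the bound brackets the objective
```

On data this cleanly separated, thresholding the first principal component at zero reproduces the two blobs exactly, matching the abstract's claim that the continuous relaxation of the membership indicators is given by the principal directions.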
| Original language | English |
|---|---|
| Pages | 497-501 |
| Number of pages | 5 |
| State | Published - 2004 |
| Externally published | Yes |
| Event | Proceedings of the Fourth SIAM International Conference on Data Mining, Lake Buena Vista, FL, United States, 22 Apr 2004 → 24 Apr 2004 |
Conference
| Conference | Proceedings of the Fourth SIAM International Conference on Data Mining |
|---|---|
| Country/Territory | United States |
| City | Lake Buena Vista, FL |
| Period | 22/04/04 → 24/04/04 |