K-means clustering via principal component analysis

  • Lawrence Berkeley National Laboratory

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

Abstract

Principal component analysis (PCA) is a widely used statistical technique for unsupervised dimension reduction. K-means clustering is a commonly used data clustering method for unsupervised learning tasks. Here we prove that principal components are the continuous solutions to the discrete cluster membership indicators for K-means clustering. New lower bounds for the K-means objective function are derived, given by the total variance minus the eigenvalues of the data covariance matrix. These results indicate that unsupervised dimension reduction is closely related to unsupervised learning. Several implications are discussed. On dimension reduction, the result provides new insights into the observed effectiveness of PCA-based data reductions, beyond the conventional noise-reduction explanation that PCA, via singular value decomposition, provides the best low-dimensional linear approximation of the data. On learning, the result suggests effective techniques for K-means data clustering. DNA gene expression and Internet newsgroup data are analyzed to illustrate our results. Experiments indicate that the new bounds are within 0.5-1.5% of the optimal values.
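
The relationship described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' code. It assumes the lower bound takes the form "total scatter of the centered data minus the K-1 largest eigenvalues of the unnormalized scatter matrix" (the abstract does not state the scaling or the number of eigenvalues, so both are assumptions here), and it assumes that for two clusters the sign of the first principal component score gives the relaxed cluster assignment. The function names `kmeans_objective`, `pca_lower_bound`, and `pc_sign_split` and the synthetic data are hypothetical, chosen only for illustration.

```python
import numpy as np

def kmeans_objective(X, labels):
    """Sum of squared distances from each point to its cluster centroid."""
    total = 0.0
    for k in np.unique(labels):
        members = X[labels == k]
        total += ((members - members.mean(axis=0)) ** 2).sum()
    return total

def pca_lower_bound(X, n_clusters):
    """Total scatter minus the K-1 largest eigenvalues of the scatter matrix.

    Assumption: the unnormalized scatter matrix Y^T Y is used, so the bound
    is on the same scale as kmeans_objective.
    """
    Y = X - X.mean(axis=0)                  # center the data
    eigvals = np.linalg.eigvalsh(Y.T @ Y)   # eigenvalues in ascending order
    top = eigvals[::-1][: n_clusters - 1]   # the K-1 largest eigenvalues
    return (Y ** 2).sum() - top.sum()

def pc_sign_split(X):
    """Two-cluster assignment from the sign of the first principal component score."""
    Y = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Y, full_matrices=False)
    scores = Y @ vt[0]                      # projection onto the first principal direction
    return (scores > 0).astype(int)

# Synthetic two-cluster data (illustrative only, not from the paper).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(-2.0, 1.0, size=(50, 5)),
    rng.normal(2.0, 1.0, size=(50, 5)),
])

labels = pc_sign_split(X)
objective = kmeans_objective(X, labels)
bound = pca_lower_bound(X, n_clusters=2)
print(f"objective={objective:.2f}  lower bound={bound:.2f}")
```

Under these assumptions the bound holds for any partition of the data, so comparing `objective` against `bound` gives a rough indication of how far a given clustering could be from optimal.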

Original language: English
Title of host publication: Proceedings, Twenty-First International Conference on Machine Learning, ICML 2004
Editors: R. Greiner, D. Schuurmans
Pages: 225-232
Number of pages: 8
Publication status: Published - 2004
Externally published: Yes
Event: Proceedings, Twenty-First International Conference on Machine Learning, ICML 2004 - Banff, Alta, Canada
Duration: 4 Jul 2004 to 8 Jul 2004

Publication series

Name: Proceedings, Twenty-First International Conference on Machine Learning, ICML 2004

Conference

Conference: Proceedings, Twenty-First International Conference on Machine Learning, ICML 2004
Country/Territory: Canada
City: Banff, Alta
Period: 4/07/04 to 8/07/04
