K-nearest-neighbor consistency in data clustering: Incorporating local information into global optimization

Chris Ding, Xiaofeng He

Research output: Contribution to conference › Paper › peer-review

94 Scopus citations

Abstract

Nearest neighbor consistency is a central concept in statistical pattern recognition, especially in kNN classification methods, which have a strong theoretical foundation. In this paper, we extend this concept to data clustering, requiring that for any data point in a cluster, its k-nearest neighbors and mutual nearest neighbors also be in the same cluster. We study properties of cluster k-nearest neighbor consistency and propose algorithms that enforce and improve kNN and kMN consistency. Extensive experiments on internet newsgroup datasets using the K-means clustering algorithm with kNN consistency enhancement show that kNN/kMN consistency can be improved significantly (about 100% for 1MN and 1NN consistencies) while the clustering accuracy is improved simultaneously. This indicates that local consistency information helps the optimization of the global cluster objective function.
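
As an illustration of the consistency requirement described in the abstract, the following minimal sketch scores a given clustering by the fraction of points whose k nearest neighbors all carry the same cluster label as the point itself. It is not the authors' algorithm: the function name `knn_consistency`, the use of Euclidean distance, the brute-force neighbor search, and the choice to report a fraction are all assumptions made here for illustration.

```python
import numpy as np


def knn_consistency(points, labels, k=1):
    """Fraction of points whose k nearest neighbors share the point's label.

    Hypothetical helper: the scoring rule and the Euclidean metric are
    assumptions for illustration, not taken from the paper.
    """
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)

    # Pairwise squared Euclidean distances (brute force, O(n^2) memory).
    diffs = points[:, None, :] - points[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diffs, diffs)

    # A point must not count as its own neighbor.
    np.fill_diagonal(dist, np.inf)

    # Indices of the k nearest neighbors of every point.
    neighbors = np.argsort(dist, axis=1)[:, :k]

    # A point is consistent if all k neighbors have its cluster label.
    same_cluster = labels[neighbors] == labels[:, None]
    return float(np.mean(same_cluster.all(axis=1)))


# Two well-separated pairs of points.
pts = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
good = knn_consistency(pts, [0, 0, 1, 1], k=1)   # labels follow the pairs
mixed = knn_consistency(pts, [0, 0, 1, 0], k=1)  # one pair is split
```

A clustering that keeps each pair together scores higher than one that splits a pair, which is the sense in which local neighbor information can be checked against a global cluster assignment.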

Original language: English
Pages: 584-589
Number of pages: 6
State: Published - 2004
Externally published: Yes
Event: Applied Computing 2004 - Proceedings of the 2004 ACM Symposium on Applied Computing - Nicosia, Cyprus
Duration: 14 Mar 2004 to 17 Mar 2004

Conference

Conference: Applied Computing 2004 - Proceedings of the 2004 ACM Symposium on Applied Computing
Country/Territory: Cyprus
City: Nicosia
Period: 14/03/04 to 17/03/04
