K-NNDP: K-means algorithm based on nearest neighbor density peak optimization and outlier removal

Jiyong Liao, Xingjiao Wu*, Yaxin Wu, Juelin Shu

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

18 Scopus citations

Abstract

K-means is an unsupervised vector quantization method that originated in signal processing and is now widely used in data mining and knowledge discovery. Its advantages include simple operation, scalability, and suitability for large-scale datasets. However, K-means selects its initial cluster centers at random, which makes the clustering results unstable, and outliers degrade its performance. To address both challenges, we propose an algorithm that optimizes the initial cluster centers using nearest-neighbor density peaks (NNDP) and removes outliers. To avoid random initialization, we propose NNDP-based K-means (K-NNDP), which automatically selects the initial cluster centers according to decision values, ensuring stable operation. In addition, we adopt a local search strategy to eliminate outliers: outliers are identified using a set threshold, and the median replaces the mean in subsequent centroid iterations to reduce their influence. Most previous studies have addressed these two problems independently, which makes it easy for the algorithm to fall into a local optimum. We therefore address both problems jointly using K-nearest-neighbor modeling. To evaluate the effectiveness of K-NNDP, we conducted comparative experiments on several synthetic and real-world datasets. K-NNDP outperformed two classical algorithms and six state-of-the-art improved K-means algorithms. The results indicate that K-NNDP effectively mitigates the randomness and outlier sensitivity of K-means.
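The abstract does not give the formulas behind K-NNDP, so the following is only a rough sketch of the general idea, not the authors' implementation. It assumes a density defined as the inverse of the mean distance to the k nearest neighbors, a decision value equal to density times the distance to the nearest denser point (the usual density-peak convention), and an outlier threshold of mean plus `z` standard deviations of the k-NN distance. The function names and the parameters `k`, `z`, and `n_iter` are hypothetical.

```python
import numpy as np

def knn_mean_dist(X, k):
    """Return the pairwise distance matrix and each point's mean k-NN distance."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # After sorting each row, column 0 is the point itself (distance 0).
    knn = np.sort(d, axis=1)[:, 1:k + 1]
    return d, knn.mean(axis=1)

def select_centers(X, n_clusters, k=5):
    """Pick initial centers as the points with the largest decision values.

    Assumption: decision value = density * distance to nearest denser point.
    """
    d, mean_knn = knn_mean_dist(X, k)
    rho = 1.0 / (mean_knn + 1e-12)          # assumed k-NN density
    delta = np.empty(len(X))
    for i in range(len(X)):
        higher = rho > rho[i]
        # The densest point has no denser neighbor; use its largest distance.
        delta[i] = d[i, higher].min() if higher.any() else d[i].max()
    gamma = rho * delta
    return np.argsort(gamma)[::-1][:n_clusters]

def k_nndp_sketch(X, n_clusters, k=5, z=2.0, n_iter=50):
    """Remove outliers, seed centers by density peaks, iterate with medians."""
    _, mean_knn = knn_mean_dist(X, k)
    # Assumed threshold rule: flag points whose k-NN distance is unusually large.
    keep = mean_knn <= mean_knn.mean() + z * mean_knn.std()
    Xc = X[keep]
    centers = Xc[select_centers(Xc, n_clusters, k)].copy()
    for _ in range(n_iter):
        dist = np.linalg.norm(Xc[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Median update instead of mean; keep the old center if a cluster is empty.
        new = np.array([
            np.median(Xc[labels == j], axis=0) if (labels == j).any() else centers[j]
            for j in range(n_clusters)
        ])
        if np.allclose(new, centers):
            break
        centers = new
    return centers, labels, keep
```

Calling `k_nndp_sketch(X, n_clusters)` on an `(n, d)` array returns the final centers, the labels of the retained points, and a boolean mask marking which rows of `X` were kept. Because the seeding is deterministic, repeated runs on the same data give the same result, which is the stability property the abstract emphasizes. The pairwise distance matrix makes this sketch quadratic in memory, so it is suited to small illustrations only.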

Original language: English
Article number: 111742
Journal: Knowledge-Based Systems
Volume: 294
State: Published - 21 Jun 2024

Keywords

  • Clustering
  • Initial cluster center
  • K-means algorithm
  • Nearest neighbor density peak
  • Outlier detection
