K-NNDP: K-means algorithm based on nearest neighbor density peak optimization and outlier removal

  • Jiyong Liao
  • Xingjiao Wu*
  • Yaxin Wu
  • Juelin Shu
  • *Corresponding author for this work
  • Huaihua University
  • Fudan University
  • School of Transportation Science and Engineering, Harbin Institute of Technology
  • Ltd.

Research output: Contribution to journal, Article, peer-reviewed

Abstract

K-means is an unsupervised method for vector quantization derived from signal processing. It is currently used in data mining and knowledge discovery. The advantages of K-means include its simple operation, scalability, and suitability for processing large-scale datasets. However, K-means selects the initial cluster centers randomly, which makes clustering results unstable, and outliers degrade its performance. To address these challenges, we propose an algorithm that combines nearest-neighbor density peak (NNDP) optimization of the initial cluster centers with outlier removal. To solve the problem of randomly selecting the initial cluster centers, we propose NNDP-based K-means (K-NNDP). K-NNDP automatically selects the initial cluster centers based on decision values, ensuring stable algorithm operation. In addition, we adopt a local search strategy to eliminate outliers: outliers are identified using a set threshold, and the median replaces the mean in subsequent centroid iterations to reduce the impact of outliers on the algorithm. Notably, most previous studies have addressed the two problems independently, which makes it easy for the algorithm to fall into a local optimum. We therefore address both problems jointly using K-nearest neighbor modeling. To evaluate the effectiveness of K-NNDP, we conducted comparative experiments on several synthetic and real-world datasets. K-NNDP outperformed two classical algorithms and six state-of-the-art improved K-means algorithms. The results show that K-NNDP effectively addresses the randomness and outlier sensitivity of K-means.
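
The abstract names three ingredients: density-peak selection of the initial centers by decision value, threshold-based outlier removal, and median centroid updates. The sketch below illustrates that general idea only; it is not the authors' implementation. The abstract gives no formulas, so every definition here is an assumption: density is taken as the inverse mean distance to the k nearest neighbors, the decision value as density times distance to the nearest denser point (the usual density-peak convention), and the outlier threshold as a global mean plus `alpha` standard deviations of the mean k-NN distance, which is simpler than the local search strategy the paper describes. The function names and the parameters `k` and `alpha` are hypothetical.

```python
import numpy as np


def knn_mean_distance(X, k):
    """Return the pairwise distance matrix and each point's mean k-NN distance."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Column 0 of each sorted row is the zero self-distance, so skip it.
    nearest = np.sort(d, axis=1)[:, 1:k + 1]
    return d, nearest.mean(axis=1)


def remove_outliers(X, k=5, alpha=2.0):
    """Drop points whose mean k-NN distance exceeds an assumed global threshold."""
    _, mean_knn = knn_mean_distance(X, k)
    threshold = mean_knn.mean() + alpha * mean_knn.std()
    return X[mean_knn <= threshold]


def select_centers(X, n_clusters, k=5):
    """Pick the n_clusters points with the largest decision value rho * delta."""
    d, mean_knn = knn_mean_distance(X, k)
    rho = 1.0 / (mean_knn + 1e-12)  # assumed density estimate
    delta = np.empty(len(X))
    for i in range(len(X)):
        denser = rho > rho[i]
        # Distance to the nearest denser point; the densest point has none,
        # so it receives its largest distance to any point instead.
        delta[i] = d[i, denser].min() if denser.any() else d[i].max()
    gamma = rho * delta  # decision value
    return X[np.argsort(gamma)[::-1][:n_clusters]]


def k_medians(X, centers, n_iter=50):
    """Lloyd-style iterations that update each center with the coordinate-wise median."""
    centers = np.asarray(centers, dtype=float).copy()
    for _ in range(n_iter):
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            np.median(X[labels == j], axis=0) if np.any(labels == j) else centers[j]
            for j in range(len(centers))
        ])
        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break
    # Recompute labels so they match the returned centers.
    dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return dist.argmin(axis=1), centers
```

A typical call order under these assumptions would be `clean = remove_outliers(X)`, then `centers = select_centers(clean, n_clusters)`, then `labels, centers = k_medians(clean, centers)`. Because no step draws random numbers, repeated runs on the same data give the same result, which is the stability property the abstract emphasizes.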

Original language: English
Article number: 111742
Journal: Knowledge-Based Systems
Volume: 294
DOI
Publication status: Published - 21 Jun 2024
