
An Enhanced Density Peak Clustering Algorithm with Dimensionality Reduction and Relative Density Normalization for High-Dimensional Duplicate Data
Author(s) -
Xuan Sun,
Xin Liu,
Chunli Deng,
Huiying Chu,
Guiyan Wang,
Hui Zhao
Publication year - 2025
Publication title -
IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3596983
Subject(s) - aerospace, bioengineering, communication, networking and broadcast technologies, components, circuits, devices and systems, computing and processing, engineered materials, dielectrics and plasmas, engineering profession, fields, waves and electromagnetics, general topics for engineers, geoscience, nuclear engineering, photonics and electrooptics, power, energy and industry applications, robotics and control systems, signal processing and analysis, transportation
Cluster analysis is a fundamental method for studying big data problems, as it groups samples based on shared features. One particular class of big data problems is characterized by large sample sizes, high dimensionality, and a substantial proportion of duplicate samples. Traditional clustering algorithms often handle such data poorly, particularly when detecting low-density clusters. To address these challenges, this paper presents an enhanced variant of the Density Peaks Clustering (DPC) algorithm, referred to as the Dimensionality Reduction and Relative Density Normalization Density Peaks Clustering (DRDN-DPC) algorithm. DRDN-DPC incorporates sample count as a weighting factor during the clustering process, thereby refining the density estimation. Dimensionality reduction techniques are employed to alleviate the adverse effects of high-dimensional data. Relative density normalization is introduced to improve the detection of low-density clusters, thereby enhancing the overall clustering performance. Following an analysis of its time and space complexity, DRDN-DPC is evaluated on 12 standard benchmark datasets and on high-dimensional benchmarks with added noise. Its performance is compared with that of several clustering algorithms: K-means, DBSCAN, the original DPC algorithm, and two modified DPC variants, KNN-DPC and SNN-DPC. Furthermore, DRDN-DPC is applied to real-world clustering problems, including molecular clustering in molecular dynamics (MD) simulations and clustering of Electronic Health Records (EHR). The results support the practical efficacy of DRDN-DPC in both domains, highlighting its capability to address complex clustering tasks in diverse application contexts.
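The idea of using sample count as a weighting factor can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything below is an assumption: the function name `weighted_density_peaks`, the Gaussian kernel, and the cutoff parameter `d_c` follow the standard DPC formulation rather than the authors' exact method. Duplicate samples are collapsed into unique points, and each unique point contributes to the local density in proportion to how many times it occurred. Dimensionality reduction and relative density normalization are not shown.

```python
import numpy as np

def weighted_density_peaks(X, d_c):
    """Count-weighted DPC quantities on deduplicated samples (illustrative only)."""
    X = np.asarray(X, dtype=float)
    # Collapse exact duplicates; counts[i] is how often unique row i occurred.
    U, counts = np.unique(X, axis=0, return_counts=True)
    # Pairwise Euclidean distances between the unique samples.
    diff = U[:, None, :] - U[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    # Gaussian-kernel local density, each neighbour weighted by its count.
    # The self term (distance 0) contributes counts[i], so duplicates raise density.
    rho = (counts[None, :] * np.exp(-(D / d_c) ** 2)).sum(axis=1)
    # delta: distance to the nearest sample of strictly higher density.
    # A sample with no denser neighbour gets its largest distance instead.
    delta = np.empty(len(U))
    for i in range(len(U)):
        higher = rho > rho[i]
        delta[i] = D[i, higher].min() if higher.any() else D[i].max()
    return U, counts, rho, delta

# Hypothetical toy data: one point repeated three times, a close neighbour, an outlier.
X = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.1, 0.0], [5.0, 5.0]]
U, counts, rho, delta = weighted_density_peaks(X, d_c=1.0)
```

In standard DPC, cluster centers are then chosen as the points where both `rho` and `delta` are large. Working on the unique rows keeps the distance matrix at the size of the deduplicated set, which is the practical benefit when duplicates are common.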