Novel Automated K-means++ Algorithm for Financial Data Sets

Guoyu Du, Xuehua Li, Lanjie Zhang, Libo Liu, Chaohua Zhao

Open Access

Publication year - 2021

Publication title - Mathematical Problems in Engineering

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.262

H-Index - 62

eISSN - 1026-7077

pISSN - 1024-123X

DOI - 10.1155/2021/5521119

Subject(s) - cluster analysis, algorithm, measure (data warehouse), computer science, similarity (geometry), set (abstract data type), similarity measure, cosine similarity, data mining, field (mathematics), cluster (spacecraft), data set, term (time), mathematics, artificial intelligence, physics, quantum mechanics, pure mathematics, image (mathematics), programming language

The K-means algorithm has been extensively investigated in the field of text clustering because of its linear time complexity and its suitability for sparse matrix data. However, it has two main problems: determining the number of clusters and choosing the locations of the initial cluster centres. In this study, we propose an improved K-means++ algorithm based on the Davies-Bouldin index (DBI) and the largest sum of distances, called the SDK-means++ algorithm. Firstly, we use term frequency-inverse document frequency (TF-IDF) to represent the data set. Secondly, we measure the distance between objects using cosine similarity. Thirdly, the initial cluster centres are selected by considering both the distance to the existing initial cluster centres and the maximum density. Fourthly, clustering results are obtained using the K-means++ method. Lastly, DBI is used to select the optimal clustering result automatically. Experimental results on real bank transaction volume data sets show that the SDK-means++ algorithm is more effective and efficient than two other algorithms in organising large financial text data sets. The proposed algorithm achieves an F-measure of 0.97, and its running time is 42.9% and 22.4% shorter than those of the K-means and K-means++ algorithms, respectively.
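The first two steps of the abstract (TF-IDF representation, then cosine similarity between objects) can be illustrated with a minimal pure-Python sketch. It assumes each document is already tokenised into a list of terms and uses the common weighting tf = count / document length, idf = log(N / df); the abstract does not state which TF-IDF variant the authors use, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each tokenised document to a sparse TF-IDF vector {term: weight}.

    Assumed weighting: tf = count / document length, idf = log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc)
        vectors.append({
            term: (count / length) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors (0.0 if either is all zeros)."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def cosine_distance(u, v):
    """Distance used for clustering: 1 - cosine similarity."""
    return 1.0 - cosine_similarity(u, v)


# Toy, made-up "transaction descriptions" (not data from the paper).
docs = [
    ["transfer", "salary", "salary"],
    ["transfer", "rent"],
    ["atm", "withdrawal"],
]
vecs = tfidf_vectors(docs)
print(cosine_similarity(vecs[0], vecs[1]))  # shares "transfer", so strictly between 0 and 1
print(cosine_similarity(vecs[0], vecs[2]))  # no shared terms: 0.0
```

Sparse dictionaries are used because TF-IDF matrices of text are mostly zeros, which is the sparsity the abstract cites as a reason K-means suits text data.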
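The abstract describes the third step (initial centre selection) only briefly, so the sketch below is one plausible reading rather than the authors' rule. It assumes that the first centre is the point with the highest density (most neighbours within a radius) and that each further centre is the remaining point with the largest sum of cosine distances to the centres already chosen, weighted by its density so that isolated outliers are less likely to be picked. The `radius` parameter, the `(1 + density)` weighting and all function names are hypothetical.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity for dense vectors; zero vectors count as maximally distant."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def densities(points, radius, dist):
    """For each point, count the other points within `radius` (hypothetical density measure)."""
    return [
        sum(1 for j, q in enumerate(points) if j != i and dist(p, q) <= radius)
        for i, p in enumerate(points)
    ]


def select_initial_centres(points, k, radius, dist=cosine_distance):
    """Return indices of k initial centres (assumed largest-sum-of-distances rule)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    dens = densities(points, radius, dist)
    # First centre: the densest point (ties go to the lowest index).
    chosen = [max(range(len(points)), key=lambda i: dens[i])]
    while len(chosen) < k:
        candidates = [i for i in range(len(points)) if i not in chosen]
        # Next centre: largest sum of distances to the existing centres,
        # scaled by (1 + density) to favour points in populated regions.
        chosen.append(max(
            candidates,
            key=lambda i: sum(dist(points[i], points[c]) for c in chosen) * (1 + dens[i]),
        ))
    return chosen


# Two made-up groups of direction vectors.
pts = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.05], [0.0, 1.0], [0.1, 0.9], [0.05, 1.0]]
print(select_initial_centres(pts, k=2, radius=0.05))  # [0, 3]
```

Unlike standard K-means++ seeding, this rule is deterministic, which is one way a method of this kind could avoid the run-to-run variance of random initial centres.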

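The last two steps of the abstract (cluster, then keep the result with the best DBI) can be sketched as follows. This is an illustration under stated assumptions, not SDK-means++ itself: it uses standard K-means++ seeding (sampling each new centre with probability proportional to its squared distance to the nearest chosen centre) where the paper substitutes its own centre selection, normalises vectors to unit length so that averaging members gives a sensible centre under cosine distance (as in spherical K-means), and computes DBI with cosine distance. A lower DBI indicates more compact, better-separated clusters, so the k with the smallest DBI is kept. The function names and the `k_range` parameter are hypothetical.

```python
import math
import random


def normalise(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 if na == 0.0 or nb == 0.0 else 1.0 - dot / (na * nb)


def kmeanspp_seeds(points, k, rng):
    """Standard K-means++ seeding: new centre sampled with probability proportional to D(x)^2."""
    centres = [rng.choice(points)]
    while len(centres) < k:
        weights = [min(cosine_distance(p, c) for c in centres) ** 2 for p in points]
        if sum(weights) == 0.0:  # every point coincides with a centre
            centres.append(rng.choice(points))
        else:
            centres.append(rng.choices(points, weights=weights)[0])
    return [list(c) for c in centres]


def kmeans(points, centres, max_iter=100):
    """Lloyd iterations with cosine distance; returns (labels, centres)."""
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centres)), key=lambda c: cosine_distance(p, centres[c]))
                      for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(len(centres)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster empties
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres


def davies_bouldin(points, labels, centres):
    """DBI = mean over clusters i of max_{j != i} (s_i + s_j) / d(c_i, c_j); needs k >= 2."""
    k = len(centres)
    scatter = []  # s_i: mean distance of cluster i's members to its centre
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        scatter.append(sum(cosine_distance(p, centres[c]) for p in members) / len(members)
                       if members else 0.0)
    total = 0.0
    for i in range(k):
        worst = 0.0
        for j in range(k):
            if j != i:
                d = cosine_distance(centres[i], centres[j])
                worst = max(worst, math.inf if d == 0.0 else (scatter[i] + scatter[j]) / d)
        total += worst
    return total / k


def auto_cluster(points, k_range, seed=0):
    """Run K-means for each k in k_range and keep the result with the lowest DBI."""
    rng = random.Random(seed)
    pts = [normalise(p) for p in points]
    best = None
    for k in k_range:
        labels, centres = kmeans(pts, kmeanspp_seeds(pts, k, rng))
        score = davies_bouldin(pts, labels, centres)
        if best is None or score < best[0]:
            best = (score, k, labels)
    return best


# Made-up data: two well-separated groups of directions, so k = 2 should win.
pts = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.05], [0.0, 1.0], [0.1, 0.9], [0.05, 1.0]]
dbi, k, labels = auto_cluster(pts, range(2, 5))
print(k, labels)
```

`k_range` should start at 2, because DBI is undefined for a single cluster (this sketch would return 0 for k = 1 and always pick it).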
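The abstract reports an F-measure of 0.97 without defining it. A common clustering F-measure matches each true class with the predicted cluster that maximises its F1 score and averages those scores weighted by class size; the sketch below implements that variant, and whether the paper uses exactly this definition is an assumption.

```python
from collections import Counter


def clustering_f_measure(true_labels, pred_labels):
    """Class-size-weighted best-match F1 between true classes and predicted clusters."""
    if len(true_labels) != len(pred_labels) or not true_labels:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(true_labels)
    class_sizes = Counter(true_labels)
    cluster_sizes = Counter(pred_labels)
    overlap = Counter(zip(true_labels, pred_labels))  # n_ij for each (class, cluster)
    total = 0.0
    for cls, n_i in class_sizes.items():
        best = 0.0
        for clu, n_j in cluster_sizes.items():
            n_ij = overlap[(cls, clu)]
            if n_ij:
                precision, recall = n_ij / n_j, n_ij / n_i
                best = max(best, 2 * precision * recall / (precision + recall))
        total += n_i / n * best
    return total


# Cluster ids need not match class names; only the grouping matters.
print(clustering_f_measure(["a", "a", "b", "b"], [1, 1, 0, 0]))  # 1.0
print(clustering_f_measure(["a", "a", "b", "b"], [0, 0, 0, 0]))  # about 0.667
```

A value of 1.0 means every class is recovered exactly by some cluster; lumping everything into one cluster keeps recall at 1 but halves precision here, which lowers the score.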