Open Access

A Code Classification Method Based on TF-IDF

Author(s) - Ke Wang, JianHong Jiang, Rui-Yun MA

Publication year - 2018

Publication title - DEStech Transactions on Economics, Business and Management

Language(s) - English

Resource type - Journals

ISSN - 2475-8868

DOI - 10.12783/dtem/eced2018/23926

Subject(s) - cosine similarity, computer science, cluster analysis, code (set theory), similarity (geometry), tf–idf, set (abstract data type), data mining, document clustering, pattern recognition (psychology), cluster (spacecraft), information retrieval, feature (linguistics), artificial intelligence, programming language, linguistics, philosophy, physics, quantum mechanics, term (time), image (mathematics)

The main purpose of this study is to find code that is likely to be duplicated, so that the adverse effects of code duplication can be avoided. Document feature information is first preprocessed to extract the relevant features of each document. These basic features are then used to cluster the documents and to find the best number of clusters. Given that number of clusters, the vectors generated by the TF-IDF method are combined with the K-means clustering algorithm to distinguish the contents of the files, and cosine similarity is introduced to determine the similarity of two texts and to classify the parallel documents. On the test data set, the method accurately finds code that is likely to be duplicated and works quite well.
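The pipeline named in the abstract (TF-IDF vectors, K-means clustering, cosine similarity) can be illustrated with a minimal sketch. This is not the authors' implementation. The tokenizer, the smoothed IDF formula, the deterministic choice of initial centroids, the 0.8 similarity threshold, and all function names are assumptions made for illustration. The abstract does not say how the best number of clusters is found, so here the caller simply supplies k.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(source):
    """Split source text into lowercase identifier-like tokens (assumed tokenizer)."""
    return re.findall(r"[A-Za-z_]\w*", source.lower())


def tfidf_vectors(docs):
    """Return one TF-IDF vector (list of floats) per document."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(t for toks in tokenized for t in set(toks))
    # Smoothed IDF (an assumption; the paper's exact weighting is not given here).
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1
        vectors.append([counts[t] / total * idf[t] for t in vocab])
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def normalize(v):
    """Scale a vector to unit length so document length does not dominate."""
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v] if length else list(v)


def kmeans(vectors, k, iterations=20):
    """Plain Lloyd's K-means on unit-normalized vectors.

    The first k vectors seed the centroids (a deterministic choice made
    for this sketch, not a recommended initialization).
    """
    if not 1 <= k <= len(vectors):
        raise ValueError("k must be between 1 and the number of vectors")
    points = [normalize(v) for v in vectors]
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def candidate_duplicates(docs, k, threshold=0.8):
    """Return index pairs in the same cluster whose cosine similarity >= threshold."""
    vectors = tfidf_vectors(docs)
    labels = kmeans(vectors, k)
    return [
        (i, j)
        for i, j in combinations(range(len(docs)), 2)
        if labels[i] == labels[j]
        and cosine_similarity(vectors[i], vectors[j]) >= threshold
    ]


# Hypothetical sample inputs: two Python-like and two SQL-like snippets.
SAMPLE_DOCS = [
    "def add(a, b): return a + b",
    "SELECT name FROM users WHERE id = 1",
    "def add(x, y): return x + y",
    "SELECT name FROM users WHERE id = 2",
]
```

Calling candidate_duplicates(SAMPLE_DOCS, k=2) clusters the two function definitions together and the two queries together, then reports only the pairs within a cluster that clear the threshold. Restricting the pairwise comparison to documents in the same cluster keeps the number of cosine computations small, which is one plausible reason to cluster before comparing. Note that renaming variables lowers the TF-IDF similarity of otherwise identical code, so the threshold would need tuning on real data.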
