Open Access
Topological Data Analysis In Text Classification Based On Word Embedding And TF-IDF
Author(s) - Xuezhi Wen
Publication year - 2020
Publication title - Journal of Physics: Conference Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.21
H-Index - 85
eISSN - 1742-6596
pISSN - 1742-6588
DOI - 10.1088/1742-6596/1634/1/012039
Subject(s) - topological data analysis , computer science , topology (electrical circuits) , set (abstract data type) , word (group theory) , data mining , data set , embedding , mathematics , artificial intelligence , algorithm , combinatorics , geometry , programming language
As a recent and rapidly developing method in data science, topological data analysis (TDA) offers new ways to look at data and to extract features from high-dimensional models using topological and geometric tools. In this paper, the author briefly introduces the topological concepts involved in several studies, then compares and examines different methods for extracting topological features from texts. The results show that these topological tools capture additional document features that the original methods do not detect. In the experiment, adding these topological features to the usual text mining tools improves prediction accuracy by as much as 5%. However, as expected, these topological features alone are not sufficient to classify text documents. Further experiments and discussion are needed to determine whether these methods can be combined to produce better classifications.
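The abstract does not spell out the paper's pipeline. As a rough illustration of the general idea it describes (appending topological features to ordinary TF-IDF features), the sketch below is a minimal, stdlib-only example under loudly stated assumptions: each document is turned into a point cloud of overlapping token windows, each window becomes a TF-IDF vector, and the only topological summary computed is 0-dimensional persistent homology of the Vietoris-Rips filtration under cosine distance. The helper names (`sliding_windows`, `topological_features`, `combined_features`), the window size and step, and the three summary statistics are hypothetical choices, not the author's method. It relies on the standard fact that the finite H0 death times of a Rips filtration equal the edge lengths of a minimum spanning tree, so Kruskal's algorithm with union-find computes them exactly.

```python
import math
from collections import Counter

PUNCT = ".,;:!?\"'()"


def tokenize(text):
    """Lowercase whitespace tokenizer that strips edge punctuation."""
    words = (w.strip(PUNCT).lower() for w in text.split())
    return [w for w in words if w]


def tfidf_vectors(token_lists):
    """One sparse TF-IDF vector (term -> weight) per token list.

    Uses relative term frequency and a smoothed idf: log((1 + n) / (1 + df)) + 1.
    """
    n = len(token_lists)
    df = Counter(t for tokens in token_lists for t in set(tokens))
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        total = len(tokens)
        vectors.append({t: (c / total) * (math.log((1 + n) / (1 + df[t])) + 1.0)
                        for t, c in tf.items()})
    return vectors


def cosine_distance(u, v):
    """1 - cosine similarity of two sparse vectors, clamped to [0, 1]."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    if nu == 0.0 or nv == 0.0:
        return 1.0
    return min(1.0, max(0.0, 1.0 - dot / (nu * nv)))


def h0_death_times(points, dist):
    """Finite death times of 0-dimensional Rips persistence.

    Every point is born at 0; Kruskal's algorithm merges components in order of
    edge length, and each merge kills one component. The one component that
    never dies (the infinite bar) is not reported.
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted((dist(points[i], points[j]), i, j)
                   for i in range(len(points)) for j in range(i + 1, len(points)))
    deaths = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(d)
    return deaths


def sliding_windows(tokens, size=4, step=2):
    """Hypothetical point-cloud construction: overlapping token windows.

    Trailing tokens that do not fill a full step are dropped in this sketch.
    """
    if len(tokens) <= size:
        return [tokens]
    return [tokens[i:i + size] for i in range(0, len(tokens) - size + 1, step)]


def topological_features(tokens, size=4, step=2):
    """Hypothetical summary: [number of finite H0 bars, total persistence, longest bar]."""
    cloud = tfidf_vectors(sliding_windows(tokens, size, step))
    deaths = h0_death_times(cloud, cosine_distance)
    if not deaths:
        return [0.0, 0.0, 0.0]
    return [float(len(deaths)), sum(deaths), max(deaths)]


def combined_features(docs, size=4, step=2):
    """Dense document TF-IDF columns followed by the three topological columns."""
    token_lists = [tokenize(d) for d in docs]
    doc_vectors = tfidf_vectors(token_lists)
    vocab = sorted({t for tokens in token_lists for t in tokens})
    rows = []
    for tokens, vec in zip(token_lists, doc_vectors):
        dense = [vec.get(t, 0.0) for t in vocab]
        rows.append(dense + topological_features(tokens, size, step))
    return vocab, rows


docs = ["The cat sat on the mat, while the dog slept on the rug.",
        "Stocks fell sharply as markets reacted to the rate decision."]
vocab, X = combined_features(docs)
```

The rows of `X` could be passed to any standard classifier; dropping the last three columns gives a TF-IDF-only baseline for the kind of with/without comparison the abstract reports. Higher-dimensional homology (loops, voids) would need a dedicated persistence library rather than this spanning-tree shortcut.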
