Exploring topics in the field of data science by analyzing Wikipedia documents: A preliminary result
Author(s) - Wang Yanyan, Joo Soohyung, Lu Kun
Publication year - 2014
Publication title - Proceedings of the American Society for Information Science and Technology
Language(s) - English
Resource type - Journals
eISSN - 1550-8390
pISSN - 0044-7870
DOI - 10.1002/meet.2014.14505101116
Subject(s) - latent dirichlet allocation , topic model , computer science , field (mathematics) , cluster analysis , hierarchical clustering , data science , information retrieval , principal component analysis , artificial intelligence , mathematics , pure mathematics
In this poster, we explored topics in the field of Data Science in Wikipedia documents using hierarchical clustering, principal component analysis (PCA), and topic modeling. As a pilot study, we analyzed a subset of the Wikipedia document dataset to make an initial identification of the topics discussed in Data Science. Hierarchical clustering produced six clusters of topics, while PCA identified eleven dimensions in the Data Science field. In addition, topic modeling based on latent Dirichlet allocation (LDA) produced fifty topics related to Data Science. We plan to further examine the hierarchical and structural relationships between topics using structural equation modeling and social network analysis. The findings of this study should help clarify which topics are currently discussed in the area of Data Science.
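The abstract reports that hierarchical clustering grouped the documents into six clusters but does not say how documents were represented or which distance and linkage were used. The following is a minimal, standard-library sketch of agglomerative (bottom-up hierarchical) clustering, assuming bag-of-words vectors, cosine distance, and average linkage; these choices, the function names `bag_of_words`, `cosine_distance`, and `average_linkage_clusters`, and the toy articles are all hypothetical illustrations, not the authors' setup or data.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Term counts from lower-cased, whitespace-split text (a deliberate simplification)."""
    return Counter(text.lower().split())


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def average_linkage_clusters(vectors, k):
    """Agglomerative clustering: repeatedly merge the two clusters with the
    smallest average pairwise distance until only k clusters remain."""
    dist = [[cosine_distance(u, v) for v in vectors] for u in vectors]
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > k:
        best = None  # (average distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                total = sum(dist[p][q] for p in clusters[i] for q in clusters[j])
                avg = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or avg < best[0]:
                    best = (avg, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])  # j > i, so deleting j leaves index i valid
        del clusters[j]
    return clusters


if __name__ == "__main__":
    # Hypothetical toy "articles"; not data from the study.
    articles = [
        "machine learning model training data",
        "deep learning neural network training",
        "database query sql storage",
        "sql database index storage",
    ]
    vectors = [bag_of_words(t) for t in articles]
    print(average_linkage_clusters(vectors, k=2))  # prints [[0, 1], [2, 3]]
```

Stopping at a fixed `k` corresponds to cutting the dendrogram at a chosen number of clusters (six in the study). A real analysis would more likely use TF-IDF weighting and a library implementation, but the merge loop above shows the mechanics.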
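The abstract also reports fifty LDA topics but does not name the inference method, software, or hyperparameters. As a hedged illustration of how LDA assigns words to topics, here is a standard-library sketch of collapsed Gibbs sampling, one common way to fit LDA. The values of `alpha`, `beta`, and `iterations`, the helper names `lda_gibbs` and `top_words`, and the toy corpus are assumptions for illustration only.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative hyperparameters).

    docs: list of token lists. Returns (doc_topic, topic_word) count tables.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]                # n_{d,k}
    topic_word = [defaultdict(int) for _ in range(num_topics)]  # n_{k,w}
    topic_total = [0] * num_topics                              # n_k
    assignments = []

    # Initialise every token with a uniformly random topic.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    topics = range(num_topics)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][n]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # p(z = k | rest) is proportional to
                # (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta)
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in topics
                ]
                z = rng.choices(topics, weights=weights)[0]
                # Record the resampled topic and restore the counts.
                assignments[d][n] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return doc_topic, topic_word


def top_words(topic_word, n=3):
    """The n most frequent words currently assigned to each topic."""
    return [
        sorted((w for w, c in tw.items() if c > 0), key=tw.get, reverse=True)[:n]
        for tw in topic_word
    ]


if __name__ == "__main__":
    # Hypothetical toy corpus; not the study's Wikipedia data.
    corpus = [
        "neural network deep learning model".split(),
        "learning model training neural".split(),
        "sql database query index".split(),
        "database storage sql query".split(),
    ]
    doc_topic, topic_word = lda_gibbs(corpus, num_topics=2)
    print(top_words(topic_word))
```

Each topic is then summarized by its most frequent words, which is how labeled topics such as the fifty reported here are typically read off. The study's run would use far more documents and `num_topics=50`.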