Open Access
Using Latent Dirichlet Allocation and Text Mining Techniques for Understanding Medical Literature
Author(s) - Saadat M. Alhashmi, Mohammed Maree, Zaina Saadeddin
Publication year - 2021
Publication title - computing
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.184
H-Index - 11
eISSN - 2312-5381
pISSN - 1727-6209
DOI - 10.47839/ijc.20.4.2437
Subject(s) - latent dirichlet allocation, topic model, computer science, data science, biomedical text mining, cluster analysis, context (archaeology), field (mathematics), word embedding, information retrieval, domain (mathematical analysis), health care, big data, data mining, text mining, artificial intelligence, embedding, mathematics, mathematical analysis, pure mathematics, economics, economic growth, paleontology, biology
Over the past few years, numerous studies and research articles have been published in the medical literature review domain. The topics covered by these studies include medical information retrieval, disease statistics, drug analysis, and many other fields and application domains. In this paper, we employ various text mining and data analysis techniques to discover trending topics and topic concordance in the healthcare literature and data mining field. The analysis focuses on healthcare literature and bibliometric data, with word association rules applied to 1945 research articles published between 2006 and 2019. Our aim is to help save the time and effort required to manually summarize large amounts of information in such a broad and multidisciplinary domain. To carry out this task, we employ topic modeling through Latent Dirichlet Allocation (LDA), in addition to various document and word embedding and clustering approaches. Findings reveal that interest in healthcare big data analysis has increased significantly since 2010, as demonstrated by the five most common topics in this domain.
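
To illustrate the kind of topic modeling the abstract refers to, the following is a minimal sketch of LDA fitted by collapsed Gibbs sampling, using only the Python standard library. The abstract does not state which LDA implementation, inference method, or hyperparameters the authors used, so everything here is an assumption: the function names `lda_gibbs` and `top_words`, the values of `alpha`, `beta`, and `iterations`, and the tiny tokenized corpus, which is invented for illustration and is not drawn from the 1945 articles.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=100, seed=0):
    """Fit LDA by collapsed Gibbs sampling. `docs` is a list of token lists.

    alpha and beta are symmetric Dirichlet priors (illustrative values only).
    Returns (vocab, doc_topic counts, topic_word counts).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]                 # tokens in doc d on topic k
    topic_word = [[0] * vocab_size for _ in range(num_topics)]   # times word w is on topic k
    topic_total = [0] * num_topics                               # tokens on topic k overall
    assignments = []                                             # topic of every token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                k = assignments[d][i]
                # Remove the current token from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][wid] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic for this token,
                # up to a constant factor.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wid] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wid] += 1
                topic_total[k] += 1

    return vocab, doc_topic, topic_word


def top_words(vocab, topic_word, n=3):
    """Return the n most frequently assigned words for each topic."""
    result = []
    for row in topic_word:
        ranked = sorted(range(len(vocab)), key=lambda i: -row[i])
        result.append([vocab[i] for i in ranked[:n]])
    return result


# Invented toy corpus (hypothetical, already tokenized and lowercased).
docs = [
    ["patient", "drug", "dose", "trial", "drug"],
    ["drug", "trial", "patient", "dose"],
    ["data", "mining", "cluster", "big", "data"],
    ["big", "data", "mining", "cluster", "analysis"],
]

vocab, doc_topic, topic_word = lda_gibbs(docs, num_topics=2)
for k, words in enumerate(top_words(vocab, topic_word)):
    print(f"topic {k}: {', '.join(words)}")
```

On a real corpus, a maintained library implementation would normally be preferable to this hand-written sampler; the sketch is only meant to make the per-token resampling step explicit. Topic assignments on such a small corpus depend on the random seed and should not be read as meaningful results.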