Use and validation of text mining and cluster algorithms to derive insights from Corona Virus Disease-2019 (COVID-19) medical literature
Author(s) -
Sandeep Reddy,
Ravi Bhaskar,
Sandosh Padmanabhan,
Karin Verspoor,
Chaitanya Mamillapalli,
Rani Lahoti,
Ville-Petteri Mäkinen,
Smitan Pradhan,
Puru Kushwah,
Saumya Sinha
Publication year - 2021
Publication title -
Computer Methods and Programs in Biomedicine Update
Language(s) - English
Resource type - Journals
ISSN - 2666-9900
DOI - 10.1016/j.cmpbup.2021.100010
Subject(s) - covid-19 , computer science , cluster (spacecraft) , data science , pandemic , biomedical text mining , disease , data mining , medicine , text mining , infectious disease (medical specialty) , pathology , programming language
The emergence of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in late 2019 led not only to the worldwide coronavirus disease 2019 (COVID-19) pandemic but also to a deluge of biomedical literature. Following the release of the COVID-19 Open Research Dataset (CORD-19), comprising over 200,000 scholarly articles, we, a multi-disciplinary team of data scientists, clinicians, medical researchers and software engineers, developed an innovative natural language processing (NLP) platform that combines an advanced search engine with a biomedical named entity recognition and extraction package. In particular, the platform was developed to extract information relating to clinical risk factors for COVID-19 and to present the results in a cluster format to support knowledge discovery. Here we describe the principles behind the development, the model, and the results we obtained.