Open Access
Identification of SARS-CoV-2 origin: Using Ngrams, principal component analysis and Random Forest algorithm
Author(s) - Hamoucha El Boujnouni, Mohamed Rahouti, Mohamed El Boujnouni
Publication year - 2021
Publication title - Informatics in Medicine Unlocked
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.44
H-Index - 21
ISSN - 2352-9148
DOI - 10.1016/j.imu.2021.100577
Subject(s) - principal component analysis, dimensionality reduction, random forest, coronavirus, categorization, coronaviridae, covid-19, identification (biology), virus classification, artificial intelligence, virus, biology, virology, computer science, computational biology, algorithm, genome, medicine, infectious disease (medical specialty), disease, genetics, ecology, pathology, gene
COVID-19 is an infectious disease caused by the newly discovered SARS-CoV-2 virus. The virus causes a respiratory tract infection whose symptoms include dry cough, fever, tiredness and, in more severe cases, breathing difficulty. SARS-CoV-2 is extremely contagious and is spreading rapidly all over the world, and the scientific community is working tirelessly to find an effective treatment. This paper aims to determine the origin of the virus by comparing its nucleic acid sequence with those of all members of the Coronaviridae family. The study uses a new approach that combines three techniques: N-grams (for text categorization), Principal Component Analysis (for dimensionality reduction) and the Random Forest algorithm (for supervised classification). The experimental results show that a large set of SARS-CoV-2 genomes, collected from different locations around the world, presents significant similarities to coronavirus genomes found in pangolins. This finding supports previous results obtained by other methods, which also suggest that pangolins should be considered as possible hosts in the emergence of the new coronavirus.
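
The first of the three techniques named in the abstract, N-grams, can be illustrated with a minimal sketch that turns a nucleotide sequence into a fixed-length frequency vector. The abstract does not state the N-gram length, the normalization, or how ambiguous bases are handled, so the function names, the default `n=3`, the `ACGT` alphabet and the toy sequence below are all assumptions made for illustration, not the authors' implementation. Vectors of this kind are the sort of input that a dimensionality-reduction step and a supervised classifier would then consume.

```python
from collections import Counter
from itertools import product


def ngram_counts(sequence, n=3):
    """Count overlapping n-grams (substrings of length n) in a sequence."""
    seq = sequence.upper()
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def ngram_vector(sequence, n=3, alphabet="ACGT"):
    """Return relative n-gram frequencies in a fixed vocabulary order.

    The vocabulary is every string of length n over `alphabet`, so all
    sequences map to vectors of the same length (len(alphabet) ** n).
    N-grams containing other symbols (for example 'N') are ignored;
    this is an assumption, not something the abstract specifies.
    """
    counts = ngram_counts(sequence, n)
    vocab = ["".join(p) for p in product(alphabet, repeat=n)]
    total = sum(counts[g] for g in vocab)
    return [counts[g] / total if total else 0.0 for g in vocab]


# Hypothetical toy sequence, not real genome data.
toy = "ATGCGATACGCTTGA"
vec = ngram_vector(toy, n=2)
print(len(vec))
```

With `n=2` the vocabulary has 16 entries, so every sequence, whatever its length, is mapped to a 16-dimensional vector. Larger values of `n` grow the vector length exponentially, which is one reason a dimensionality-reduction step such as PCA is a natural companion to this representation.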
