Open Access
Structural classification of proteins based on the computationally efficient recurrence quantification analysis and horizontal visibility graphs
Author(s) - Michaela Areti Zervou, Effrosyni Doutsi, Pavlos Pavlidis, Panagiotis Tsakalides
Publication year - 2021
Publication title - Bioinformatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 3.599
H-Index - 390
eISSN - 1367-4811
pISSN - 1367-4803
DOI - 10.1093/bioinformatics/btab407
Subject(s) - computer science , benchmark (surveying) , visibility , representation (politics) , data mining , process (computing) , series (stratigraphy) , code (set theory) , function (biology) , class (philosophy) , tree (set theory) , sequence (biology) , algorithm , machine learning , artificial intelligence , mathematics , paleontology , mathematical analysis , physics , geodesy , set (abstract data type) , politics , political science , law , optics , biology , programming language , geography , operating system , genetics , evolutionary biology
Protein structural class prediction is one of the most significant problems in bioinformatics, as it plays a prominent role in understanding the function and evolution of proteins. Designing a prediction method that is both computationally efficient and accurate remains a pressing issue, especially for sequences for which sufficient homologous information cannot be obtained from existing protein sequence databases. Several studies demonstrate the potential of combining chaos game representation with time series analysis tools such as recurrence quantification analysis, complex networks, horizontal visibility graphs (HVG) and others. However, most existing works involve a large number of features and require an exhaustive, time-consuming search for the optimal parameters. To address these problems, this work adopts the generalized multidimensional recurrence quantification analysis (GmdRQA) as an efficient tool that processes a multidimensional time series concurrently and reduces the number of features. In addition, two data-driven algorithms, namely average mutual information and false nearest neighbors, are used to determine the optimal GmdRQA parameters quickly yet precisely.
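For readers unfamiliar with the two tools named in the title, the following is a minimal Python sketch of their textbook definitions, not the authors' implementation. It builds a binary recurrence matrix for a multidimensional series (two time steps recur when their Euclidean distance is within a radius) and lists the edges of a horizontal visibility graph (two samples are linked when every sample between them is strictly lower than both). The function names, the `radius` parameter, and the toy inputs are illustrative assumptions; time-delay embedding, whose parameters the abstract says are selected via average mutual information and false nearest neighbors, is omitted here.

```python
from math import sqrt


def recurrence_matrix(series, radius):
    """Binary recurrence matrix of a multidimensional time series.

    `series` is a list of equal-length tuples, one tuple per time step.
    Entry (i, j) is 1 when the Euclidean distance between steps i and j
    is at most `radius` (an assumed, user-chosen threshold).
    """
    n = len(series)
    matrix = []
    for i in range(n):
        row = []
        for j in range(n):
            dist = sqrt(sum((a - b) ** 2 for a, b in zip(series[i], series[j])))
            row.append(1 if dist <= radius else 0)
        matrix.append(row)
    return matrix


def recurrence_rate(matrix):
    """Fraction of recurrent entries; the main diagonal is included here."""
    n = len(matrix)
    return sum(map(sum, matrix)) / (n * n)


def horizontal_visibility_edges(values):
    """Edges (i, j), i < j, of the horizontal visibility graph of a scalar series."""
    edges = []
    n = len(values)
    for i in range(n - 1):
        highest_between = float("-inf")  # max of the samples strictly between i and j
        for j in range(i + 1, n):
            if highest_between < min(values[i], values[j]):
                edges.append((i, j))
            highest_between = max(highest_between, values[j])
            if highest_between >= values[i]:
                break  # sample i is blocked from every later sample
    return edges


# Toy inputs, chosen only for illustration.
toy_series = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0)]
toy_matrix = recurrence_matrix(toy_series, radius=0.2)
toy_edges = horizontal_visibility_edges([3, 1, 2, 4])
```

Features such as the recurrence rate or graph degree statistics could then be fed to a classifier. Which features the paper actually uses is not stated in the abstract.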
