cnnAlpha : Protein disordered regions prediction by reduced amino acid alphabets and convolutional neural networks
Author(s) - Oberti Mauricio, Vaisman Iosif I.
Publication year - 2020
Publication title - Proteins: Structure, Function, and Bioinformatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.699
H-Index - 191
eISSN - 1097-0134
pISSN - 0887-3585
DOI - 10.1002/prot.25966
Subject(s) - convolutional neural network, computer science, artificial intelligence, sequence (biology), reduction (mathematics), pattern recognition (psychology), class (philosophy), intrinsically disordered proteins, proteome, machine learning, computational biology, data mining, bioinformatics, chemistry, mathematics, biology, biochemistry, geometry
Intrinsically disordered regions (IDRs) play an important role in key biological processes and are closely related to human diseases. IDRs have great potential to serve as targets for drug discovery, most notably in disordered binding regions. Accurate prediction of IDRs is challenging because their genome-wide occurrence and the low ratio of disordered residues make them difficult targets for traditional classification techniques. Existing computational methods mostly rely on sequence profiles to improve accuracy, which is time-consuming and computationally expensive. This article describes an ab initio, sequence-only prediction method based on reduced amino acid alphabets and convolutional neural networks (CNNs), designed to address the challenge of accurate IDR prediction. We experiment with six different 3-letter reduced alphabets. We argue that the dimensionality reduction in the input alphabet facilitates the detection of complex patterns within the sequence by the convolutional step. Experimental results show that our proposed IDR predictor performs on par with or better than other state-of-the-art methods in the same class, achieving an accuracy of 0.76 and an AUC of 0.85 on the publicly available Critical Assessment of protein Structure Prediction dataset (CASP10). Our method is therefore suitable for proteome-wide disorder prediction, yielding similar or better accuracy than existing approaches at a faster speed.
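
To make the idea of a reduced alphabet feeding a convolutional step concrete, the following is a minimal sketch and not the article's implementation. The abstract does not list the six alphabets that were evaluated, so the hydrophobic/polar/charged grouping in `GROUPS` is an assumption chosen purely for illustration, as are the function names `reduce_sequence`, `one_hot`, and `conv1d`. The sketch maps a protein sequence onto three letters, one-hot encodes it into three input channels instead of twenty, and slides a single hand-written filter across it.

```python
# Hypothetical 3-letter grouping of the 20 standard amino acids.
# This is an illustrative assumption; the alphabets used in the
# article are not given in the abstract.
GROUPS = {
    "h": "AVLIMFWC",  # hydrophobic
    "p": "STNQYGPH",  # polar or small
    "c": "DEKR",      # charged
}
ORDER = "hpc"  # channel order for the one-hot encoding
REDUCE = {aa: label for label, members in GROUPS.items() for aa in members}


def reduce_sequence(seq):
    """Map a protein sequence onto the 3-letter reduced alphabet."""
    try:
        return "".join(REDUCE[aa] for aa in seq.upper())
    except KeyError as err:
        raise ValueError(f"non-standard residue: {err.args[0]}") from None


def one_hot(reduced):
    """Encode a reduced sequence as a list of 3-channel rows."""
    return [[1.0 if ch == label else 0.0 for label in ORDER] for ch in reduced]


def conv1d(x, kernel, bias=0.0):
    """Apply one filter with 'valid' padding and stride 1.

    x is a list of rows (positions x channels); kernel has the same
    channel count and a width equal to its number of rows.
    """
    width = len(kernel)
    out = []
    for start in range(len(x) - width + 1):
        total = bias
        for offset in range(width):
            for channel, weight in enumerate(kernel[offset]):
                total += x[start + offset][channel] * weight
        out.append(total)
    return out


reduced = reduce_sequence("MKVLDE")
encoded = one_hot(reduced)
# A width-3 filter that counts charged residues in each window.
charged_filter = [[0.0, 0.0, 1.0]] * 3
activations = conv1d(encoded, charged_filter)
```

In a trained CNN the filter weights would be learned rather than written by hand, and many filters would run in parallel. The point of the sketch is only that each position carries three input channels rather than twenty, which shrinks the space of local patterns a filter of a given width has to cover.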
