

Real value prediction of solvent accessibility in proteins using multiple sequence alignment and secondary structure

Author(s) - Garg Aarti, Kaur Harpreet, Raghava G.P.S.

Publication year - 2005

Publication title - Proteins: Structure, Function, and Bioinformatics

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 1.699

H-Index - 191

eISSN - 1097-0134

pISSN - 0887-3585

DOI - 10.1002/prot.20630

Subject(s) - sequence (biology) , artificial neural network , correlation coefficient , computer science , multiple sequence alignment , matthews correlation coefficient , protein secondary structure , algorithm , value (mathematics) , position (finance) , correlation , pearson product moment correlation coefficient , pattern recognition (psychology) , artificial intelligence , data mining , sequence alignment , mathematics , statistics , peptide sequence , machine learning , physics , geometry , biology , support vector machine , genetics , finance , gene , economics , nuclear magnetic resonance

The present study develops a neural network-based method for predicting the real value of solvent accessibility from sequence, using evolutionary information in the form of multiple sequence alignment. In this method, two feed-forward networks, each with a single hidden layer, are trained with standard back-propagation as the learning algorithm. Pearson's correlation coefficient increases from 0.53 to 0.63, and the mean absolute error (MAE) decreases from 18.2% to 16%, when a multiple sequence alignment obtained from PSI-BLAST is used as input instead of a single sequence. Performance improves further, from a correlation coefficient of 0.63 to 0.67, when secondary structure information predicted by PSIPRED is incorporated in the prediction. The final network yields an MAE of 15.2% between the experimental and predicted values when tested on two different nonhomologous and nonredundant datasets of varying sizes. The method consists of two steps: (1) a sequence-to-structure network is trained on multiple alignment profiles in the form of PSI-BLAST-generated position-specific scoring matrices, and (2) the output of the first network, together with PSIPRED-predicted secondary structure information, is used as input to a second, structure-to-structure network. Based on this study, a server, SARpred (http://www.imtech.res.in/raghava/sarpred/), has been developed that predicts the real value of solvent accessibility of residues for a given protein sequence. We also evaluated SARpred on 47 proteins used in CASP6 and achieved a correlation coefficient of 0.68 and an MAE of 15.9% between predicted and observed values.
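The two evaluation measures quoted in the abstract, and the general idea of feeding a network a window of profile rows around each residue, can be illustrated with a minimal sketch. This is not the authors' code: the function names, the window size of 3, and the zero padding at the chain termini are assumptions made for illustration, since the abstract does not specify them.

```python
import math


def pearson_r(observed, predicted):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_o * var_p)


def mean_absolute_error(observed, predicted):
    """Mean absolute difference; in percent if the inputs are in percent."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def window_features(profile, window=3):
    """Flatten the profile rows in a window centred on each residue.

    `profile` is a list of per-residue rows (for example, PSSM rows).
    Positions beyond the chain ends are zero-padded (an assumption here).
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so that it has a centre")
    half = window // 2
    pad = [0.0] * len(profile[0])
    features = []
    for i in range(len(profile)):
        row = []
        for j in range(i - half, i + half + 1):
            row.extend(profile[j] if 0 <= j < len(profile) else pad)
        features.append(row)
    return features
```

In a two-step setup like the one described, vectors of this kind would feed the first network, and the same two measures would compare predicted with observed accessibility values per residue.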
