The importance of larger data sets for protein secondary structure prediction with neural networks
Author(s) - Chandonia John-Marc, Karplus Martin
Publication year - 1996
Publication title - Protein Science
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 3.353
H-Index - 175
eISSN - 1469-896X
pISSN - 0961-8368
DOI - 10.1002/pro.5560050422
Subject(s) - artificial neural network, conjugate gradient method, gradient descent, protein secondary structure, sequence (biology), computer science, algorithm, class (philosophy), data mining, artificial intelligence, chemistry, biochemistry
A neural network algorithm is applied to secondary structure and structural class prediction for a database of 318 nonhomologous protein chains. Significant improvement in accuracy is obtained as compared with performance on smaller databases. A systematic study of the effects of network topology shows that, for the larger database, better results are obtained with more units in the hidden layer. In a 32‐fold cross validated test, secondary structure prediction accuracy is 67.0%, relative to 62.6% obtained previously, without any evolutionary information on the sequence. Introduction of sequence profiles increases this value to 72.9%, suggesting that the two types of information are essentially independent. Tertiary structural class is predicted with 80.2% accuracy, relative to 73.9% obtained previously. The use of a larger database is facilitated by the introduction of a scaled conjugate gradient algorithm for optimizing the neural network. This algorithm is about 10–20 times as fast as the standard steepest descent algorithm.
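
The 32-fold cross-validated test mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the 318 chains were assigned to folds or exactly how accuracy was scored, so the details here are assumptions: chains are shuffled and dealt into folds at random, and accuracy is taken as the fraction of residues whose predicted state matches the observed one. The names `k_fold_splits` and `q3_accuracy` are hypothetical helpers, not taken from the paper.

```python
import random


def k_fold_splits(n_items, k, seed=0):
    """Yield (train_indices, test_indices) for each of k folds.

    Assumption: items (protein chains) are assigned to folds at random;
    the paper's actual partitioning scheme is not given in the abstract.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at position i,
    # so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def q3_accuracy(predicted, observed):
    """Fraction of residues whose predicted state matches the observed state.

    Assumption: states are single characters such as H (helix),
    E (strand), and C (coil), one per residue.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return correct / len(observed)


# 318 chains and 32 folds, as stated in the abstract.
splits = list(k_fold_splits(318, 32))
```

In a sketch like this, each chain is held out exactly once: a network would be trained on the `train` chains of a fold and scored on its `test` chains, and the per-fold scores would then be combined into the reported accuracy. With 318 chains and 32 folds, the folds hold 9 or 10 chains each.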
