A Consensus Data Mining secondary structure prediction by combining GOR V and Fragment Database Mining


Author(s) - Sen Taner Z., Cheng Haitao, Kloczkowski Andrzej, Jernigan Robert L.

Publication year - 2006

Publication title - Protein Science

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 3.353

H-Index - 175

eISSN - 1469-896X

pISSN - 0961-8368

DOI - 10.1110/ps.062125306

Subject(s) - sequence (biology), data mining, protein secondary structure, computer science, protein data bank, fragment (logic), protein structure database, loop modeling, algorithm, bayesian probability, protein structure prediction, mathematics, protein structure, sequence database, artificial intelligence, biology, genetics, biochemistry, gene

The major aim of tertiary structure prediction is to obtain protein models with the highest possible accuracy. Fold recognition, homology modeling, and de novo prediction methods typically use predicted secondary structures as input, and all of these methods may benefit significantly from more accurate secondary structure predictions. Although many secondary structure prediction methods are available in the literature, their cross-validated prediction accuracy is generally <80%. To increase the prediction accuracy, we developed a novel hybrid algorithm called Consensus Data Mining (CDM) that combines our two previous successful methods: (1) Fragment Database Mining (FDM), which exploits the Protein Data Bank (PDB) structures, and (2) GOR V, which is based on information theory, Bayesian statistics, and multiple sequence alignments (MSA). In CDM, the target sequence is dissected into smaller fragments that are compared with fragments obtained from related sequences in the PDB. For fragments whose sequence identity exceeds a given threshold, the FDM method is applied for the prediction. The remaining fragments are predicted by GOR V. The results of the CDM are provided as a function of the upper sequence identities of aligned fragments and the sequence identity threshold. We observe that 50% is the optimum sequence identity threshold, and that the accuracy of the CDM method measured by Q3 ranges from 67.5% to 93.2%, depending on the availability of known structural fragments with sufficiently high sequence identity. As the Protein Data Bank grows, it is anticipated that this consensus method will improve because it will rely more upon the structural fragments.
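The selection rule described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Fragment` class, the `cdm_consensus` and `q3` function names, and the H/E/C state strings are all hypothetical, and the FDM and GOR V predictions are assumed to have been computed elsewhere. The sketch also assumes non-overlapping fragments, since the abstract does not say how overlaps are reconciled. Q3 is computed in the usual way, as the percentage of residues whose three-state assignment (helix, strand, coil) matches the observed one.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Fragment:
    """Hypothetical container for one fragment of the target sequence."""
    start: int            # index of the fragment's first residue in the target
    fdm_pred: str         # assumed FDM prediction for the fragment (H/E/C per residue)
    best_identity: float  # highest sequence identity (%) to an aligned PDB fragment


def cdm_consensus(gor_pred: str, fragments: List[Fragment],
                  threshold: float = 50.0) -> str:
    """Combine two predictions: use FDM where identity is above the threshold,
    and keep the GOR V prediction everywhere else.

    Assumption: fragments do not overlap; if they did, later fragments
    would simply overwrite earlier ones in this sketch.
    """
    result = list(gor_pred)  # start from the GOR V prediction for every residue
    for frag in fragments:
        if frag.best_identity > threshold:
            for offset, state in enumerate(frag.fdm_pred):
                result[frag.start + offset] = state
    return "".join(result)


def q3(predicted: str, observed: str) -> float:
    """Percentage of residues whose three-state prediction matches the observed state."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Toy usage with made-up data: only the first fragment exceeds the 50% threshold.
gor_prediction = "CCCCCCCCCC"
toy_fragments = [
    Fragment(start=0, fdm_pred="HHHHH", best_identity=80.0),
    Fragment(start=5, fdm_pred="EEEEE", best_identity=30.0),
]
consensus = cdm_consensus(gor_prediction, toy_fragments)
accuracy = q3(consensus, "HHHHHEEEEE")
```

The default threshold of 50 mirrors the optimum reported in the abstract. Keeping it as a parameter makes it possible to explore how accuracy varies with the threshold, which is how the abstract says the results are presented.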
