From Unsupervised Multi-Instance Learning to Identification of Near-Native Protein Structures
Author(s) - Fardina Fathmiul Alam, Amarda Shehu
Publication year - 2020
Publication title - EPiC Series in Computing
Language(s) - English
Resource type - Conference proceedings
ISSN - 2398-7340
DOI - 10.29007/pjcf
Subject(s) - computer science, cluster analysis, machine learning, artificial intelligence, unsupervised learning, similarity (geometry), identification (biology), parametric statistics, selection (genetic algorithm), protein structure prediction, data mining, protein structure, mathematics, image (mathematics), statistics, botany, physics, nuclear magnetic resonance, biology
A major challenge in computational biology regards recognizing one or more biologically-active/native tertiary protein structures among thousands of physically-realistic structures generated via template-free protein structure prediction algorithms. Clustering structures based on structural similarity remains a popular approach. However, clustering organizes structures into groups and does not directly provide a mechanism to select individual structures for prediction. In this paper, we provide a few algorithms for this selection problem. We approach the problem under unsupervised multi-instance learning and address it in three stages, first organizing structures into bags, identifying relevant bags, and then drawing individual structures/instances from these bags. We present both non-parametric and parametric algorithms for drawing individual instances. In the latter, parameters are trained over training data and evaluated over testing data via rigorous metrics.
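
The three stages named in the abstract (organize structures into bags, identify relevant bags, draw instances from them) can be illustrated with a minimal sketch. This is not the authors' method: the abstract does not specify the clustering algorithm, the bag-relevance criterion, or the drawing rule. The sketch assumes each structure is already reduced to a numeric feature vector, uses simple leader clustering with a hypothetical `radius` parameter for stage 1, treats bag size as a stand-in for relevance in stage 2, and draws the medoid of each bag as one possible non-parametric choice in stage 3. All function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def distance(a: Vector, b: Vector) -> float:
    # Euclidean distance between feature vectors (an assumed stand-in
    # for a structural dissimilarity measure).
    return math.dist(a, b)


def organize_into_bags(structures: List[Vector], radius: float) -> List[List[int]]:
    # Stage 1 (assumed): leader clustering. A structure joins the first bag
    # whose leader (first member) lies within `radius`; otherwise it starts
    # a new bag. Bags hold indices into `structures`.
    bags: List[List[int]] = []
    for idx, structure in enumerate(structures):
        for bag in bags:
            if distance(structure, structures[bag[0]]) <= radius:
                bag.append(idx)
                break
        else:
            bags.append([idx])
    return bags


def select_relevant_bags(bags: List[List[int]], top_k: int) -> List[List[int]]:
    # Stage 2 (assumed): rank bags by size and keep the top_k largest.
    return sorted(bags, key=len, reverse=True)[:top_k]


def draw_instance(bag: List[int], structures: List[Vector]) -> int:
    # Stage 3 (assumed, non-parametric): return the medoid, i.e. the member
    # with the smallest summed distance to the other members of its bag.
    return min(
        bag,
        key=lambda i: sum(distance(structures[i], structures[j]) for j in bag),
    )


def select_structures(structures: List[Vector], radius: float, top_k: int) -> List[int]:
    # Run all three stages and return one selected index per relevant bag.
    bags = organize_into_bags(structures, radius)
    relevant = select_relevant_bags(bags, top_k)
    return [draw_instance(bag, structures) for bag in relevant]


if __name__ == "__main__":
    toy = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 10.0)]
    print(select_structures(toy, radius=1.0, top_k=2))
```

A parametric variant, as the abstract mentions, would replace the fixed medoid rule with a scoring function whose parameters are fit on training data; the abstract gives no further detail, so that part is left out here.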
