
From Unsupervised Multi-Instance Learning to Identification of Near-Native Protein Structures
Author(s) - Fardina Fathmiul Alam, Amarda Shehu
Publication year - 2020
Publication title - EPiC Series in Computing
Language(s) - English
Resource type - Conference proceedings
ISSN - 2398-7340
DOI - 10.29007/pjcf
Subject(s) - computer science, cluster analysis, machine learning, artificial intelligence, unsupervised learning, similarity (geometry), identification (biology), parametric statistics, selection (genetic algorithm), protein structure prediction, data mining, protein structure, mathematics, image (mathematics), statistics, botany, physics, nuclear magnetic resonance, biology
A major challenge in computational biology is recognizing one or more biologically active (native) tertiary protein structures among thousands of physically realistic structures generated by template-free protein structure prediction algorithms. Clustering structures by structural similarity remains a popular approach. However, clustering organizes structures into groups and does not directly provide a mechanism for selecting individual structures for prediction. In this paper, we provide a few algorithms for this selection problem. We approach the problem as unsupervised multi-instance learning and address it in three stages: first organizing structures into bags, then identifying relevant bags, and finally drawing individual structures (instances) from these bags. We present both non-parametric and parametric algorithms for drawing individual instances. In the latter, parameters are trained on training data and evaluated on testing data via rigorous metrics.
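The three stages named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' method: the abstract does not specify the clustering algorithm, the bag-relevance criterion, or the instance-drawing rule, so every choice here is an assumption made for illustration. Structures are represented as plain coordinate tuples, Euclidean distance stands in for a structural dissimilarity measure such as RMSD, bags are formed by simple leader clustering, the largest bags are treated as relevant, and the non-parametric draw picks each bag's medoid. All function names and the `radius` and `top_k` parameters are hypothetical.

```python
import math


def distance(a, b):
    # Assumption: Euclidean distance as a stand-in for a structural
    # dissimilarity measure between two structures.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def make_bags(structures, radius):
    # Stage 1 (assumed: leader clustering). Each structure joins the first
    # bag whose leader (first member) lies within `radius`; otherwise it
    # starts a new bag. Bags hold indices into `structures`.
    bags = []
    for i, s in enumerate(structures):
        for bag in bags:
            if distance(s, structures[bag[0]]) <= radius:
                bag.append(i)
                break
        else:
            bags.append([i])
    return bags


def select_bags(bags, top_k):
    # Stage 2 (assumed criterion: bag size). Keep the `top_k` largest bags;
    # sorted() is stable, so ties keep their original order.
    return sorted(bags, key=len, reverse=True)[:top_k]


def draw_instance(bag, structures):
    # Stage 3 (assumed non-parametric rule: medoid). Return the index of the
    # member with the smallest total distance to the other members.
    return min(
        bag,
        key=lambda i: sum(distance(structures[i], structures[j]) for j in bag),
    )


def select_structures(structures, radius, top_k):
    # Full pipeline: bags -> relevant bags -> one instance per relevant bag.
    bags = make_bags(structures, radius)
    return [draw_instance(bag, structures) for bag in select_bags(bags, top_k)]


if __name__ == "__main__":
    # Toy data: two dense groups and one outlier.
    toy = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 10.0)]
    print(select_structures(toy, radius=1.0, top_k=2))
```

On the toy data, the two dense groups form the two largest bags and one representative index is drawn from each, while the isolated point is ignored. A parametric variant, as mentioned in the abstract, would replace the fixed medoid rule with a scoring function whose parameters are fit on training data.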