Insights into Protein Sequence and Structure-Derived Features Mediating 3D Domain Swapping Mechanism using Support Vector Machine Based Approach
Author(s) -
Khader Shameer,
Ganesan Pugalenthi,
Krishna Kumar Kandaswamy,
Ponnuthurai Nagaratnam Suganthan,
Govindaraju Archunan,
Ramanathan Sowdhamini
Publication year - 2010
Publication title -
Bioinformatics and Biology Insights
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.556
H-Index - 23
ISSN - 1177-9322
DOI - 10.4137/bbi.s4464
Subject(s) - support vector machine , computational biology , mechanism (biology) , classifier (uml) , sequence (biology) , protein domain , computer science , domain (mathematical analysis) , feature selection , protein sequencing , structural classification of proteins database , protein structure , bioinformatics , artificial intelligence , biology , peptide sequence , genetics , gene , physics , biochemistry , mathematics , mathematical analysis , quantum mechanics
3-dimensional (3D) domain swapping is a mechanism in which two or more protein molecules form higher-order oligomers by exchanging identical or similar subunits. Recently, this phenomenon has received much attention in the context of prions and neurodegenerative diseases because of its role in functional regulation, formation of higher-order oligomers, protein misfolding and aggregation. While 3D domain swapping can be detected from three-dimensional structures, deriving common sequence or structural patterns from the proteins involved in swapping remains a formidable challenge. We have developed an SVM-based classifier to predict domain swapping events using a set of features derived from sequence and structural data. The SVM classifier was trained on features derived from 150 proteins reported to be involved in 3D domain swapping and 150 proteins not known to adopt a swapped conformation or to be related to proteins involved in swapping. Testing was performed using 63 proteins from the positive dataset and 63 proteins from the negative dataset. We obtained 76.33% accuracy in training and 73.81% accuracy in testing. Because proteins involved in domain swapping are highly diverse in sequence, structure and function, an algorithm that predicts swapping events from sequence and structure-derived features is an initial step towards identifying more putative proteins that may be involved in swapping or in deposition diseases. The top features emerging from our feature selection method may also be analysed to understand their roles in the mechanism of domain swapping.
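The reported accuracies can be related back to the dataset sizes with a short calculation. The sketch below assumes that accuracy means correctly classified proteins divided by all proteins in the set (300 for training, 126 for testing). The helper names are illustrative, and the resulting counts are inferred from the percentages rather than stated in the abstract.

```python
# Sketch: recover the number of correct predictions implied by a reported
# accuracy, assuming accuracy = correct / total on each balanced dataset.
# The counts produced here are inferred, not reported by the authors.

def implied_correct(accuracy_percent, n_total):
    """Whole number of correct predictions closest to the reported accuracy."""
    return round(accuracy_percent / 100 * n_total)

def accuracy_percent(n_correct, n_total):
    """Accuracy as a percentage, rounded to two decimals."""
    return round(100 * n_correct / n_total, 2)

train_total = 150 + 150  # positive + negative training proteins
test_total = 63 + 63     # positive + negative test proteins

train_correct = implied_correct(76.33, train_total)
test_correct = implied_correct(73.81, test_total)

print(train_correct, "of", train_total, "->", accuracy_percent(train_correct, train_total))
print(test_correct, "of", test_total, "->", accuracy_percent(test_correct, test_total))
```

Under this assumption, the percentages correspond to 229 of 300 training proteins and 93 of 126 test proteins. Converting those counts back to percentages reproduces the reported 76.33% and 73.81%.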