Open Access
A novel voting system for the identification of eukaryotic genome promoters
Author(s) - Lin Lei, Feng Kong, Zhisong He, Yi Cai
Publication year - 2010
Publication title - Journal of Biomedical Science and Engineering
Language(s) - English
Resource type - Journals
eISSN - 1937-688X
pISSN - 1937-6871
DOI - 10.4236/jbise.2010.37096
Subject(s) - identifier, computer science, promoter, redundancy (engineering), identification (biology), genome, computational biology, feature selection, data mining, data redundancy, artificial intelligence, gene, machine learning, pattern recognition (psychology), genetics, database, biology, computer network, gene expression, botany, operating system
Motivation: Accurate identification and delineation of promoters and transcription start sites (TSSs) is important for improving genome annotation and for designing experiments to study transcriptional regulation. Many promoter identifiers have been developed, but each has its own focus and limitations. We introduce an integration scheme that combines several identifiers to achieve better prediction performance.

Results: Eight promoter identifiers (Proscan, TSSG, TSSW, FirstEF, eponine, ProSOM, EP3, FPROM) are chosen for the investigation of integration. A feature selection method, mRMR (Minimum Redundancy Maximum Relevance), is adapted to promoter identifier selection in order to choose a group of robust and complementary identifiers. For comparison, four integration methods of increasing complexity (SMV, WMV, SMV_IS, WMV_IS) are developed and applied to a training dataset of 1400 sequences and a testing dataset of 378 sequences. mRMR selects five identifiers (FPROM, FirstEF, TSSG, eponine, TSSW), and their integration achieves correct prediction rates of 70.08% on the training dataset and 67.83% on the testing dataset. This is better than any single identifier: the best single identifier achieves only 59.32% and 61.78% on the training and testing datasets, respectively.
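To illustrate the general idea of integrating several identifiers by voting, here is a minimal Python sketch. The abstract does not define SMV or WMV, so the sketch assumes they stand for simple and weighted majority voting over binary calls (1 = promoter, 0 = not a promoter). The function names, the tie-breaking rule, the weights, and the toy votes are all hypothetical and are not taken from the paper.

```python
from typing import Sequence


def simple_majority_vote(votes: Sequence[int]) -> int:
    """Return 1 if strictly more than half of the identifiers vote 1, else 0.

    Assumption: ties are resolved as 0 (no promoter).
    """
    return 1 if sum(votes) * 2 > len(votes) else 0


def weighted_majority_vote(votes: Sequence[int], weights: Sequence[float]) -> int:
    """Return 1 if the weight behind the 1-votes exceeds half of the total weight."""
    if len(votes) != len(weights):
        raise ValueError("votes and weights must have the same length")
    positive_weight = sum(w for v, w in zip(votes, weights) if v == 1)
    return 1 if positive_weight * 2 > sum(weights) else 0


# Illustrative (made-up) calls from five identifiers on one sequence.
votes = [1, 0, 1, 0, 0]
# Hypothetical weights, e.g. derived from each identifier's training accuracy.
weights = [0.9, 0.3, 0.9, 0.3, 0.3]

smv = simple_majority_vote(votes)             # only 2 of 5 vote "promoter"
wmv = weighted_majority_vote(votes, weights)  # the two positive voters carry most of the weight
print(smv, wmv)
```

In this toy case the two schemes disagree: the unweighted vote rejects the sequence, while the weighted vote accepts it because the two positive identifiers are trusted more. That difference is the kind of effect a weighting scheme is meant to capture.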
