A novel voting system for the identification of eukaryotic genome promoters

Lin Lei; Feng Kong; Zhisong He; Yi Cai

Open Access

Publication year - 2010

Publication title - Journal of Biomedical Science and Engineering

Language(s) - English

Resource type - Journals

eISSN - 1937-688X

pISSN - 1937-6871

DOI - 10.4236/jbise.2010.37096

Subject(s) - identifier, computer science, promoter, redundancy (engineering), identification (biology), genome, computational biology, feature selection, data mining, data redundancy, artificial intelligence, gene, machine learning, pattern recognition (psychology), genetics, database, biology, computer network, gene expression, botany, operating system

Motivation: Accurate identification and delineation of promoters and transcription start sites (TSSs) is important for improving genome annotation and for designing experiments to study transcriptional regulation. Many promoter identifiers have been developed, but each has its own focus and limitations. We introduce an integration scheme that combines several identifiers to achieve better prediction performance.

Results: Eight promoter identifiers (Proscan, TSSG, TSSW, FirstEF, eponine, ProSOM, EP3, FPROM) are chosen for the investigation of integration. A feature selection method, mRMR (Minimum Redundancy Maximum Relevance), is adapted to promoter identifier selection in order to choose a group of robust and complementary identifiers. For comparison, four integration methods of increasing complexity (SMV, WMV, SMV_IS, WMV_IS) are developed and applied to a training dataset of 1400 sequences and a testing dataset of 378 sequences. mRMR selects five identifiers (FPROM, FirstEF, TSSG, eponine, TSSW), and their integration achieves correct prediction rates of 70.08% on the training dataset and 67.83% on the testing dataset. This is better than any single identifier: the best single identifier achieves only 59.32% and 61.78% on the training and testing datasets, respectively.
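The abstract does not spell out how identifiers are selected or how votes are combined, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that each identifier emits a binary call per sequence (1 = promoter, 0 = not), that mRMR is run greedily with mutual information as both the relevance and redundancy measure, and that SMV and WMV denote simple and weighted majority voting. The identifier names `A`, `B`, `C`, the toy predictions, and the weights are all hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def mrmr_select(predictions, labels, k):
    """Greedy mRMR: repeatedly add the identifier with the highest
    relevance to the labels minus mean redundancy with those already chosen."""
    relevance = {
        name: mutual_information(calls, labels)
        for name, calls in predictions.items()
    }
    selected = []
    while len(selected) < min(k, len(predictions)):

        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(
                mutual_information(predictions[name], predictions[s])
                for s in selected
            ) / len(selected)
            return relevance[name] - redundancy

        candidates = [name for name in predictions if name not in selected]
        selected.append(max(candidates, key=score))
    return selected


def majority_vote(votes, weights=None):
    """Return 1 if the (optionally weighted) positive votes exceed half the total."""
    if weights is None:
        # Simple majority voting: every identifier counts equally.
        weights = {name: 1.0 for name in votes}
    total = sum(weights[name] for name in votes)
    positive = sum(weights[name] for name, call in votes.items() if call == 1)
    return 1 if positive > total / 2 else 0


# Hypothetical toy data: B duplicates A, C errs on different sequences.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
predictions = {
    "A": [1, 1, 1, 0, 0, 0, 0, 1],
    "B": [1, 1, 1, 0, 0, 0, 0, 1],
    "C": [1, 0, 1, 1, 0, 1, 0, 0],
}
selected = mrmr_select(predictions, labels, k=2)

# One sequence, three calls: unweighted and weighted votes can disagree.
votes = {"A": 1, "B": 0, "C": 0}
simple = majority_vote(votes)
weighted = majority_vote(votes, weights={"A": 0.9, "B": 0.3, "C": 0.3})
print(selected, simple, weighted)
```

In this toy case the selection keeps `A` and `C` and drops `B`, because `B` adds no information beyond `A`. The weighted vote lets a single heavily weighted identifier overrule two lightly weighted ones. How the paper derives its weights is not stated in the abstract, so the weights here are placeholders.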
