Computational identification of N4-methylcytosine sites in the mouse genome with machine-learning method

Author(s) - Hasan Zulfiqar, Rida Sarwar Khan, Farwa Hassan, Kyle Hippe, Cassandra Hunt, Hui Ding, Xiaoming Song, Renzhi Cao

Open Access

Publication year - 2021

Publication title - Mathematical Biosciences and Engineering

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.451

H-Index - 45

eISSN - 1551-0018

pISSN - 1547-1063

DOI - 10.3934/mbe.2021167

Subject(s) - random forest, computer science, genome, computational biology, coding (social sciences), source code, artificial intelligence, redundancy (engineering), feature selection, classifier (uml), dna, machine learning, biology, genetics, mathematics, gene, programming language, statistics, operating system

N4-methylcytosine (4mC) is a DNA modification that can regulate multiple biological processes. Correctly identifying 4mC sites in genomic sequences can provide precise knowledge about their genetic roles. This study aimed to develop an ensemble model to predict 4mC sites in the mouse genome. In the proposed model, DNA sequences were encoded by k-mer, enhanced nucleic acid composition, and composition of k-spaced nucleic acid pairs. These features were then optimized using minimum redundancy maximum relevance (mRMR) with incremental feature selection (IFS) and five-fold cross-validation. The resulting optimal features were fed into a random forest classifier to discriminate 4mC from non-4mC sites in mouse. On the independent dataset, the model achieved an overall accuracy of 85.41%, approximately 3.8% and 6.3% higher than the two existing models, i4mC-Mouse and 4mCpred-EL, respectively. The data and source code of the model can be freely downloaded from https://github.com/linDing-groups/model_4mc.
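
To make two of the sequence encodings named in the abstract concrete, the following is a minimal sketch of k-mer frequencies and k-spaced nucleotide pair composition. It is not the authors' implementation: the function names, the defaults `k=2` and `gap=1`, and the normalization by the number of windows are assumptions chosen for illustration.

```python
from itertools import product

ALPHABET = "ACGT"


def kmer_features(seq, k=2):
    """Normalized frequency of each of the 4**k possible k-mers (assumed scheme)."""
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    windows = len(seq) - k + 1
    for i in range(windows):
        sub = seq[i:i + k]
        if sub in counts:  # skip windows containing ambiguous bases such as N
            counts[sub] += 1
    total = max(windows, 1)  # avoid division by zero on very short input
    return [counts[m] / total for m in kmers]


def cksnap_features(seq, gap=1):
    """Normalized frequency of nucleotide pairs separated by `gap` positions."""
    pairs = ["".join(p) for p in product(ALPHABET, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    windows = len(seq) - gap - 1
    for i in range(windows):
        pair = seq[i] + seq[i + gap + 1]
        if pair in counts:
            counts[pair] += 1
    total = max(windows, 1)
    return [counts[p] / total for p in pairs]


def encode(seq, k=2, gap=1):
    """Concatenate both encodings into one feature vector (hypothetical helper)."""
    seq = seq.upper()
    return kmer_features(seq, k) + cksnap_features(seq, gap)
```

Vectors produced this way could then be ranked by a feature-selection step and passed to a random forest (for example scikit-learn's `RandomForestClassifier`); the parameters and feature set actually used in the study are in the linked repository.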
