Open Access
Leveraging the attention mechanism to improve the identification of DNA N6-methyladenine sites
Author(s) - Ying Zhang, Yan Liu, Jian Xu, Xiaoyu Wang, Xinxin Peng, Jiangning Song, DongJun Yu
Publication year - 2021
Publication title - Briefings in Bioinformatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 3.204
H-Index - 113
eISSN - 1477-4054
pISSN - 1467-5463
DOI - 10.1093/bib/bbab351
Subject(s) - interpretability, computer science, benchmark (surveying), artificial intelligence, identification (biology), machine learning, deep learning, mechanism (biology), key (lock), artificial neural network, encode, computational biology, data mining, biology, gene, genetics, philosophy, botany, computer security, geodesy, epistemology, geography
DNA N6-methyladenine (6mA) is an important DNA modification that plays key roles in multiple biological processes. Despite recent progress in developing DNA 6mA site prediction methods, several challenges remain. For example, although hand-crafted features are interpretable, they contain redundant information that may bias model training and degrade the trained model. Furthermore, although deep learning (DL)-based models can perform feature extraction and classification automatically, the crucial features they learn are difficult to interpret. As such, considerable research effort has focused on balancing the interpretability and straightforwardness of DL neural networks. In this study, we develop two new DL-based models for improving the prediction of N6-methyladenine sites, termed LA6mA and AL6mA, which use bidirectional long short-term memory to capture long-range information and a self-attention mechanism to extract key position information from DNA sequences. The performance of the two proposed methods is benchmarked and evaluated on the two model organisms Arabidopsis thaliana and Drosophila melanogaster. On the two benchmark datasets, LA6mA achieves area under the receiver operating characteristic curve (AUROC) values of 0.962 and 0.966, whereas AL6mA achieves AUROC values of 0.945 and 0.941, respectively. Moreover, an in-depth analysis of the attention matrix is conducted to interpret the important information hidden in the sequence that is relevant for 6mA site prediction. The two novel pipelines developed for DNA 6mA site prediction in this work will facilitate a better understanding of the underlying principles of DL-based DNA methylation site prediction and its future applications.
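
To make the idea of an attention matrix over sequence positions concrete, the following is a minimal illustrative sketch in plain Python. It is not the authors' LA6mA or AL6mA implementation. It assumes one-hot encoded bases and scaled dot-product self-attention with identity projections (queries, keys, and values all equal the input) in place of learned weights, and it omits the bidirectional long short-term memory layer entirely. The function names and the toy sequence are hypothetical.

```python
import math

BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-dimensional one-hot vectors."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Scaled dot-product self-attention with identity projections.

    Assumption: Q = K = V = x, i.e. no learned weight matrices.
    Returns (attention_matrix, context_vectors).
    """
    d = len(x[0])
    scale = math.sqrt(d)
    attn = []
    for q in x:
        # Similarity of this position to every position, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in x]
        attn.append(softmax(scores))
    # Each context vector is the attention-weighted sum of all input vectors.
    context = [
        [sum(w * v[j] for w, v in zip(row, x)) for j in range(d)]
        for row in attn
    ]
    return attn, context


seq = "GATCA"  # hypothetical toy sequence, not taken from the benchmark data
attn, context = self_attention(one_hot(seq))
```

Row i of `attn` holds the weights that position i assigns to every position in the sequence, and each row sums to 1. In this toy setting a base attends most strongly to positions carrying the same base. In a trained model the weights would come from learned projections, and inspecting them is the kind of attention-matrix analysis the abstract describes.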
