Repeat-Preserving Decoy Database for False Discovery Rate Estimation in Peptide Identification
Author(s) - Johra Muhammad Moosa, Shenheng Guan, Michael F. Moran, Bin Ma
Publication year - 2020
Publication title - Journal of Proteome Research
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.644
H-Index - 161
eISSN - 1535-3907
pISSN - 1535-3893
DOI - 10.1021/acs.jproteome.9b00555
Subject(s) - decoy , false discovery rate , de bruijn sequence , computer science , identification (biology) , sequence database , database search engine , mascot , data mining , database , mathematics , biology , information retrieval , search engine , botany , receptor , gene , discrete mathematics , political science , law , biochemistry
Sequence database searching is widely used in proteomics for peptide identification. To control the false discovery rate (FDR) of the search results, the target-decoy method generates a decoy database and searches it together with the target database. A known problem is that the target protein sequence database may contain numerous repeated peptides, and most existing decoy generation algorithms do not preserve the structures of these repeats. Previous studies suggest that this discrepancy between the target and decoy databases may lead to inaccurate FDR estimation. Based on the de Bruijn graph model, we propose a new repeat-preserving algorithm to generate decoy databases. We prove that this algorithm preserves the structures of the repeats in the target database to a great extent. The de Bruijn method was compared with several other commonly used methods and demonstrated superior FDR estimation accuracy and a larger number of peptide identifications.
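For readers unfamiliar with the two ideas the abstract relies on, the following is a minimal Python sketch, not the authors' algorithm. It assumes the simplest common form of the target-decoy estimate (decoy hits divided by target hits at or above a score threshold) and a plain k-mer count as a stand-in for "repeated peptides". The function names `estimate_fdr` and `count_repeated_kmers`, the `(score, is_decoy)` tuple layout, and the example data are all hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def estimate_fdr(psms: List[Tuple[float, bool]], threshold: float) -> float:
    """Estimate the FDR of target matches scoring at or above `threshold`.

    Each entry of `psms` is (score, is_decoy). The estimate assumed here is
    decoy hits / target hits, capped at 1.0; other variants exist.
    """
    targets = sum(1 for score, is_decoy in psms
                  if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms
                 if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0  # no accepted target matches, so nothing to estimate
    return min(1.0, decoys / targets)


def count_repeated_kmers(proteins: Iterable[str], k: int) -> int:
    """Count distinct length-k substrings that occur more than once
    across all sequences in the database."""
    counts: Counter = Counter()
    for seq in proteins:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return sum(1 for n in counts.values() if n > 1)


# Hypothetical search results: (score, is_decoy)
example_psms = [(10.0, False), (9.0, False), (8.0, True),
                (7.0, False), (6.0, True)]
fdr_at_7 = estimate_fdr(example_psms, threshold=7.0)

# Hypothetical toy database with shared substrings
repeats = count_repeated_kmers(["ABCDABCD", "XBCDY"], k=3)
```

Running `count_repeated_kmers` on a target database and on a candidate decoy database gives a rough indication of how far their repeat content differs, which is the discrepancy the abstract identifies as a source of inaccurate FDR estimates.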
