RSDB: representative protein sequence databases have high information content

Open Access

Author(s) - Jong-Eun Park, Liisa Holm, Andreas Heger, C. Chothia

Publication year - 2000

Publication title - Bioinformatics

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 3.599

H-Index - 390

eISSN - 1367-4811

pISSN - 1367-4803

DOI - 10.1093/bioinformatics/16.5.458

Subject(s) - database, computer science, sequence database, sequence (biology), homology (biology), information retrieval, biological database, granularity, sequence homology, data mining, bioinformatics, biology, gene, peptide sequence, genetics, programming language

Biological sequence databases are highly redundant for two main reasons: (1) various databanks keep redundant sequences, with many identical and nearly identical entries, and (2) natural sequences often have high sequence identities due to gene duplication. We wanted to know how many sequences can be removed before the databases start losing homology information. Can a database of sequences with mutual sequence identity of 50% or less provide us with the same amount of biological information as the original full database?
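
To make the idea of a representative set concrete, here is a minimal Python sketch of greedy redundancy removal at a 50% identity cutoff. It is an illustration only and not the procedure used to build RSDB: the function names `percent_identity` and `select_representatives` are hypothetical, and the sketch assumes the sequences are already aligned to equal length, whereas a real pipeline would compute identities from pairwise alignments.

```python
from typing import List


def percent_identity(a: str, b: str) -> float:
    """Percent of identical residues between two pre-aligned sequences.

    Assumption for this sketch: both sequences have the same length and
    use '-' for gaps. Columns where both sequences have a gap are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    columns = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)


def select_representatives(seqs: List[str], threshold: float = 50.0) -> List[str]:
    """Greedily keep sequences whose identity to every kept one is <= threshold."""
    representatives: List[str] = []
    for candidate in seqs:
        # A candidate is redundant if it is too similar to any representative.
        if all(percent_identity(candidate, rep) <= threshold
               for rep in representatives):
            representatives.append(candidate)
    return representatives


if __name__ == "__main__":
    # Toy, made-up sequences purely for illustration.
    toy = ["ACDEFGHIKL", "ACDEFGHIKV", "MNPQRSTVWY", "ACDEFMNPQR"]
    print(select_representatives(toy, threshold=50.0))
```

In this toy input the second sequence is 90% identical to the first and is dropped, while the fourth is exactly 50% identical to the first and is kept, matching the "50% or less" criterion. The greedy result depends on input order, which is one reason practical tools usually sort sequences (for example by length) before clustering.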
