TESE: generating specific protein structure test set ensembles
Author(s) - Francesco G. Sirocco, Silvio C. E. Tosatto
Publication year - 2008
Publication title - Bioinformatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 3.599
H-Index - 390
eISSN - 1367-4811
pISSN - 1367-4803
DOI - 10.1093/bioinformatics/btn488
Subject(s) - computer science , redundancy (engineering) , web server , benchmarking , protein data bank (rcsb pdb) , data mining , sequence (biology) , file format , protein structure , database , operating system , biology , the internet , genetics , biochemistry , marketing , business
TESE is a web server for the generation of test sets of protein sequences and structures fulfilling a number of different criteria. At least three different use cases can be envisaged: (i) benchmarking of novel methods; (ii) test sets tailored for special needs and (iii) extending available datasets. The CATH structure classification is used to control structural/sequence redundancy and a variety of structural quality parameters can be used to interactively select protein subsets with specific characteristics, e.g. all X-ray structures of alpha-helical repeat proteins with more than 120 residues and resolution < 2.0 Å. The output includes FASTA-formatted sequences, PDB files and a clickable HTML index file containing images of the selected proteins. Multiple subsets for cross-validation are also supported.
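The kind of selection described above can be illustrated with a minimal Python sketch. It is not TESE's implementation and does not use its interface. The `Domain` record, its field names, and the `cluster_id` label are hypothetical stand-ins for per-domain metadata, with `cluster_id` representing a redundancy grouping such as a CATH sequence cluster. The sketch keeps X-ray entries with more than 120 residues and resolution below 2.0 Å, retains the first domain it sees from each cluster, splits the result into disjoint subsets for cross-validation, and writes FASTA. The structural-class criterion from the example (alpha-helical repeat proteins) is left out to keep the sketch short.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Domain:
    """Hypothetical per-domain record; field names are illustrative only."""
    domain_id: str
    sequence: str
    method: str                   # experimental method, e.g. "X-RAY DIFFRACTION"
    resolution: Optional[float]   # in angstroms; None if not applicable
    cluster_id: str               # hypothetical redundancy-cluster label


def select(domains: Iterable[Domain], more_than: int = 120,
           max_resolution: float = 2.0) -> List[Domain]:
    """Keep X-ray domains longer than `more_than` residues with resolution
    strictly below `max_resolution`, at most one per redundancy cluster
    (the first one encountered in input order)."""
    seen = set()
    chosen = []
    for d in domains:
        if d.method != "X-RAY DIFFRACTION":
            continue
        if len(d.sequence) <= more_than:
            continue
        if d.resolution is None or d.resolution >= max_resolution:
            continue
        if d.cluster_id in seen:
            continue  # redundant with a domain already selected
        seen.add(d.cluster_id)
        chosen.append(d)
    return chosen


def split_folds(items: List[Domain], k: int) -> List[List[Domain]]:
    """Distribute items round-robin into k disjoint subsets."""
    return [items[i::k] for i in range(k)]


def to_fasta(domains: Iterable[Domain], width: int = 60) -> str:
    """Format sequences as FASTA, wrapping lines at `width` residues."""
    lines = []
    for d in domains:
        lines.append(f">{d.domain_id}")
        lines.extend(d.sequence[i:i + width]
                     for i in range(0, len(d.sequence), width))
    return "".join(line + "\n" for line in lines)
```

With toy records, `select` drops entries that are too short, not solved by X-ray, at or above the resolution cutoff, or in a cluster that is already represented. `split_folds(selected, 5)` then yields five disjoint subsets. Round-robin assignment keeps subset sizes within one of each other. A real pipeline might instead shuffle or stratify the records first.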