UDSMProt: universal deep sequence models for protein classification
Author(s) - Nils Strodthoff, Patrick Wagner, Markus Wenzel, Wojciech Samek
Publication year - 2020
Publication title - Bioinformatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 3.599
H-Index - 390
eISSN - 1367-4811
pISSN - 1367-4803
DOI - 10.1093/bioinformatics/btaa003
Subject(s) - computer science, artificial intelligence, source code, machine learning, field (mathematics), protein sequencing, task (project management), representation (politics), sequence (biology), class (philosophy), code (set theory), deep learning, data mining, peptide sequence, programming language, gene, biochemistry, chemistry, genetics, mathematics, management, set (abstract data type), politics, biology, economics, political science, pure mathematics, law
Inferring the properties of a protein from its amino acid sequence is one of the key problems in bioinformatics. Most state-of-the-art approaches for protein classification are tailored to single classification tasks and rely on handcrafted features, such as position-specific scoring matrices from expensive database searches. We argue that this level of performance can be reached or even surpassed by learning a task-agnostic representation once, using self-supervised language modeling, and transferring it to specific tasks by a simple fine-tuning step.