Biological sequence classification utilizing positive and unlabeled data
Author(s) -
Yuanyuan Xiao,
Mark R. Segal
Publication year - 2008
Publication title -
Bioinformatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 3.599
H-Index - 390
eISSN - 1367-4811
pISSN - 1367-4803
DOI - 10.1093/bioinformatics/btn089
Subject(s) - computer science , context (archaeology) , class (philosophy) , cluster analysis , set (abstract data type) , data mining , artificial intelligence , data set , sequence (biology) , identification (biology) , property (philosophy) , pattern recognition (psychology) , machine learning , paleontology , philosophy , genetics , botany , epistemology , biology , programming language
In the genomics setting, an increasingly common data configuration consists of a small set of sequences possessing a targeted property (positive instances) amongst a large set of sequences for which class membership is unknown (unlabeled instances). Traditional two-class classification methods do not effectively handle such data.
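The sketch below illustrates the problem stated above. It is not the method proposed in this paper. It uses a hypothetical toy dataset with one binary feature, the presence of a motif, to show why treating unlabeled sequences as negatives misleads a two-class classifier. It also applies a well-known correction from the positive-unlabeled learning literature. That correction assumes labeled positives are a random sample of all positives ("selected completely at random"). Under that assumption, P(labeled | x) = c · P(positive | x), where c = P(labeled | positive). All data, names and the known value of c are illustrative assumptions.

```python
from fractions import Fraction

# Hypothetical toy records: (has_motif, is_truly_positive, is_labeled).
# Only true positives can carry a label; the unlabeled pool mixes both classes.
def make_toy_data():
    records = []
    # Motif stratum: 40 true positives (10 labeled) and 20 negatives.
    records += [(True, True, True)] * 10 + [(True, True, False)] * 30
    records += [(True, False, False)] * 20
    # No-motif stratum: 8 true positives (2 labeled) and 72 negatives.
    records += [(False, True, True)] * 2 + [(False, True, False)] * 6
    records += [(False, False, False)] * 72
    return records

def rate_by_stratum(records, field):
    """Fraction of records in each motif stratum whose given field is True."""
    rates = {}
    for stratum in (True, False):
        group = [r for r in records if r[0] == stratum]
        rates[stratum] = Fraction(sum(1 for r in group if r[field]), len(group))
    return rates

records = make_toy_data()

# Naive two-class view: labeled = positive, unlabeled = negative,
# so each stratum's score estimates P(labeled | x).
naive = rate_by_stratum(records, field=2)

# Hidden ground truth P(positive | x), available only because the data are synthetic.
truth = rate_by_stratum(records, field=1)

# c = P(labeled | positive), computed here from hidden labels (assumed known);
# in real data it would have to be estimated.
positives = [r for r in records if r[1]]
c = Fraction(sum(1 for r in positives if r[2]), len(positives))

# Under the random-selection assumption, dividing by c rescales the naive score.
corrected = {k: v / c for k, v in naive.items()}

for stratum in (True, False):
    print(f"motif={stratum}: naive={naive[stratum]}, "
          f"corrected={corrected[stratum]}, true={truth[stratum]}")
```

In this toy set, c is 1/4. The naive scores (1/6 with the motif, 1/40 without) fall below a 0.5 decision threshold in both strata, so no sequence is called positive. The corrected scores match the true within-stratum positive rates (2/3 and 1/10). In practice, c is unknown and must be estimated, for example from held-out labeled positives.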