Resolving confusion of tongues in statistics and machine learning: A primer for biologists and bioinformaticians
Author(s) - van Iterson Maarten, van Haagen Herman H. H. B. M., Goeman Jelle J.
Publication year - 2012
Publication title - Proteomics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.26
H-Index - 167
eISSN - 1615-9861
pISSN - 1615-9853
DOI - 10.1002/pmic.201100395
Subject(s) - terminology, jargon, confusion, computer science, artificial intelligence, field (mathematics), machine learning, data science, natural language processing, bioinformatics, biology, mathematics, psychology, linguistics, philosophy, psychoanalysis, pure mathematics
Bioinformatics is the field where computational methods from various domains have come together for the analysis of biological data. Each domain has introduced its own specific jargon. However, closely related domains, e.g. machine learning and statistics, use both concordant and discordant terminology, and the latter can lead to confusion. This article aims to help resolve the confusion of tongues arising from these two closely related domains, which are frequently used in bioinformatics. We provide a short summary of the most commonly applied machine learning and statistical approaches to data analysis in bioinformatics, i.e. classification and statistical hypothesis testing. We explain differences and similarities in common terminology used in various domains, such as precision, recall, sensitivity and true positive rate. This primer can serve as a guide to the terminology used in these fields.
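
As a small illustration of the terminology overlap mentioned above, the following sketch computes precision and recall from binary labels and shows that recall, sensitivity and true positive rate are three names for the same quantity, TP / (TP + FN). It is not taken from the article: the function names and the toy labels are assumptions made for this example, and the definitions used are the standard confusion-matrix ones.

```python
# Minimal sketch (not from the article): standard confusion-matrix definitions.
# All function names and the toy data below are illustrative assumptions.

def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN for binary labels coded as 1 (positive) and 0 (negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def precision(tp, fp):
    """Fraction of predicted positives that are truly positive."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Fraction of true positives that were predicted positive."""
    return tp / (tp + fn)

# Same quantity under the names used in statistics and diagnostics.
sensitivity = recall
true_positive_rate = recall

# Toy data: 4 true positives in the truth, 5 positive calls by the classifier.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print("precision:", precision(tp, fp))
print("recall = sensitivity = TPR:", recall(tp, fn))
```

The sketch assumes at least one positive label and one positive prediction, since otherwise the denominators are zero.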