Decision tree classification of proteins identified by mass spectrometry of blood serum samples from people with and without lung cancer
Author(s) - Markey Mia K., Tourassi Georgia D., Floyd Carey E.
Publication year - 2003
Publication title - Proteomics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.26
H-Index - 167
eISSN - 1615-9861
pISSN - 1615-9853
DOI - 10.1002/pmic.200300521
Subject(s) - CART, decision tree, mass spectrometry, receiver operating characteristic, lung cancer, regression analysis, cross validation, tree (set theory), regression, biology, computational biology, chromatography, chemistry, pathology, statistics, medicine, artificial intelligence, mathematics, computer science, geography, mathematical analysis, archaeology
A classification and regression tree (CART) model was trained to classify 41 clinical specimens as disease/nondisease based on 26 variables computed from the mass‐to‐charge ratios (m/z) and peak heights of proteins identified by mass spectrometry. The CART model built on all of the specimens (no cross‐validation) had an error rate of 4/41 ≈ 10%. The CART model suggests that mass spectra peaks in the 8000–10 000, 20 000–30 000, 45 000–60 000, and >125 000 m/z ranges may be valuable in distinguishing between the disease and nondisease specimens. Under leave‐one‐out cross‐validation, the area under the receiver operating characteristic (ROC) curve was 0.80 ± 0.07.
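The abstract reports two kinds of figures: an error rate for a tree fitted to all specimens, and an ROC area obtained by leave-one-out cross-validation. The sketch below illustrates how those two quantities differ. It is not the authors' model or data: it uses a hypothetical single-split tree (a "stump" chosen by Gini impurity) in place of a full CART model, synthetic Gaussian values in place of the 26 spectral variables, and an arbitrary 20/21 class split, since the abstract does not state the class balance. All function names are made up for illustration.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def fit_stump(X, y):
    """Fit a one-split tree: choose the (feature, threshold) pair that
    minimizes the weighted Gini impurity of the two child nodes."""
    n = len(y)
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds are midpoints between distinct values,
        # so both children are always non-empty.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [y[i] for i in range(n) if X[i][j] <= t]
            right = [y[i] for i in range(n) if X[i][j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best["score"]:
                best = {"score": score, "feature": j, "threshold": t,
                        "p_left": sum(left) / len(left),
                        "p_right": sum(right) / len(right)}
    return best

def predict_proba(stump, row):
    """Fraction of disease cases in the leaf that `row` falls into."""
    if row[stump["feature"]] <= stump["threshold"]:
        return stump["p_left"]
    return stump["p_right"]

def auc(scores, labels):
    """ROC area as the probability that a random positive case scores
    higher than a random negative case (ties count one half)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def loocv_scores(X, y):
    """Score each specimen with a tree trained on all the others."""
    scores = []
    for i in range(len(y)):
        stump = fit_stump(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        scores.append(predict_proba(stump, X[i]))
    return scores

# Synthetic stand-in data (assumption): 41 specimens, 26 variables,
# with only variable 0 carrying a class signal.
random.seed(0)
N_VARIABLES = 26
y = [1] * 20 + [0] * 21
X = [[random.gauss(0.0, 1.0) for _ in range(N_VARIABLES)] for _ in y]
for row, label in zip(X, y):
    row[0] += 2.0 * label

# Resubstitution error: train and test on the same 41 specimens.
full = fit_stump(X, y)
resub_errors = sum((predict_proba(full, row) >= 0.5) != bool(label)
                   for row, label in zip(X, y))

# Leave-one-out estimate: every specimen is scored by a tree that never saw it.
loo_scores = loocv_scores(X, y)
loo_auc = auc(loo_scores, y)

print(f"resubstitution errors: {resub_errors}/{len(y)}")
print(f"leave-one-out AUC: {loo_auc:.2f}")
```

Because the resubstitution figure is measured on the same specimens used to grow the tree, it tends to be optimistic; the leave-one-out estimate is usually the more conservative indicator of how the model would perform on new specimens.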