A Support Vector Machine Classification of Thyroid Bioptic Specimens Using MALDI-MSI Data
Author(s) - Manuel Galli, Italo Zoppis, Gabriele De Sio, Clizia Chinello, Fabio Pagni, Fulvio Magni, Giancarlo Mauri
Publication year - 2016
Publication title - Advances in Bioinformatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.33
H-Index - 20
eISSN - 1687-8035
pISSN - 1687-8027
DOI - 10.1155/2016/3791214
Subject(s) - computer science, feature selection, context (archaeology), support vector machine, artificial intelligence, machine learning, biomarker discovery, medical diagnosis, feature vector, feature (linguistics), dimension (graph theory), proteomics, pattern recognition (psychology), data mining, bioinformatics, pathology, medicine, biology, mathematics, paleontology, biochemistry, linguistics, philosophy, gene, pure mathematics
Biomarkers able to characterise and predict multifactorial diseases remain one of the most important targets of all “omics” investigations. In this context, Matrix-Assisted Laser Desorption/Ionisation Mass Spectrometry Imaging (MALDI-MSI) has gained considerable attention in recent years, but it also produces a huge amount of complex data that must be processed and interpreted. For this reason, computational and machine learning procedures for biomarker discovery are important tools, both to reduce data dimensionality and to provide predictive markers for specific diseases. For instance, protein and genetic markers that support thyroid lesion diagnoses would have a deep impact on society, given the high number of indeterminate reports (THY3), which are generally managed as malignant cases. In this paper we show how an accurate classification of thyroid bioptic specimens can be obtained by applying a state-of-the-art machine learning approach (i.e., Support Vector Machines) to MALDI-MSI data, together with a wrapper feature selection algorithm (i.e., recursive feature elimination). The model discriminates accurately using only 20 out of 144 features, which improves its performance, reliability, and computational efficiency. Finally, tissue areas rather than average proteomic profiles are classified, highlighting potentially discriminating areas of clinical interest.
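The combination described in the abstract, a Support Vector Machine wrapped in recursive feature elimination that keeps 20 of 144 features, can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes scikit-learn, a linear kernel (so that feature weights are available for ranking), and synthetic data from `make_classification` standing in for MALDI-MSI spectra. The function name `run_svm_rfe`, the elimination step size, and the cross-validation setup are hypothetical choices made only for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def run_svm_rfe(n_features=144, n_keep=20, seed=0):
    """Fit a linear SVM with recursive feature elimination on synthetic data.

    Assumption: the synthetic matrix X (samples x features) stands in for
    per-pixel MALDI-MSI intensities; the real data are not reproduced here.
    """
    X, y = make_classification(
        n_samples=300,
        n_features=n_features,
        n_informative=20,
        n_redundant=10,
        random_state=seed,
    )

    # RFE repeatedly fits the linear SVM, ranks features by the magnitude of
    # their weights, and drops the weakest ones (4 per round here) until
    # n_keep features remain.
    selector = RFE(
        estimator=SVC(kernel="linear", C=1.0),
        n_features_to_select=n_keep,
        step=4,
    )

    # Scaling and selection live inside the pipeline so that each
    # cross-validation fold selects features from its training part only.
    model = make_pipeline(StandardScaler(), selector, SVC(kernel="linear", C=1.0))

    scores = cross_val_score(model, X, y, cv=5)

    # Refit on all data to read off which feature indices were retained.
    model.fit(X, y)
    selected = np.flatnonzero(model.named_steps["rfe"].support_)
    return selected, scores


if __name__ == "__main__":
    selected, scores = run_svm_rfe()
    print("retained feature indices:", selected)
    print("mean cross-validated accuracy:", scores.mean())
```

Placing the selector inside the pipeline is a deliberate choice: selecting features on the full data set before cross-validation would leak information from the held-out folds and inflate the accuracy estimate.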