Improving data splitting for classification applications in spectrochemical analyses employing a random-mutation Kennard-Stone algorithm approach
Author(s) - Camilo L. M. Morais, Marfran C. D. Santos, Kássio M. G. Lima, Francis L. Martin
Publication year - 2019
Publication title - Bioinformatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 3.599
H-Index - 390
eISSN - 1367-4811
pISSN - 1367-4803
DOI - 10.1093/bioinformatics/btz421
Subject(s) - algorithm, principal component analysis, linear discriminant analysis, computer science, set (abstract data type), euclidean distance, data set, sample (material), data mining, artificial intelligence, pattern recognition (psychology), mathematics, chemistry, chromatography, programming language
Data splitting is a fundamental step in building classification models with spectral data, especially in biomedical applications. It is performed after pre-processing and before model construction, and consists of dividing the samples into at least a training set and a test set: the training set is used for model construction and the test set for model validation. Among the most widely used data-splitting methods are random selection (RS) and the Kennard-Stone (KS) algorithm; the former splits the samples at random, whereas the latter selects samples based on the Euclidean distances between them. We propose the Morais-Lima-Martin (MLM) algorithm as an alternative method to improve data splitting in classification models. MLM is a modification of the KS algorithm that adds a random-mutation factor.
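The idea of combining KS selection with a random-mutation factor can be illustrated with a minimal Python sketch. The abstract does not specify how the mutation is applied, so the mechanism used here (swapping a random fraction of samples between the KS-selected training set and the test set) is an assumption, as are the function names `kennard_stone` and `ks_with_random_mutation`, the `mutation` fraction and the `seed` parameter. It should not be read as the authors' implementation.

```python
import numpy as np


def kennard_stone(X, n_train):
    """Split sample indices with the Kennard-Stone rule (assumes n_train >= 2)."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances between all samples.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Start from the two samples that are farthest apart.
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(len(X)) if k not in selected]
    while len(selected) < n_train:
        # Distance from each remaining sample to its nearest selected sample.
        min_dist = dist[np.ix_(remaining, selected)].min(axis=1)
        # Add the remaining sample that is farthest from the selected set.
        pick = remaining[int(np.argmax(min_dist))]
        selected.append(pick)
        remaining.remove(pick)
    return selected, remaining


def ks_with_random_mutation(X, n_train, mutation=0.1, seed=0):
    """KS split followed by a random swap between sets (assumed mechanism)."""
    train, test = kennard_stone(X, n_train)
    rng = np.random.default_rng(seed)
    # Hypothetical mutation step: swap a fraction of training samples
    # with randomly chosen test samples.
    n_swap = min(int(round(mutation * len(train))), len(test))
    out_train = rng.choice(len(train), size=n_swap, replace=False)
    out_test = rng.choice(len(test), size=n_swap, replace=False)
    for a, b in zip(out_train, out_test):
        train[a], test[b] = test[b], train[a]
    return sorted(train), sorted(test)
```

For a classification problem, a split like this would typically be applied to each class separately so that both sets contain every class; setting `mutation=0` reduces the sketch to plain KS.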