Open Access
Improving data splitting for classification applications in spectrochemical analyses employing a random-mutation Kennard-Stone algorithm approach
Author(s) - Camilo L. M. Morais, Marfran C. D. Santos, Kássio M. G. Lima, Francis L. Martin
Publication year - 2019
Publication title - Bioinformatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 3.599
H-Index - 390
eISSN - 1367-4811
pISSN - 1367-4803
DOI - 10.1093/bioinformatics/btz421
Subject(s) - algorithm, principal component analysis, linear discriminant analysis, computer science, set (abstract data type), euclidean distance, data set, sample (material), data mining, artificial intelligence, pattern recognition (psychology), mathematics, chemistry, chromatography, programming language
Data splitting is a fundamental step in building classification models with spectral data, especially in biomedical applications. It is performed after pre-processing and before model construction, and consists of dividing the samples into at least a training set, used for model construction, and a test set, used for model validation. Some of the most widely used methods for data splitting are random selection (RS) and the Kennard-Stone (KS) algorithm: the former splits the samples at random, whereas the latter selects samples based on the Euclidean distances between them. We propose the Morais-Lima-Martin (MLM) algorithm as an alternative method to improve data splitting in classification models. MLM modifies the KS algorithm by adding a random-mutation factor.
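As an illustration only, the two ingredients named in the abstract can be sketched in Python. The function `kennard_stone` follows the standard Kennard-Stone rule (start from the two most distant samples, then repeatedly add the sample whose nearest already-selected neighbour is farthest away). The function `random_mutation` is an assumption: the abstract does not say how the random-mutation factor operates, so the sketch simply swaps a random fraction of training samples with test samples. The function names, the `fraction` parameter, the value 0.2 and the toy data are all hypothetical and are not taken from the paper.

```python
import math
import random


def kennard_stone(samples, n_train):
    """Split sample indices into (selected, remaining) by the Kennard-Stone rule."""
    n = len(samples)
    if not 2 <= n_train <= n:
        raise ValueError("n_train must be between 2 and len(samples)")
    # Pairwise Euclidean distances between all samples.
    dist = [[math.dist(a, b) for b in samples] for a in samples]
    # Start with the two samples that are farthest apart.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    selected = [first, second]
    remaining = [i for i in range(n) if i not in (first, second)]
    while len(selected) < n_train:
        # Add the remaining sample whose nearest selected neighbour is farthest.
        nxt = max(remaining, key=lambda i: min(dist[i][s] for s in selected))
        selected.append(nxt)
        remaining.remove(nxt)
    return selected, remaining


def random_mutation(train_idx, test_idx, fraction, rng):
    """Assumed mechanism: swap a random fraction of training and test indices."""
    n_swap = min(round(fraction * len(train_idx)), len(test_idx))
    to_test = rng.sample(train_idx, n_swap)
    to_train = rng.sample(test_idx, n_swap)
    new_train = [i for i in train_idx if i not in to_test] + to_train
    new_test = [i for i in test_idx if i not in to_train] + to_test
    return new_train, new_test


# Toy two-dimensional "spectra" (hypothetical data, for illustration only).
samples = [
    (0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0),
    (5.0, 6.0), (6.0, 5.0), (2.5, 2.5), (3.0, 3.0),
]
ks_train, ks_test = kennard_stone(samples, n_train=5)
train_idx, test_idx = random_mutation(ks_train, ks_test, 0.2, random.Random(0))
```

The swap keeps the training and test set sizes unchanged, so the only difference from a plain KS split is which samples end up on each side.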
