Can statistical learning models make early selection among sugarcane families easier and still efficient?
Author(s) -
Moreira Édimo Fernando Alves,
Barbosa Marcio Henrique Pereira,
Peternelli Luiz Alexandre
Publication year - 2020
Publication title -
Crop Science
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.76
H-Index - 147
eISSN - 1435-0653
pISSN - 0011-183X
DOI - 10.1002/csc2.20334
Subject(s) - hectare , selection (genetic algorithm) , support vector machine , artificial neural network , artificial intelligence , random forest , machine learning , saccharum , logistic regression , statistics , biology , yield (engineering) , mathematics , computer science , agronomy , ecology , materials science , metallurgy , agriculture
The selection of genotypes at the early stages is one of the main challenges facing sugarcane (Saccharum officinarum L.) breeding programs. The present work aimed to compare classification techniques, namely logistic regression (LR), k-nearest neighbor (KNN), random forests (RF), and support vector machine (SVM), against the selection among families of sugarcane via artificial neural networks (ANN) and via a procedure based on the weighing of the plots. The data used in this work were obtained from 110 families. In the families, the number of stalks (NS), stalk diameter (SD), and stalk height (SH) were collected, in addition to the actual yield, expressed in tons of cane per hectare (TCH). We considered NS, SD, and SH as explanatory variables for the training of the classifiers. The response used was the indicator Y = 0 if the family is not selected via TCH, or Y = 1 otherwise. To increase the efficiency in training, we produced synthetic data based on the simulation of NS, SD, SH, and TCH values. Two models were considered: a full model with all the predictors and a reduced model without SH. We used the apparent error rate (AER) and the true positive rate (TPR) for the evaluation of the classifiers. All classifiers presented low values for the AER and high values for the TPR in both models. The best performance was observed in the SVM. The reduced model should be preferred, since its performance is very close to that of the full model and its operation is more straightforward.
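The two evaluation measures named in the abstract can be illustrated with a minimal Python sketch. It assumes the usual definitions, which the abstract itself does not spell out: AER as the fraction of families whose predicted indicator differs from the TCH-based indicator Y, and TPR as the fraction of truly selected families (Y = 1) that the classifier also selects. The function names and the toy label vectors are hypothetical and are not data from the study.

```python
def apparent_error_rate(y_true, y_pred):
    """Fraction of observations whose predicted label differs from the true label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


def true_positive_rate(y_true, y_pred):
    """Fraction of truly selected families (label 1) that were predicted as selected."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be of equal length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p != 1)
    if tp + fn == 0:
        raise ValueError("TPR is undefined when there are no true positives in y_true")
    return tp / (tp + fn)


# Hypothetical toy labels: 1 = family selected via TCH, 0 = not selected.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
# Hypothetical classifier output for the same eight families.
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]

aer = apparent_error_rate(y_true, y_pred)  # 2 of 8 families misclassified
tpr = true_positive_rate(y_true, y_pred)   # 3 of 4 selected families recovered
print(f"AER = {aer:.2f}, TPR = {tpr:.2f}")
```

A low AER with a high TPR, as reported for all classifiers, would mean that few families are misclassified overall and that most families worth selecting on TCH are retained.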