Selection of influential variables in ordinal data with preponderance of zeros
Author(s) - Das Ujjwal, Das Kalyan
Publication year - 2021
Publication title - Statistica Neerlandica
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.52
H-Index - 39
eISSN - 1467-9574
pISSN - 0039-0402
DOI - 10.1111/stan.12225
Subject(s) - ordinal data, ordinal regression, lasso (programming language), ordered logit, mathematics, logit, logistic regression, feature selection, econometrics, selection (genetic algorithm), computer science, statistics, elastic net regularization, class (philosophy), coordinate descent, regression, artificial intelligence, mathematical optimization, world wide web
The presence of excess zeros in ordinal data is pervasive in areas such as the medical and social sciences. Unfortunately, the analysis of such data has so far hardly been explored, perhaps because the underlying model that fits such data is not a generalized linear model, so some methodological developments and intensive computations are required. The current investigation is concerned with the selection of variables in such models. On many occasions where the number of predictors is quite large and some of them are not useful, the maximum likelihood approach is not the automatic choice: apart from the messy calculations involved, this approach fails to provide efficient estimates of the underlying parameters. The proposed penalized approach includes the ℓ1 penalty (LASSO) and a mixture of the ℓ1 and ℓ2 penalties (elastic net). We propose a coordinate descent algorithm to fit a wide class of ordinal regression models and to select useful variables appearing in both the ordinal regression component and the logistic-regression-based mixing component. The selection of predictors is examined rigorously through a simulation study. The proposed method is illustrated by analyzing the severity of driver injury in road accidents in Michigan's Upper Peninsula.
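To make the penalized coordinate descent idea concrete, the following is a minimal sketch only, not the authors' algorithm. It assumes an ordinary least-squares loss in place of the zero-inflated ordinal likelihood, and the names `soft_threshold`, `elastic_net_cd`, `lam`, and `alpha` are hypothetical. It shows how each coefficient is updated in turn by soft-thresholding, which is the step that sets weak predictors exactly to zero; `alpha = 1` gives the ℓ1 (LASSO) penalty and `0 < alpha < 1` gives the elastic net mixture.

```python
def soft_threshold(z, gamma):
    """Soft-thresholding operator: shrink z toward zero by gamma."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net_cd(X, y, lam, alpha=1.0, n_iter=200, tol=1e-8):
    """Cyclic coordinate descent for the (assumed) objective
        (1/(2n)) * ||y - X b||^2 + lam * (alpha * ||b||_1 + (1 - alpha)/2 * ||b||_2^2).
    X is a list of rows, y a list of responses; no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X b, with b = 0 at the start
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of column j with the partial residual (column j excluded).
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1.0 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta  # keep the residual in sync
                beta[j] = new
            max_change = max(max_change, abs(delta))
        if max_change < tol:
            break  # no coefficient moved appreciably in this sweep
    return beta


# Toy data (made up for illustration): y depends on the first column only.
X_demo = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y_demo = [2.0, 2.0, -2.0, -2.0]
print(elastic_net_cd(X_demo, y_demo, lam=0.1, alpha=1.0))
```

In this toy design the second column is unrelated to the response, so the penalty leaves its coefficient at exactly zero while shrinking the first coefficient slightly below its unpenalized value. In the setting of the abstract, the same kind of update would be applied to the coefficients of both the ordinal regression component and the mixing component, with the squared-error loss replaced by the model's log-likelihood.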