Two penalized mixed–integer nonlinear programming approaches to tackle multicollinearity and outliers effects in linear regression models


Open Access


Author(s) - Mahdi Roozbeh, Saman Babaie–Kafaki, Zohre Aminifard

Publication year - 2020

Publication title - Journal of Industrial and Management Optimization

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.325

H-Index - 32

eISSN - 1553-166X

pISSN - 1547-5816

DOI - 10.3934/jimo.2020128

Subject(s) - multicollinearity, variance inflation factor, outlier, overfitting, estimator, robust regression, statistics, mathematics, computer science, linear regression, econometrics, mathematical optimization, machine learning, artificial neural network

In classical regression analysis, ordinary least–squares estimation is the best strategy when its essential assumptions are met, namely normality and independence of the error terms and negligible multicollinearity among the covariates. However, if any of these assumptions is violated, the results may be misleading. In particular, outliers violate the assumption of normally distributed residuals in least–squares regression. In this situation, robust estimators are widely used because they are insensitive to outlying data points. Multicollinearity is another common problem in multiple regression models, with adverse effects on the least–squares estimators. It is therefore important to use estimation methods designed to tackle these problems. Robust regression methods are among the popular approaches for analyzing data contaminated with outliers. Along these lines, we propose two mixed–integer nonlinear optimization models whose solutions can be considered appropriate estimators when outliers and multicollinearity appear simultaneously in the data set. The models can be solved effectively by metaheuristic algorithms and are designed based on penalization schemes capable of down–weighting or ignoring unusual data and multicollinearity effects. We establish that our models are computationally advantageous in terms of flop count. We also deal with a robust ridge methodology. Finally, three real data sets are analyzed to examine the performance of the proposed methods.
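
The abstract does not state the two models themselves, so the following is only a rough illustration of the general idea it describes: a penalized fit in which binary variables decide which observations to ignore. It assumes a generic ridge–penalized trimmed least–squares objective, minimizing sum_i z_i (y_i - x_i' b)^2 + k ||b||^2 over b and binary z with sum_i z_i = h, and it substitutes a simple alternating heuristic with random restarts for the metaheuristic algorithms mentioned in the abstract. The names `ridge_fit`, `trimmed_ridge`, `h` and `k` are hypothetical and are not taken from the paper.

```python
import numpy as np


def ridge_fit(X, y, k):
    """Closed-form ridge estimate: solve (X'X + k I) b = X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)


def trimmed_ridge(X, y, h, k, n_starts=20, max_iter=50, seed=0):
    """Heuristic for a ridge-penalized trimmed least-squares problem.

    Assumed objective (an illustration, not the paper's formulation):
        min over b, z of  sum_i z_i (y_i - x_i'b)^2 + k ||b||^2
        subject to        z_i in {0, 1},  sum_i z_i = h.
    Alternates between a ridge fit on the kept observations and
    re-selecting the h observations with the smallest squared residuals.
    Returns the best coefficient vector found and the 0/1 indicator z.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    best_obj, best_beta, best_keep = np.inf, None, None
    for _ in range(n_starts):
        # Random initial subset of h observations.
        keep = rng.choice(n, size=h, replace=False)
        for _ in range(max_iter):
            beta = ridge_fit(X[keep], y[keep], k)
            resid2 = (y - X @ beta) ** 2
            new_keep = np.argsort(resid2)[:h]
            if np.array_equal(np.sort(new_keep), np.sort(keep)):
                break  # the kept subset no longer changes
            keep = new_keep
        # Refit on the final subset and evaluate the penalized objective.
        beta = ridge_fit(X[keep], y[keep], k)
        obj = np.sum((y[keep] - X[keep] @ beta) ** 2) + k * (beta @ beta)
        if obj < best_obj:
            best_obj, best_beta, best_keep = obj, beta, keep
    z = np.zeros(n, dtype=int)
    z[best_keep] = 1  # z_i = 1 means observation i is kept
    return best_beta, z
```

In this sketch, observations with `z[i] == 0` are the ones the fit ignores, and the ridge term `k` stabilizes the coefficients when the columns of `X` are nearly collinear. The alternating scheme only finds a local solution, which is why several random starts are used.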
