Multicollinearity in Path Analysis: A Simple Method to Reduce Its Effects
Author(s) - Olivoto Tiago, Souza Velci Q., Nardino Maicon, Carvalho Ivan R., Ferrari Maurício, Pelegrin Alan J., Szareski Vinícius J., Schmidt Denise
Publication year - 2017
Publication title - Agronomy Journal
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.752
H-Index - 131
eISSN - 1435-0645
pISSN - 0002-1962
DOI - 10.2134/agronj2016.04.0196
Subject(s) - multicollinearity, variance inflation factor, statistics, mathematics, path analysis (statistics), path coefficient, linear regression, regression analysis, correlation, econometrics, geometry
Multicollinearity in path analysis was investigated in different scenarios. A biometrical approach identified the multicollinearity‐generating traits. Data derived from averages overestimated the correlation coefficients. The use of all sampled observations increased the accuracy of path analysis. A simple sample tracking method that reduces multicollinearity is proposed.
Some commonly used data arrangement methods may mask the correlation coefficients among explanatory traits, increasing multicollinearity in multiple regression analysis. This study was performed to determine whether the harmful effects of multicollinearity can be reduced when estimating the X′X correlation matrix among explanatory traits. Data on 45 treatments (15 maize [Zea mays L.] hybrids sown at three locations) were used. Three path analysis methods (traditional, with k inclusion, and traditional with trait exclusion) were tested in two scenarios: with the X′X matrix estimated from all sampled observations (ASO, n = 900) and with the X′X matrix estimated from the average values of each plot (AVP, n = 180). The condition number (CN) was reduced from 3395 to 2004 when the matrix was estimated with all observations. On average, the variance inflation factors of the regression coefficients were 61% larger in the AVP scenario. The addition of the k coefficient reduced the CN to 85.40 and 51.17 for the ASO and AVP scenarios, respectively. Exclusion of multicollinearity‐generating traits was more effective in the ASO than in the AVP scenario, resulting in CNs of 29.62 and 63.66, respectively. The largest coefficient of determination (0.977) and the smallest noise (0.150) were obtained in the ASO scenario after the exclusion of the multicollinearity‐generating traits. The use of all sampled observations does not mask the individual variances and reduces the magnitude of the correlations among explanatory traits in 90% of cases, improving the accuracy of biological studies involving path analysis.
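The diagnostics named in the abstract can be illustrated with a minimal sketch. It is not the authors' code and does not use the maize data: the data are synthetic, the function names are hypothetical, and it assumes the common conventions that the CN is the ratio of the largest to the smallest eigenvalue of the X′X correlation matrix, that variance inflation factors are the diagonal of its inverse, and that "k inclusion" means adding a constant k to the diagonal of X′X before solving for the path coefficients. NumPy is assumed to be available.

```python
import numpy as np


def condition_number(R):
    """Ratio of the largest to the smallest eigenvalue of a symmetric matrix."""
    eigvals = np.linalg.eigvalsh(R)  # returned in ascending order
    return eigvals[-1] / eigvals[0]


def variance_inflation_factors(R):
    """Diagonal of the inverse of the correlation matrix among explanatory traits."""
    return np.diag(np.linalg.inv(R))


def path_coefficients(R, r_xy, k=0.0):
    """Solve (R + k*I) b = r_xy for the direct effects b (k = 0 is the traditional case)."""
    return np.linalg.solve(R + k * np.eye(R.shape[0]), r_xy)


# Synthetic example (assumption: three explanatory traits, one nearly redundant).
rng = np.random.default_rng(0)
n = 900
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + x2 + 0.05 * rng.normal(size=n)  # almost a linear combination of x1 and x2
y = x1 + 0.5 * x2 + rng.normal(size=n)    # dependent trait

corr = np.corrcoef(np.column_stack([x1, x2, x3, y]), rowvar=False)
R = corr[:3, :3]    # correlations among explanatory traits (the X'X matrix)
r_xy = corr[:3, 3]  # correlations of each explanatory trait with y

k = 0.05  # illustrative value only, not taken from the study
cn_plain = condition_number(R)
cn_ridge = condition_number(R + k * np.eye(3))
vifs = variance_inflation_factors(R)

direct = path_coefficients(R, r_xy)         # traditional path analysis
direct_k = path_coefficients(R, r_xy, k=k)  # path analysis with k inclusion

r2 = float(direct @ r_xy)    # coefficient of determination
residual = np.sqrt(1 - r2)   # residual effect ("noise")

print("CN without k:", cn_plain)
print("CN with k:", cn_ridge)
print("VIFs:", vifs)
print("Direct effects (k = 0):", direct)
print("Direct effects (k > 0):", direct_k)
print("R2 and residual effect:", r2, residual)
```

Under these conventions the residual effect is the square root of 1 minus the coefficient of determination, which agrees with the reported pair of values: sqrt(1 - 0.977) is approximately 0.15. Adding k always lowers the CN in this sketch, because every eigenvalue is shifted upward by the same amount.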