Premium
A bounded influence regression estimator based on the statistics of the hat matrix
Author(s) -
Chave Alan D.,
Thomson David J.
Publication year - 2003
Publication title -
journal of the royal statistical society: series c (applied statistics)
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.205
H-Index - 72
eISSN - 1467-9876
pISSN - 0035-9254
DOI - 10.1111/1467-9876.00406
Subject(s) - estimator , mathematics , statistics , leverage (statistics) , outlier , quantile regression , quantile , bounded function , scatter matrix , robust regression , diagonal , matrix (chemical analysis) , estimation of covariance matrices , mathematical analysis , materials science , geometry , composite material
Summary. Many geophysical regression problems require the analysis of large (more than 10 4 values) data sets, and, because the data may represent mixtures of concurrent natural processes with widely varying statistical properties, contamination of both response and predictor variables is common. Existing bounded influence or high breakdown point estimators frequently lack the ability to eliminate extremely influential data and/or the computational efficiency to handle large data sets. A new bounded influence estimator is proposed that combines high asymptotic efficiency for normal data, high breakdown point behaviour with contaminated data and computational simplicity for large data sets. The algorithm combines a standard M ‐estimator to downweight data corresponding to extreme regression residuals and removal of overly influential predictor values (leverage points) on the basis of the statistics of the hat matrix diagonal elements. For this, the exact distribution of the hat matrix diagonal elements p ii for complex multivariate Gaussian predictor data is shown to be β ( p ii , m , N − m ), where N is the number of data and m is the number of parameters. Real geophysical data from an auroral zone magnetotelluric study which exhibit severe outlier and leverage point contamination are used to illustrate the estimator's performance. The examples also demonstrate the utility of looking at both the residual and the hat matrix distributions through quantile–quantile plots to diagnose robust regression problems.