A bounded influence regression estimator based on the statistics of the hat matrix


Author(s) - Chave, Alan D.; Thomson, David J.

Publication year - 2003

Publication title - Journal of the Royal Statistical Society: Series C (Applied Statistics)

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 1.205

H-Index - 72

eISSN - 1467-9876

pISSN - 0035-9254

DOI - 10.1111/1467-9876.00406

Subject(s) - estimator, mathematics, statistics, leverage (statistics), outlier, quantile regression, quantile, bounded function, scatter matrix, robust regression, diagonal, matrix (chemical analysis), estimation of covariance matrices, mathematical analysis, materials science, geometry, composite material

Summary. Many geophysical regression problems require the analysis of large (more than $10^4$ values) data sets, and, because the data may represent mixtures of concurrent natural processes with widely varying statistical properties, contamination of both response and predictor variables is common. Existing bounded influence or high breakdown point estimators frequently lack the ability to eliminate extremely influential data and/or the computational efficiency to handle large data sets. A new bounded influence estimator is proposed that combines high asymptotic efficiency for normal data, high breakdown point behaviour with contaminated data and computational simplicity for large data sets. The algorithm combines a standard M-estimator to downweight data corresponding to extreme regression residuals and removal of overly influential predictor values (leverage points) on the basis of the statistics of the hat matrix diagonal elements. For this, the exact distribution of the hat matrix diagonal elements $p_{ii}$ for complex multivariate Gaussian predictor data is shown to be $\beta(p_{ii}, m, N - m)$, where $N$ is the number of data and $m$ is the number of parameters. Real geophysical data from an auroral zone magnetotelluric study which exhibit severe outlier and leverage point contamination are used to illustrate the estimator's performance. The examples also demonstrate the utility of looking at both the residual and the hat matrix distributions through quantile–quantile plots to diagnose robust regression problems.
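The leverage screening idea in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation and covers only the hat matrix step, not the M-estimator. The function names (`hat_diagonal`, `leverage_mask`, `beta_qq`), the 0.999 cutoff probability, the Monte Carlo approximation of the beta quantiles and the synthetic data are all assumptions made for illustration; the summary does not state the rejection rule actually used.

```python
import numpy as np

def hat_diagonal(X):
    """Diagonal p_ii of the hat matrix H = X (X^H X)^{-1} X^H."""
    # Thin QR: X = Q R with orthonormal columns in Q, so H = Q Q^H
    # and p_ii is the squared norm of row i of Q.
    Q, _ = np.linalg.qr(X)
    return np.sum(np.abs(Q) ** 2, axis=1)

def beta_quantiles(m, N, probs, n_draws=200_000, seed=0):
    """Monte Carlo quantiles of Beta(m, N - m) (assumed approximation)."""
    rng = np.random.default_rng(seed)
    return np.quantile(rng.beta(m, N - m, size=n_draws), probs)

def leverage_mask(X, prob=0.999):
    """Keep rows whose p_ii lies at or below the `prob` quantile of Beta(m, N - m).

    The cutoff probability is a hypothetical choice for this sketch.
    """
    N, m = X.shape
    cutoff = beta_quantiles(m, N, prob)
    p = hat_diagonal(X)
    return p <= cutoff, p, cutoff

def beta_qq(p, m):
    """Pairs (theoretical, empirical) for a quantile-quantile comparison."""
    N = p.size
    probs = (np.arange(1, N + 1) - 0.5) / N
    return beta_quantiles(m, N, probs), np.sort(p)

# Synthetic complex Gaussian predictors with five planted leverage points.
rng = np.random.default_rng(1)
N, m = 2000, 2
X = (rng.standard_normal((N, m)) + 1j * rng.standard_normal((N, m))) / np.sqrt(2)
X[:5] = 50.0 * (1 + 1j)  # rows far from the bulk of the predictor data

keep, p, cutoff = leverage_mask(X)
theo, emp = beta_qq(p, m)
print("rows flagged as leverage points:", np.flatnonzero(~keep)[:10])
```

Because the trace of the hat matrix equals $m$, the $p_{ii}$ always sum to $m$, which gives a quick sanity check on `hat_diagonal`. Plotting `emp` against `theo` gives the kind of quantile-quantile view of the hat matrix diagonal that the summary recommends as a diagnostic.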
