Computational rank‐based statistics
Author(s) - McKean Joseph W., Terpstra Jeff T., Kloke John D.
Publication year - 2009
Publication title - Wiley Interdisciplinary Reviews: Computational Statistics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.693
H-Index - 38
eISSN - 1939-0068
pISSN - 1939-5108
DOI - 10.1002/wics.29
Subject(s) - inference , outlier , rank (graph theory) , computer science , algorithm , completeness (order theory) , mathematics , data mining , artificial intelligence , mathematical analysis , combinatorics
This review discusses two algorithms that can be used to compute rank‐based regression estimates. For completeness, a brief overview of rank‐based inference procedures in the context of a linear model is presented. The discussion includes geometry, estimation, inference, and diagnostics. In regard to computing the rank‐based estimates, we discuss two approaches. The first approach is based on an algebraic identity that allows one to compute the (Wilcoxon) estimates using an L1 regression routine. The other approach is a Newton‐type algorithm. In addition, we discuss how rank‐based inference can be generalized to nonlinear and random effects models. Some simple examples using existing statistical software are also presented for the sake of illustration and comparison.
Traditional least squares (LS) procedures offer the user an encompassing methodology for analyzing models, linear or nonlinear. These procedures are based on the simple premise of fitting the model by minimizing the Euclidean distance between the vector of responses and the model. Besides the fit, the LS procedures include diagnostics to check the quality of fit and an array of inference procedures including confidence intervals (regions) and tests of hypotheses. LS procedures, though, are not robust. One outlier can spoil the LS fit, its associated inference, and even its diagnostic procedures (i.e., the methods that should detect the outliers).
Rank‐based procedures also offer the user a complete methodology. The only essential change is to replace the Euclidean norm by another norm, so that the geometry remains the same. As with the LS procedures, these rank‐based procedures offer the user diagnostic tools to check the quality of fit and associated inference procedures. Further, in contrast to the LS procedures, they are robust to the effect of outliers. They are generalizations of simple nonparametric rank procedures such as the Wilcoxon one‐ and two‐sample methods, and they retain the high efficiency of these simple rank methods. Further, depending on the knowledge of the underlying error distribution, this rank‐based analysis can be optimized by the choice of the norm (scores). Weighted versions of the fit can attain a high (50%) breakdown point.
Copyright © 2009 John Wiley & Sons, Inc.
This article is categorized under: Statistical and Graphical Methods of Data Analysis > Nonparametric Methods
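The abstract does not spell out the algebraic identity behind the first approach. A standard candidate, assumed here, is that the rank‐based dispersion with Wilcoxon scores a(i) = sqrt(12) (i/(n+1) - 1/2) is a constant multiple of the sum of absolute pairwise differences of the residuals, so minimizing it amounts to an L1 regression (without intercept) on pairwise differences of the data. The following is a minimal sketch of that idea for a single predictor, where the L1 problem reduces to a weighted median of pairwise slopes. The function names (`wilcoxon_dispersion`, `pairwise_l1`, `wilcoxon_fit`) and the toy data are hypothetical and are not taken from the article; estimating the intercept as the median of the residuals is likewise an assumption of this sketch.

```python
from itertools import combinations
from math import sqrt
from statistics import median


def wilcoxon_dispersion(residuals):
    """Rank-based dispersion sum_i a(R(e_i)) * e_i with Wilcoxon scores
    a(i) = sqrt(12) * (i / (n + 1) - 1/2)."""
    n = len(residuals)
    order = sorted(range(n), key=lambda i: residuals[i])
    return sum(
        sqrt(12) * (rank / (n + 1) - 0.5) * residuals[i]
        for rank, i in enumerate(order, start=1)
    )


def pairwise_l1(residuals):
    """Sum over i < j of |e_i - e_j|."""
    return sum(abs(a - b) for a, b in combinations(residuals, 2))


def wilcoxon_fit(x, y):
    """Single-predictor sketch (hypothetical helper, not the article's code).

    The slope minimizes sum_{i<j} |(y_i - y_j) - b * (x_i - x_j)|, an L1
    regression through the origin on pairwise differences. In one dimension
    the minimizer is a weighted median of the pairwise slopes with weights
    |x_i - x_j|. The intercept is taken as the median of the residuals
    (an assumption of this sketch).
    """
    slopes = []
    for i, j in combinations(range(len(x)), 2):
        dx = x[i] - x[j]
        if dx != 0:  # pairs with equal x carry no slope information
            slopes.append(((y[i] - y[j]) / dx, abs(dx)))
    slopes.sort()
    half = sum(w for _, w in slopes) / 2
    cumulative = 0.0
    for slope, w in slopes:
        cumulative += w
        if cumulative >= half:  # lower weighted median
            break
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return slope, intercept


# Toy data: y = 1 + 2x, with the last response replaced by an outlier.
x = [1, 2, 3, 4, 5, 6]
y = [3, 5, 7, 9, 11, 40]
print(wilcoxon_fit(x, y))  # (2.0, 1.0)
```

On this toy data the sketch recovers the slope 2 and intercept 1 of the uncontaminated line, whereas the least squares slope for the same data is about 5.86. For several predictors the weighted median no longer applies, and a general L1 (least absolute deviations) routine would be run on the pairwise differences instead.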