Summary goodness‐of‐fit statistics for binary generalized linear models with noncanonical link functions
Author(s) - Canary Jana D., Blizzard Leigh, Barry Ronald P., Hosmer David W., Quinn Stephen J.
Publication year - 2016
Publication title - Biometrical Journal
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.108
H-Index - 63
eISSN - 1521-4036
pISSN - 0323-3847
DOI - 10.1002/bimj.201400079
Subject(s) - statistics, logit, goodness of fit, mathematics, probit, probit model, logistic regression, generalized linear model, statistic, econometrics, covariate
Generalized linear models (GLMs) with a canonical logit link function are the primary modeling technique used to relate a binary outcome to predictor variables. However, noncanonical links can offer more flexibility, producing convenient analytical quantities (e.g., probit GLMs in toxicology) and desired measures of effect (e.g., relative risk from log GLMs). Many summary goodness‐of‐fit (GOF) statistics exist for logistic GLMs. The properties of logistic GLMs make the development of GOF statistics relatively straightforward, but development can be more difficult under noncanonical links. Although GOF tests for logistic GLMs with continuous covariates (GLMCCs) have been applied to GLMCCs with log links, we know of no GOF tests in the literature specifically developed for GLMCCs that can be applied regardless of the link function chosen. We generalize the Tsiatis GOF statistic (T_G), originally developed for logistic GLMCCs, so that it can be applied under any link function. Further, we show that the algebraically related Hosmer–Lemeshow (HL) and Pigeon–Heyse (J^2) statistics can be applied directly. In a simulation study, T_G, HL, and J^2 were used to evaluate the fit of probit, log–log, complementary log–log, and log models, all calculated with a common grouping method. The T_G statistic consistently maintained Type I error rates, while those of HL and J^2 were often lower than expected if terms with little influence were included. Generally, the statistics had similar power to detect an incorrect model. An exception occurred when a log GLMCC was incorrectly fit to data generated from a logistic GLMCC. In this case, T_G had more power than HL or J^2.
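To make the idea of a grouped summary statistic concrete, the following is a minimal sketch of the textbook Hosmer–Lemeshow statistic computed from fitted probabilities alone, so it does not depend on which link produced them. It is not the authors' code and does not implement T_G or J^2. The function names (`probit_inverse`, `hosmer_lemeshow`), the equal-size grouping by sorted fitted probability, and the coefficient values in the demo are all assumptions made for illustration; the paper's own grouping method is not described in this abstract.

```python
import math
import random


def probit_inverse(eta):
    """Standard normal CDF, i.e. the inverse of the probit link."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))


def hosmer_lemeshow(y, p_hat, n_groups=10):
    """Return (statistic, degrees_of_freedom) for a grouped GOF check.

    y      : observed 0/1 outcomes
    p_hat  : fitted probabilities from any binary GLM (any link)
    Groups are formed by sorting on p_hat and cutting into n_groups
    blocks of (nearly) equal size; this grouping rule is an assumption.
    """
    if len(y) != len(p_hat):
        raise ValueError("y and p_hat must have the same length")
    n = len(y)
    if n < n_groups:
        raise ValueError("need at least one observation per group")

    pairs = sorted(zip(p_hat, y))
    stat = 0.0
    for g in range(n_groups):
        group = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        n_g = len(group)
        observed = sum(yi for _, yi in group)   # observed events in group
        expected = sum(pi for pi, _ in group)   # expected events in group
        p_bar = expected / n_g
        if not 0.0 < p_bar < 1.0:
            raise ValueError("group mean probability must lie in (0, 1)")
        stat += (observed - expected) ** 2 / (n_g * p_bar * (1.0 - p_bar))
    # Conventional reference distribution: chi-square on n_groups - 2 df.
    return stat, n_groups - 2


if __name__ == "__main__":
    # Hypothetical probit model: coefficients chosen only for the demo.
    random.seed(0)
    x = [random.uniform(-2.0, 2.0) for _ in range(200)]
    p = [probit_inverse(-0.3 + 0.8 * xi) for xi in x]
    y = [1 if random.random() < pi else 0 for pi in p]
    stat, df = hosmer_lemeshow(y, p)
    print("HL statistic:", round(stat, 3), "on", df, "df")
```

In practice `p_hat` would come from a fitted model rather than from known coefficients, and the statistic would be compared with a chi-square distribution on the returned degrees of freedom.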