Large‐scale regression‐based pattern discovery: The example of screening the WHO global drug safety database
Author(s) - Caster Ola, Norén G. Niklas, Madigan David, Bate Andrew
Publication year - 2010
Publication title - Statistical Analysis and Data Mining: The ASA Data Science Journal
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.381
H-Index - 33
eISSN - 1932-1872
pISSN - 1932-1864
DOI - 10.1002/sam.10078
Subject(s) - computer science, false positive paradox, contingency table, covariate, data mining, logistic regression, scale (ratio), regression, regression analysis, false positives and false negatives, statistics, econometrics, artificial intelligence, machine learning, mathematics, geography, cartography
Most measures of interestingness for patterns of co‐occurring events are based on data projections onto contingency tables for the events of primary interest. As an alternative, this article presents the first implementation of shrinkage logistic regression for large‐scale pattern discovery, with an evaluation of its usefulness in real‐world binary transaction data. Regression accounts for the impact of other covariates that may confound or otherwise distort associations. The application considered is international adverse drug reaction (ADR) surveillance, in which large collections of reports on suspected ADRs are screened for interesting reporting patterns worthy of clinical follow‐up. Our results show that regression‐based pattern discovery does offer practical advantages. Specifically, it can eliminate false positives and false negatives due to other covariates. Furthermore, it identifies some established drug safety issues earlier than a measure based on contingency tables. While regression offers clear conceptual advantages, our results suggest that methods based on contingency tables will continue to play a key role in ADR surveillance, for two reasons: the failure of regression to identify some established drug safety concerns as early as the currently used measures, and the relative lack of transparency of the procedure to estimate the regression coefficients. This suggests shrinkage regression should be used in parallel to existing measures of interestingness in ADR surveillance and other large‐scale pattern discovery applications.
Copyright © 2010 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 3: 197‐208, 2010
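The contrast the abstract draws between contingency-table measures and regression can be illustrated with a small sketch. Everything here is an assumption made for illustration: the report counts are invented, the names `cells`, `observed_to_expected` and `ridge_logistic` are hypothetical, and a ridge (L2) penalty fitted by Newton's method stands in for the shrinkage procedure, which may differ from the one used in the article. In the toy data, drug A is usually co-reported with drug B, and only drug B raises the event rate.

```python
import numpy as np

# Invented report counts (not from the article):
# (drug_a, drug_b, n_reports, n_reports_with_event)
cells = [
    (1, 1, 400, 200),  # event rate 0.5 whenever drug B is listed
    (1, 0, 100, 10),   # event rate 0.1 without drug B
    (0, 1, 100, 50),
    (0, 0, 400, 40),
]


def observed_to_expected(cells):
    """Crude 2x2 measure for drug A and the event, ignoring drug B."""
    total = sum(n for _, _, n, _ in cells)
    n_a = sum(n for a, _, n, _ in cells if a == 1)
    n_event = sum(e for _, _, _, e in cells)
    n_a_event = sum(e for a, _, _, e in cells if a == 1)
    expected = n_a * n_event / total  # count expected under independence
    return n_a_event / expected


def ridge_logistic(cells, lam=1.0, n_iter=50):
    """L2-penalised logistic regression on grouped counts (Newton's method).

    Returns coefficients [intercept, drug_a, drug_b]; the intercept is
    left unpenalised.
    """
    X = np.array([[1.0, a, b] for a, b, _, _ in cells])
    n = np.array([c[2] for c in cells], dtype=float)
    y = np.array([c[3] for c in cells], dtype=float)
    penalty = lam * np.diag([0.0, 1.0, 1.0])
    beta = np.zeros(3)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - n * p) - penalty @ beta
        hess = X.T @ (X * (n * p * (1.0 - p))[:, None]) + penalty
        beta = beta + np.linalg.solve(hess, grad)
    return beta


ratio = observed_to_expected(cells)
beta = ridge_logistic(cells)
print(f"crude observed/expected for drug A: {ratio:.2f}")
print(f"penalised coefficients (intercept, A, B): {np.round(beta, 3)}")
```

On these counts the crude observed-to-expected ratio for drug A is 1.4, which would look like a signal, while the regression coefficient for drug A should come out close to zero once drug B is included as a covariate. This is the kind of false positive due to another covariate that the abstract refers to.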