Hierarchical testing of multiple endpoints in group‐sequential trials
Author(s) -
Ekkehard Glimm,
Willi Maurer,
Frank Bretz
Publication year - 2009
Publication title -
Statistics in Medicine
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.996
H-Index - 183
eISSN - 1097-0258
pISSN - 0277-6715
DOI - 10.1002/sim.3748
Subject(s) - clinical endpoint , interim analysis , statistical significance , clinical trial , statistics , mathematics , computer science , medicine
We consider the situation of testing a (key) secondary endpoint hierarchically in a group‐sequential clinical trial that is mainly driven by a primary endpoint. By ‘mainly driven’, we mean that the interim analyses are planned at points in time where a certain number of patients or events have accrued on the primary endpoint, and that the trial runs either until statistical significance of the primary endpoint is achieved at one of the interim analyses or to the final analysis. We consider both the situation where the trial is stopped as soon as the primary endpoint is significant and the situation where it is continued after primary endpoint significance to further investigate the secondary endpoint. In addition, we investigate how to achieve strong control of the familywise error rate (FWER) at a pre‐specified significance level α for both the primary and the secondary hypotheses. We systematically explore various multiplicity adjustment methods. The starting point is a naive strategy of testing the secondary endpoint at level α whenever the primary endpoint is significant. Hung et al. ( J. Biopharm. Stat. 2007; 17 :1201–1210) have already shown that this naive strategy does not maintain the FWER at level α. We derive a sharp upper bound for the rejection probability of the secondary endpoint under the naive strategy. This bound suggests a number of multiple test strategies and also provides a benchmark for deciding whether a method is conservative or might be improved while maintaining the FWER at α. We use a numerical example based on a real case study to illustrate the results of different hierarchical test strategies. Copyright © 2009 John Wiley & Sons, Ltd.
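The FWER inflation of the naive strategy can be seen in a small Monte Carlo sketch. The following is an illustrative simulation, not the paper's own computation: all numerical settings (a two-look design with a Pocock-type primary boundary, correlation 0.9 between primary and secondary test statistics, and an assumed primary drift) are hypothetical choices made here to exhibit the phenomenon reported by Hung et al. The secondary endpoint is under its null hypothesis, yet its estimated rejection probability exceeds the nominal one-sided α = 0.025, because the look at which the secondary endpoint is tested is selected by the (correlated) primary statistic.

```python
import numpy as np

# Illustrative sketch of the naive hierarchical strategy in a two-stage
# group-sequential trial; all numerical settings are assumptions.
rng = np.random.default_rng(2009)
n_sim = 400_000
alpha = 0.025             # nominal one-sided level for the secondary endpoint
c_pocock = 2.178          # two-look Pocock boundary for the primary endpoint
z_alpha = 1.960           # naive full-level critical value for the secondary
rho = 0.9                 # assumed correlation between the two endpoints
delta = 1.5 * np.sqrt(2)  # assumed primary drift (interim mean 1.5)

# Stage-wise increments: primary (a) and secondary (b), correlated within stage.
a = rng.standard_normal((n_sim, 2))
b = rho * a + np.sqrt(1 - rho**2) * rng.standard_normal((n_sim, 2))
a += delta / np.sqrt(2)   # primary endpoint under the alternative
# secondary endpoint stays under the null (mean 0)

z1, s1 = a[:, 0], b[:, 0]                                        # interim
z2, s2 = a.sum(axis=1) / np.sqrt(2), b.sum(axis=1) / np.sqrt(2)  # final

# Naive strategy: stop at the interim if the primary crosses its boundary,
# and test the secondary at full level alpha at whichever look that is.
stop1 = z1 >= c_pocock
reject_sec = (stop1 & (s1 >= z_alpha)) | (
    ~stop1 & (z2 >= c_pocock) & (s2 >= z_alpha)
)
fwer = reject_sec.mean()
print(f"estimated secondary type I error: {fwer:.4f} (nominal alpha = {alpha})")
```

Under these assumed settings the estimated secondary type I error comes out noticeably above 0.025, consistent with the inflation the abstract describes; the sharp upper bound derived in the paper quantifies how large this inflation can get.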