Power and sample size calculations for the Wilcoxon–Mann–Whitney test in the presence of death‐censored observations
Author(s) - Matsouaka Roland A., Betensky Rebecca A.
Publication year - 2014
Publication title - Statistics in Medicine
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.996
H-Index - 183
eISSN - 1097-0258
pISSN - 0277-6715
DOI - 10.1002/sim.6355
We consider a clinical trial of a potentially lethal disease in which patients are randomly assigned to two treatment groups and are followed for a fixed period of time; a continuous endpoint is measured at the end of follow‐up. For some patients, however, death (or severe disease progression) may preclude measurement of the endpoint. A statistical analysis that includes only patients with endpoint measurements may be biased. An alternative analysis includes all randomized patients: patients whose endpoint was measured receive rank scores based on the magnitude of their responses, and patients whose death precluded measurement of the continuous endpoint receive ‘worst‐rank’ scores, which are worse than all observed rank scores. The treatment effect is then evaluated using the Wilcoxon–Mann–Whitney test. In this paper, we derive closed‐form formulae for the power and sample size of the Wilcoxon–Mann–Whitney test when measurements of the continuous endpoint that are missing because of death are replaced by worst‐rank scores. We distinguish two approaches for assigning the worst‐rank scores. In the tied worst‐rank approach, all deaths are weighted equally, and the worst‐rank scores are set to a single value that is worse than all measured responses. In the untied worst‐rank approach, the worst‐rank scores further rank patients according to their time of death, so that an earlier death is considered worse than a later death, which in turn is worse than all measured responses. In addition, we propose four methods for implementing the sample size formulae in a trial with expected early deaths. We conduct Monte Carlo simulation studies to evaluate the accuracy of our power and sample size formulae and to compare the four sample size estimation methods.
Copyright © 2014 John Wiley & Sons, Ltd.
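The worst-rank scoring and test described in the abstract can be sketched in a few lines. What follows is a minimal Python illustration, not the authors' implementation, and it does not reproduce the paper's closed-form power and sample size formulae. It assumes that a larger endpoint value is better and encodes each patient as a sortable key: deaths sort below every measured response, and they are either ordered by time of death (untied approach) or share a single key (tied approach). It then computes the Wilcoxon–Mann–Whitney U statistic from midranks with a tie-corrected normal approximation and estimates power by plain Monte Carlo simulation. All function names, and the data-generating model in `simulate_group` (a per-group death probability, uniform death times, normally distributed endpoints among survivors), are hypothetical.

```python
import math
import random
from statistics import NormalDist


def worst_rank_key(response, death_time, untied=True):
    """Sortable key; a larger key means a better outcome.

    Assumes a larger endpoint value is better. Patients whose death
    precluded measurement (response is None) sort below every measured
    response. In the untied approach they are ordered by time of death
    (earlier death = worse); in the tied approach they share one key.
    """
    if response is None:
        return (0, death_time if untied else 0.0)
    return (1, response)


def midranks(keys):
    """1-based ranks, with tied keys replaced by their average rank."""
    order = sorted(range(len(keys)), key=lambda i: keys[i])
    ranks = [0.0] * len(keys)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and keys[order[j + 1]] == keys[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def wmw_worst_rank(group_a, group_b, untied=True):
    """WMW test on worst-rank scores; groups are lists of (response, death_time).

    Returns (U for group A, z, two-sided p) using the tie-corrected
    normal approximation to the null distribution of U.
    """
    keys = [worst_rank_key(r, d, untied) for r, d in group_a + group_b]
    ranks, ties = midranks(keys)
    n_a, n_b = len(group_a), len(group_b)
    n = n_a + n_b
    u = sum(ranks[:n_a]) - n_a * (n_a + 1) / 2
    tie_term = sum(t ** 3 - t for t in ties) / (n * (n - 1))
    var_u = n_a * n_b / 12 * ((n + 1) - tie_term)
    if var_u <= 0:  # every observation tied: the test carries no information
        return u, 0.0, 1.0
    z = (u - n_a * n_b / 2) / math.sqrt(var_u)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u, z, p


def simulate_group(n, p_death, mean, rng, follow_up=1.0):
    """Hypothetical model: death before end of follow-up with probability
    p_death at a Uniform(0, follow_up) time; survivors' endpoint ~ N(mean, 1)."""
    group = []
    for _ in range(n):
        if rng.random() < p_death:
            group.append((None, rng.uniform(0.0, follow_up)))
        else:
            group.append((rng.gauss(mean, 1.0), None))
    return group


def mc_power(n_per_group, p_death_a, p_death_b, mean_a, mean_b,
             untied=True, alpha=0.05, n_sims=2000, seed=1):
    """Monte Carlo estimate of the power of the two-sided worst-rank WMW test."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        a = simulate_group(n_per_group, p_death_a, mean_a, rng)
        b = simulate_group(n_per_group, p_death_b, mean_b, rng)
        if wmw_worst_rank(a, b, untied)[2] < alpha:
            rejections += 1
    return rejections / n_sims
```

For example, `mc_power(80, 0.1, 0.3, 0.5, 0.0)` estimates power when the (hypothetical) first arm has fewer deaths and a higher mean response. Rerunning it with `untied=False` shows how the choice between the tied and untied worst-rank approaches changes power under the same assumed model.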