On the necessity and design of studies comparing statistical methods
Author(s) - Boulesteix Anne-Laure, Binder Harald, Abrahamowicz Michal, Sauerbrei Willi
Publication year - 2018
Publication title - Biometrical Journal
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.108
H-Index - 63
eISSN - 1521-4036
pISSN - 0323-3847
DOI - 10.1002/bimj.201700129
Subject(s) - biostatistics, library science, epidemiology, medical statistics, medicine, computer science, mathematics, statistics
In data analysis sciences in general, and in biometrical research in particular, there are strong incentives for presenting work that entails new methods. Many journals require authors to propose new methods as a prerequisite for publication, as this is the most straightforward way to claim the necessary novelty. The development of new methods is also often, in practice, a sine qua non condition for being recruited as a faculty member or for obtaining personnel funding from a methods-oriented research agency, not least because it noticeably increases the chance of getting published, as outlined above. Thus, in statistical research and related methodology-oriented fields such as machine learning or bioinformatics, the well-known adage "publish or perish" could be translated into "propose new methods or perish."

Such a research paradigm is not favorable for studies that aim at meaningfully comparing alternative existing methods or, more generally, for studies assessing the behavior and properties of existing methods. Yet, given the exponential increase in the number and complexity of new statistical methods published every year, end users are often at a loss as to which methods are "optimal," or even "appropriate," for answering the research question of interest given a particular data structure. It becomes more and more difficult to get an overview of existing methods, let alone of their respective performances in different settings (Sauerbrei, Abrahamowicz, Altman, Le Cessie, & Carpenter, 2014).

Moreover, it is well known that studies comparing a suggested new method to existing methods may be (strongly) biased in favor of the new method. This is a consequence of various factors, starting with the authors' better expertise on the new method compared to the competing methods. Another factor is the combination of publication pressure (publish or perish) and publication bias, in the sense that a new method performing worse than existing ones has (severe) difficulty getting published (Boulesteix, Stierle, & Hapfelmeier, 2015). This may lead to simulation designs that are, intentionally or unintentionally, biased. Note that not only empirical evaluations but also theoretical properties suggesting the superiority of a method under particular assumptions may, in principle, be affected by this kind of bias. Deriving theoretical results for statistical approaches relevant in practice is extremely difficult and possible only under strong assumptions (Picard & Cook, 1984). We speculate that authors assessing the theoretical properties of their new method tend to make assumptions that are rather favorable for the new method, which is also a form of bias.

In contrast, neutral comparison studies, as defined by Boulesteix, Wilson, and Hapfelmeier (2017a), are dedicated to the comparison itself: they do not aim to demonstrate the superiority of a particular method and are thus not designed in a way that may increase the probability of incorrectly observing this superiority. Furthermore, they involve authors who are, as a collective, approximately equally competent on all considered methods. Neutral comparison studies can thus be considered unbiased. Yet, in practice, such studies may be very time consuming and difficult to both organize and perform. The need to ensure "equal competence" on all methods being compared may exclude some more complex (but perhaps more suitable) approaches, or may require close collaboration among many experts.
According to their official scope, most high-ranking statistical journals focus mainly on the development of new methods and on innovative applications, whereas comparison studies are not mentioned. This focus, combined with the difficulties of conducting neutral comparison studies outlined above, may explain the relative paucity of papers devoted to the comparison of existing methods. Most papers published in statistical journals suggest new methods (a term used here to include "relevant" modifications of existing methods). Many of these new methods are not extensively compared with other methods by researchers other than their developers, except perhaps in a later paper, by the same or other authors, that often aims to demonstrate that the new approach is superior.

For many (if not most) data analysis problems, there is no lack of available methods and no need for new ones. In fact, the multiplicity of possible data analysis approaches is an issue in its own right, as recently illustrated by Silberzahn and Uhlmann (2015). Such an "embarras du choix," related to the "multiplicity of perspectives" (including but not limited to model selection criteria) described by Gelman and Hennig (2017), is not bad per se, and one should not attempt to eliminate it by formulating strict guidelines; nevertheless, it is not clear how one should deal with multiple approaches in practice. In principle, it is recommended to apply several analysis approaches to the data, but there is no consensus on how the multiple results should be reported. Moreover, the possibility of obtaining different results with different approaches raises concerns about "fishing for significance."
