Issues of experimental design for comparing the performance of hydrologic models
Author(s) - Clarke, Robin T.
Publication year - 2008
Publication title - Water Resources Research
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.863
H-Index - 217
eISSN - 1944-7973
pISSN - 0043-1397
DOI - 10.1029/2007wr005927
Subject(s) - comparability , replicate , calibration , hydrological modelling , computer science , environmental science , population , econometrics , statistics , climatology , mathematics , geography , geology , engineering
Research to compare the performance of precipitation‐runoff models (and indeed models of other hydrologic and geophysical processes) is increasingly reported in the literature. A characteristic of almost all such intercomparisons at present is that each model is tested only by the research group (the “operator”) that developed it, so that differences between models are confounded with differences between operators. It has long been recognized that each model should be tested using data from a range of watersheds, giving replication of results under different conditions of climate, vegetation, geology, and other factors. This paper argues that, in addition, models must be tested by more than one operator, and that another essential feature of good experimental practice, the use of randomization procedures where models are to be tested in sequence, is also needed in model intercomparison studies. These measures remove possible bias, inadvertent or otherwise, and allow the uncertainties in estimates of model performance to be calculated. In the context of intercomparison experiments, the paper discusses a number of issues: (1) experimental design as a means of eliminating operator bias; (2) the comparability between models for which different numbers of parameters must be estimated; (3) the consequence for estimates of uncertainty (as measured by standard errors of mean measures of model performance) of assuming that the watersheds used in an intercomparison experiment are sampled from a wider population of watersheds; (4) the “representativity” of the hydrologic records used for calibration and validation of models; (5) whether the periods of record used in calibration/validation from different watersheds should be synchronized for each model run; and (6) procedures for estimating how well models would perform on ungaged basins.
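Two of the abstract's points, randomizing the order of runs to eliminate operator bias (issue 1) and computing standard errors of mean performance when the watersheds are treated as a sample from a wider population (issue 3), can be made concrete with a short sketch. The Python below is illustrative only, not the paper's method: the model, operator, and watershed names are hypothetical, and run_model is a placeholder that returns random scores in place of a real calibration/validation run.

```python
import itertools
import random
import statistics

# Hypothetical labels for illustration; a real study would substitute its
# own models, participating operators, and watershed records.
models = ["model_A", "model_B", "model_C"]
operators = ["op_1", "op_2", "op_3"]
watersheds = ["basin_1", "basin_2", "basin_3", "basin_4", "basin_5"]

rng = random.Random(42)  # fixed seed so the randomized order is reproducible

# Cross every model with every operator and every watershed, then randomize
# the sequence of runs so run order cannot systematically favor any model.
runs = list(itertools.product(models, operators, watersheds))
rng.shuffle(runs)

def run_model(model, operator, watershed):
    """Stand-in for a real calibration/validation run; would return a
    performance score such as Nash-Sutcliffe efficiency. Random here."""
    return rng.uniform(0.5, 0.9)

scores = {}  # (model, watershed) -> scores obtained by different operators
for model, operator, watershed in runs:
    scores.setdefault((model, watershed), []).append(
        run_model(model, operator, watershed)
    )

# For each model, average the operators' scores within each watershed, then
# treat those watershed means as a sample from a wider population of
# watersheds: report the overall mean and its standard error.
for model in models:
    basin_means = [statistics.fmean(scores[(model, w)]) for w in watersheds]
    mean = statistics.fmean(basin_means)
    se = statistics.stdev(basin_means) / len(basin_means) ** 0.5
    print(f"{model}: mean = {mean:.3f}, standard error = {se:.3f}")
```

The fully crossed, randomized layout above is only one possible design; the abstract's point is that, whatever design is chosen, replication across operators and watersheds plus randomized run order are what make such standard errors meaningful.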