Complexity‐based robust hydrologic prediction
Author(s) - Pande Saket, McKee Mac, Bastidas Luis A.
Publication year - 2009
Publication title - Water Resources Research
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.863
H-Index - 217
eISSN - 1944-7973
pISSN - 0043-1397
DOI - 10.1029/2008wr007524
Subject(s) - computer science , probabilistic logic , variance (accounting) , generalization , ensemble forecasting , ensemble learning , sample (material) , machine learning , data mining , dimension (graph theory) , sample size determination , process (computing) , artificial intelligence , mathematics , statistics , mathematical analysis , chemistry , accounting , chromatography , pure mathematics , business , operating system
Water resource management requires robust assessment of the consequences of future states of the resource and, when it depends on prediction models, assessment of the uncertainties associated with those predictions. Ensemble prediction/forecast systems have been used extensively to address such issues and seek to provide a collection of predictions, via a collection of parameters, with the intent of bracketing future observations. However, such methods do not have well-established finite-sample properties and generally require large samples to additionally identify better-performing predictions, for example, in nonlinear probabilistic ensemble methods. We here propose a different paradigm, based on Vapnik-Chervonenkis (VC) generalization theory, for robust parameter selection and prediction. It rests on a data-independent concept of complexity that relates the finite-sample performance of a model to its performance when a large sample of the same underlying process is available. We employ a nearest neighbor method as the underlying prediction model, introduce a procedure to compute its VC dimension, and test how the two paradigms handle uncertainty in one-step-ahead daily streamflow prediction for three basins. In both paradigms, the predictions become more efficient and less biased with increasing sample size. However, the complexity-based paradigm has a better bias-variance tradeoff for small sample sizes. The uncertainty bounds on predictions produced by ensemble methods behave inconsistently for smaller basins, suggesting the need for further postprocessing of ensemble members, and of the uncertainty surrounding them, before they are used to estimate modeling uncertainty. Finally, complexity-based predictions appear to mimic the complexity of the underlying processes via the input dimensionality selection of the nearest neighbor model.
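
The abstract pairs a nearest neighbor predictor with a complexity-based (VC-type) criterion for selecting its input dimensionality. The sketch below illustrates that general idea only, not the authors' actual procedure or bound: a k-nearest-neighbor one-step-ahead streamflow predictor whose lag (embedding) dimension is chosen by minimizing a leave-one-out error inflated by an assumed sqrt(d/n) complexity surrogate. The function names, the penalty form, and the synthetic data are all illustrative assumptions.

# Minimal sketch (not the paper's implementation): one-step-ahead streamflow
# prediction with a k-nearest-neighbor model, where the input dimensionality
# (number of lagged flows, d) is chosen by penalizing empirical error with a
# complexity term that grows with d -- an assumed stand-in for the paper's
# VC-dimension-based selection.
import numpy as np

def embed(series, d):
    """Build (X, y) pairs: X holds the d most recent flows, y the next flow."""
    X = np.array([series[t - d:t] for t in range(d, len(series))])
    y = np.array(series[d:])
    return X, y

def knn_predict(X_train, y_train, x_new, k=3):
    """Predict the next flow as the mean target of the k nearest input vectors."""
    dists = np.linalg.norm(X_train - x_new, axis=1)
    nearest = np.argsort(dists)[:k]
    return y_train[nearest].mean()

def select_dimension(series, d_candidates=(1, 2, 3, 4, 5), k=3, c=1.0):
    """Pick the lag dimension d minimizing leave-one-out error times an
    assumed complexity factor (1 + c*sqrt(d/n)); illustrative only."""
    best_d, best_score = None, np.inf
    for d in d_candidates:
        X, y = embed(series, d)
        n = len(y)
        errs = []
        for i in range(n):
            mask = np.ones(n, dtype=bool)
            mask[i] = False  # leave observation i out of the training set
            errs.append((knn_predict(X[mask], y[mask], X[i], k) - y[i]) ** 2)
        score = np.mean(errs) * (1.0 + c * np.sqrt(d / n))  # penalized risk estimate
        if score < best_score:
            best_d, best_score = d, score
    return best_d

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic daily "streamflow": a positive AR(1)-like series, for illustration.
    q = [5.0]
    for _ in range(400):
        q.append(max(0.1, 0.8 * q[-1] + rng.normal(1.0, 0.5)))
    d = select_dimension(q)
    X, y = embed(q, d)
    print("selected lag dimension:", d)
    print("one-step-ahead prediction:", knn_predict(X, y, np.array(q[-d:])))

The multiplicative penalty mirrors, in spirit, the way VC-type bounds inflate empirical risk as model complexity grows relative to sample size; with small samples the criterion favors a low input dimensionality, which is the bias-variance behavior the abstract attributes to the complexity-based paradigm.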