The Utility of Information Flow in Formulating Discharge Forecast Models: A Case Study From an Arid Snow‐Dominated Catchment
Author(s) - Tennant Christopher, Larsen Laurel, Bellugi Dino, Moges Edom, Zhang Liang, Ma Hongxu
Publication year - 2020
Publication title - Water Resources Research
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.863
H-Index - 217
eISSN - 1944-7973
pISSN - 0043-1397
DOI - 10.1029/2019wr024908
Subject(s) - streamflow, baseflow, snowpack, environmental science, snowmelt, snow, discharge, hydrological modelling, precipitation, lag, climatology, hydrology (agriculture), drainage basin, meteorology, computer science, geography, geology, computer network, cartography, geotechnical engineering
Streamflow forecasts often perform poorly because of improper representation of hydrologic response timescales in underlying models. Here, we use transfer entropy (TE), which measures information flow between variables, to identify dominant drivers of discharge and their timescales using sensor data from the Dry Creek Experimental Watershed, ID, USA. Consistent with previous mechanistic studies, TE revealed that snowpack accumulation and partitioning into melt, recharge, and evaporative loss dominated discharge patterns and that snow‐sourced baseflow reduced the greatest amount of uncertainty in discharge. We hypothesized that machine learning models (MLMs) specified in accordance with the dominant lag timescales, identified via TE, would outperform timescale‐agnostic models. However, while lagged‐variable random forest regressions captured the dominant process—seasonal snowmelt—they ultimately did not perform as well as the unlagged models, provided those models were specified with input data aggregated over a range of timescales. Unlagged models, not constrained by timescales of the dominant processes, more effectively represented variable interactions (e.g., rain‐on‐snow events) playing a critical role in translating precipitation into streamflow over long, intermediate, and short timescales. Meanwhile, long short‐term memory (LSTM) models were effective in internally identifying the key lag and aggregation scales for predicting discharge. Parsimonious specification of LSTM models, using only daily unlagged precipitation and temperature data, produced the highest performing predictions. Our findings suggest that TE can identify dominant streamflow controls and the relative importance of different mechanisms of streamflow generation, useful for establishing process baselines and fingerprinting watersheds. However, restricting MLMs based on dominant timescales undercuts their skill at learning these timescales internally.
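To make the transfer entropy (TE) idea in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes a plug-in histogram estimator with quantile bins and a one-step target history; the function name `transfer_entropy` and the parameters `lag` and `bins` are hypothetical, and the estimator, bin count, and significance testing used in the study are not stated in this record. TE from a source X to a target Y at lag τ is taken here as the information that X at time t − τ provides about Y at time t beyond what Y at time t − 1 already provides.

```python
import numpy as np


def transfer_entropy(source, target, lag=1, bins=4):
    """Plug-in estimate of TE(source -> target) in bits at a given lag.

    TE = sum p(y_t, y_{t-1}, x_{t-lag})
             * log2[ p(y_t | y_{t-1}, x_{t-lag}) / p(y_t | y_{t-1}) ]

    Assumptions (illustrative only): quantile binning, target history
    of length one, no bias correction or significance test.
    """
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)
    if source.ndim != 1 or source.shape != target.shape:
        raise ValueError("source and target must be 1-D and equally long")
    if lag < 1 or lag >= len(target):
        raise ValueError("lag must satisfy 1 <= lag < len(target)")

    def discretize(values):
        # Interior quantile edges give roughly equal-occupancy bins 0..bins-1.
        edges = np.quantile(values, np.linspace(0.0, 1.0, bins + 1)[1:-1])
        return np.digitize(values, edges)

    xs, ys = discretize(source), discretize(target)
    n = len(ys)
    y_now = ys[lag:]             # y_t
    y_prev = ys[lag - 1:n - 1]   # y_{t-1}
    x_lag = xs[:n - lag]         # x_{t-lag}

    # Joint histogram over (y_t, y_{t-1}, x_{t-lag}), normalized to probabilities.
    joint = np.zeros((bins, bins, bins))
    np.add.at(joint, (y_now, y_prev, x_lag), 1)
    p_joint = joint / joint.sum()

    p_prev_x = p_joint.sum(axis=0)       # p(y_{t-1}, x_{t-lag})
    p_now_prev = p_joint.sum(axis=2)     # p(y_t, y_{t-1})
    p_prev = p_joint.sum(axis=(0, 2))    # p(y_{t-1})

    te = 0.0
    for i, j, k in np.argwhere(p_joint > 0):
        ratio = (p_joint[i, j, k] * p_prev[j]) / (p_prev_x[j, k] * p_now_prev[i, j])
        te += p_joint[i, j, k] * np.log2(ratio)
    return te


# Synthetic check: the "discharge" series copies the "driver" three steps later.
rng = np.random.default_rng(0)
driver = rng.normal(size=5000)
discharge = np.zeros(5000)
discharge[3:] = driver[:-3]
discharge += 0.1 * rng.normal(size=5000)

# Scanning lags shows where the information flow peaks (here at lag 3).
te_by_lag = {lag: transfer_entropy(driver, discharge, lag=lag) for lag in range(1, 6)}
```

Scanning `te_by_lag` for its maximum is one simple way to read off a dominant lag timescale, in the spirit of the lag identification described in the abstract. With real sensor data, the small positive bias of histogram estimators would normally be checked against shuffled surrogates before interpreting a peak.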