Toward Data‐Driven Generation and Evaluation of Model Structure for Integrated Representations of Human Behavior in Water Resources Systems
Author(s) - Ekblad Liam, Herman Jonathan D.
Publication year - 2021
Publication title - Water Resources Research
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.863
H-Index - 217
eISSN - 1944-7973
pISSN - 0043-1397
DOI - 10.1029/2020wr028148
Subject(s) - generalization , computer science , set (abstract data type) , inference , machine learning , data mining , artificial intelligence , task (project management) , genetic programming , mathematics , programming language , management , economics , mathematical analysis
Simulations of human behavior in water resources systems are challenged by uncertainty in model structure and parameters. The increasing availability of observations describing these systems provides the opportunity to infer a set of plausible model structures using data‐driven approaches. This study develops a three‐phase approach to inferring model structures and parameterizations from data (problem definition, model generation, and model evaluation) and illustrates it with a case study of land use decisions in the Tulare Basin, California. We encode the generalized decision problem as an arbitrary mapping from a high‐dimensional data space to the action of interest and use multiobjective genetic programming to search over a family of functions that perform this mapping for both regression and classification tasks. To facilitate the discovery of models that are both realistic and interpretable, the algorithm selects model structures based on multiobjective optimization of (1) their performance on a training set and (2) complexity, measured by the number of variables, constants, and operations composing the model. After training, optimal model structures are further evaluated according to their ability to generalize to held‐out test data and clustered based on their performance, complexity, and generalization properties. Finally, we diagnose the causes of good and bad generalization by performing sensitivity analysis across model inputs and within model clusters. This study serves as a template to inform and automate the problem‐dependent task of constructing robust data‐driven model structures to describe human behavior in water resources systems.
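The selection criterion described in the abstract, keeping model structures that are not dominated on training performance and complexity, can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: expression trees are encoded as nested tuples, training performance is taken as mean squared error, complexity is the total count of variables, constants, and operations (that is, tree nodes), and the operator set, toy data, and all function names are hypothetical. The genetic search that generates candidate structures is omitted; only the evaluation and nondominated selection step is shown, plus a simple train/test error gap as one possible generalization measure.

```python
from typing import Dict, List, Sequence, Tuple, Union

# Hypothetical encoding (an assumption, not the paper's): a str is a variable,
# a float is a constant, and a tuple (op, child, ...) is an operation node.
Expr = Union[str, float, tuple]
Row = Dict[str, float]

# Hypothetical operator set for illustration.
OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}


def complexity(expr: Expr) -> int:
    """Count variables, constants, and operations, i.e. nodes in the tree."""
    if isinstance(expr, tuple):
        return 1 + sum(complexity(child) for child in expr[1:])
    return 1


def evaluate(expr: Expr, row: Row) -> float:
    """Evaluate an expression tree on one observation."""
    if isinstance(expr, tuple):
        op, *children = expr
        return OPS[op](*(evaluate(c, row) for c in children))
    if isinstance(expr, str):
        return row[expr]
    return float(expr)


def mse(expr: Expr, rows: Sequence[Row], targets: Sequence[float]) -> float:
    """Mean squared error of the model's predictions (assumed metric)."""
    errors = [(evaluate(expr, r) - y) ** 2 for r, y in zip(rows, targets)]
    return sum(errors) / len(errors)


def dominates(a: Tuple[float, int], b: Tuple[float, int]) -> bool:
    """True if a is no worse than b in both objectives and better in one."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])


def pareto_front(models: List[Expr], rows, targets) -> List[Expr]:
    """Keep models not dominated on (training error, complexity)."""
    scores = [(mse(m, rows, targets), complexity(m)) for m in models]
    return [
        m for m, s in zip(models, scores)
        if not any(dominates(other, s) for other in scores)
    ]


def generalization_gap(expr: Expr, train, test) -> float:
    """Test error minus training error; train/test are (rows, targets) pairs."""
    return mse(expr, *test) - mse(expr, *train)


if __name__ == "__main__":
    # Toy data generated from y = 2*x1 + x2 (illustrative only).
    rows = [{"x1": 1.0, "x2": 0.0}, {"x1": 2.0, "x2": 1.0}, {"x1": 3.0, "x2": 5.0}]
    targets = [2.0, 5.0, 11.0]
    models = [
        "x1",
        ("sub", "x2", "x1"),
        ("mul", 2.0, "x1"),
        ("add", ("mul", 2.0, "x1"), "x2"),
        ("add", ("mul", 2.0, "x1"), ("mul", 1.0, "x2")),
    ]
    front = pareto_front(models, rows, targets)
    print([complexity(m) for m in front])  # prints [1, 3, 5]
```

In this toy run the seven-node model fits as well as the five-node model but is more complex, so it is dominated; the `sub` model is dominated by the equally complex `mul` model with lower error. The front retains only the trade-off points at complexities 1, 3, and 5, which mirrors how a performance-versus-complexity objective favors interpretable structures.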