Open Access
A Framework for the Cross‐Validation of Categorical Geostatistical Simulations
Author(s) - Juda Przemysław, Renard Philippe, Straubhaar Julien
Publication year - 2020
Publication title - Earth and Space Science
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.843
H-Index - 23
ISSN - 2333-5084
DOI - 10.1029/2020ea001152
Subject(s) - categorical variable, cross validation, geostatistics, data mining, computer science, ranking (information retrieval), set (abstract data type), statistics, mathematics, artificial intelligence, machine learning, spatial variability, programming language
Abstract: The mapping of subsurface parameters and the quantification of spatial uncertainty require selecting adequate models and their parameters. Cross‐validation techniques have been widely used for geostatistical model selection with continuous variables, but the situation is different for categorical variables. In these cases, cross‐validation is seldom applied, and there is no clear consensus on which method to employ. Therefore, this paper proposes a systematic framework for the cross‐validation of geostatistical simulations of categorical variables such as geological facies. The method is based on K‐fold cross‐validation combined with a proper scoring rule. It can be applied whenever an observation data set is available. At each cross‐validation iteration, the training set becomes the conditioning data for the tested geostatistical model, and the resulting ensemble of simulations is compared with the true values at the held‐out locations. The proposed framework is generic. Its application is illustrated with two examples using multiple‐point statistics simulations. In the first test case, the aim is to identify a training image from a given data set. In the second test case, the aim is to identify the model parameters in a nonstationary setting, a coastal alluvial aquifer in the south of France. Cross‐validation scores are used as metrics of model performance, and the quadratic scoring rule, the zero‐one score, and the balanced linear score are compared. The study shows that the proposed fivefold stratified cross‐validation with the quadratic scoring rule allows the geostatistical models to be ranked and helps to identify the proper parameters.
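The procedure summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions made for illustration: the function names, the fold-assignment scheme, the hypothetical `simulate` callable that stands in for any geostatistical simulator, and the sign convention of the scores (written here as penalties, so lower is better; the paper may orient them differently). The balanced linear score is omitted because the abstract does not define it.

```python
from collections import Counter
import random


def ensemble_probabilities(simulated, categories):
    """Estimate category probabilities at one location from the ensemble of simulated values."""
    counts = Counter(simulated)
    n = len(simulated)
    return {c: counts.get(c, 0) / n for c in categories}


def quadratic_score(probs, observed):
    """Quadratic (Brier-type) score as a penalty: 0 is a perfect forecast, 2 is the worst."""
    return sum((p - (1.0 if c == observed else 0.0)) ** 2 for c, p in probs.items())


def zero_one_score(probs, observed):
    """Penalty of 0 if the most probable category matches the observation, else 1."""
    predicted = max(probs, key=probs.get)
    return 0.0 if predicted == observed else 1.0


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds with similar category proportions (assumed scheme)."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    by_class = {}
    for i, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(i)
    offset = 0
    for lab in sorted(by_class):
        idx = by_class[lab]
        rng.shuffle(idx)
        # Deal the indices of this category round-robin across the folds.
        for j, i in enumerate(idx):
            folds[(j + offset) % k].append(i)
        offset += len(idx)
    return folds


def cross_validate(labels, simulate, categories, k=5, seed=0):
    """Mean quadratic score over all held-out points.

    `simulate(train_idx, test_idx)` is a hypothetical user-supplied callable: it
    conditions the tested model on the training points and returns, for each test
    index, the list of simulated categories across the ensemble.
    """
    folds = stratified_folds(labels, k, seed)
    scores = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        ensemble = simulate(train_idx, test_idx)
        for i, sims in zip(test_idx, ensemble):
            probs = ensemble_probabilities(sims, categories)
            scores.append(quadratic_score(probs, labels[i]))
    return sum(scores) / len(scores)
```

Under these assumptions, competing models (for example, different training images or parameter sets) would each be passed in as a different `simulate` callable, and the resulting mean scores compared to rank them.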