
On the systematic reduction of data complexity in multimodel atmospheric dispersion ensemble modeling
Author(s) -
Riccio A.,
Ciaramella A.,
Giunta G.,
Galmarini S.,
Solazzo E.,
Potempski S.
Publication year - 2012
Publication title -
Journal of Geophysical Research: Atmospheres
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.67
H-Index - 298
eISSN - 2156-2202
pISSN - 0148-0227
DOI - 10.1029/2011jd016503
Subject(s) - cluster analysis, computer science, independence (probability theory), mutual information, data mining, model selection, data set, distance correlation, data reduction, algorithm, artificial intelligence, statistics, mathematics, random variable
The aim of this work is to explore the effectiveness of information-theoretic approaches for reducing data complexity in multimodel ensemble systems. We first exploit a weak form of independence, i.e., uncorrelation, as a mechanism for detecting linear relationships. Then, stronger and more general measures of independence, such as mutual information, are used to investigate dependence structures for model selection. For each investigated measure, a distance matrix quantifying the interdependence between model outputs is derived, with the aim of clustering correlated/dependent models together. Redundant information is then discarded by selecting a few representative models from each cluster. We apply the clustering analysis to atmospheric dispersion modeling, using the ETEX-1 data set. We show that selecting a small subset of models, according to uncorrelation or mutual information distance criteria, usually suffices to achieve statistical performance comparable to, or even better than, that of the whole ensemble, thus providing a simpler description of ensemble results without sacrificing accuracy.
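The abstract outlines a three-step pipeline:

1. Build a pairwise distance matrix between ensemble members from a dependence measure.
2. Cluster the members using that matrix.
3. Keep one or a few representatives per cluster.

The following is a minimal Python/NumPy sketch of that pipeline under stated assumptions. It is not the authors' implementation. The following choices are all hypothetical, made for illustration only:

* the distance forms: 1 - |rho| for correlation, and 1 - I/max(H_i, H_j) for mutual information with a plug-in histogram estimator;
* average-linkage agglomerative clustering;
* the medoid rule for picking representatives;
* the number of clusters k;
* every function name.

The data are synthetic, not ETEX-1.

```python
import numpy as np


def correlation_distance(X):
    """Linear-dependence distance between models (rows of X).

    Assumed form: d_ij = 1 - |rho_ij|, so strongly correlated (or
    anti-correlated) models are close and uncorrelated ones are far apart.
    """
    return 1.0 - np.abs(np.corrcoef(X))


def _entropy(counts):
    # Shannon entropy (nats) of a histogram given as counts.
    p = counts[counts > 0] / counts.sum()
    return float(-np.sum(p * np.log(p)))


def mutual_information_distance(X, bins=8):
    """Assumed normalized MI distance: d_ij = 1 - I(i;j) / max(H_i, H_j).

    Mutual information is estimated from equal-width histograms, a simple
    (biased) plug-in estimator chosen here only for illustration.
    """
    n = X.shape[0]
    # Map each model's values to bin indices 0..bins-1.
    codes = [np.digitize(x, np.histogram_bin_edges(x, bins=bins)[1:-1]) for x in X]
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            joint = np.zeros((bins, bins))
            np.add.at(joint, (codes[i], codes[j]), 1)  # joint histogram counts
            h_i = _entropy(joint.sum(axis=1))
            h_j = _entropy(joint.sum(axis=0))
            mi = h_i + h_j - _entropy(joint.ravel())
            denom = max(h_i, h_j)
            D[i, j] = D[j, i] = 1.0 - mi / denom if denom > 0 else 0.0
    return D


def average_linkage(D, k):
    """Merge the two closest clusters (mean pairwise distance) until k remain."""
    clusters = [[i] for i in range(D.shape[0])]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = D[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


def representatives(D, clusters):
    """Pick one model per cluster: the medoid (smallest summed distance)."""
    reps = []
    for members in clusters:
        sub = D[np.ix_(members, members)]
        reps.append(members[int(np.argmin(sub.sum(axis=1)))])
    return sorted(reps)


# Hypothetical synthetic "ensemble": 6 models built from 3 independent
# signals, two slightly perturbed copies of each (not ETEX-1 data).
rng = np.random.default_rng(0)
signals = rng.normal(size=(3, 500))
X = np.repeat(signals, 2, axis=0) + 0.1 * rng.normal(size=(6, 500))

for name, D in [("correlation", correlation_distance(X)),
                ("mutual information", mutual_information_distance(X))]:
    clusters = average_linkage(D, k=3)
    reps = representatives(D, clusters)
    reduced_mean = X[reps].mean(axis=0)  # ensemble mean of the selected subset
    print(name, sorted(sorted(c) for c in clusters), reps)
```

On this toy input, both distances should group each perturbed pair together, and the reduced ensemble keeps one member per pair.

The sketch does not reproduce the paper's skill comparison. In a real application, one would score `reduced_mean` and the full-ensemble mean against observations using the same statistical indices.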