Using Shannon's entropy to sample heterogeneous and high‐dimensional atmospheric datasets
Author(s) - Paul M., Aires F.
Publication year - 2014
Publication title - Quarterly Journal of the Royal Meteorological Society
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.744
H-Index - 143
eISSN - 1477-870X
pISSN - 0035-9009
DOI - 10.1002/qj.2367
Subject(s) - sampling (signal processing) , entropy (arrow of time) , troposphere , environmental science , computer science , data mining , remote sensing , statistics , meteorology , mathematics , geology , geography , physics , filter (signal processing) , quantum mechanics , computer vision
Building diverse synthetic datasets of atmospheric situations, used as first guesses or training bases for remote‐sensing algorithms, remains a challenge. Numerical constraints limit such datasets to a small number of representative situations, which should nevertheless preserve as much as possible of the diversity observed in nature. This study presents a sampling method that extracts a new, smaller dataset from a large database of atmospheric situations. One major issue in such sampling is the heterogeneity of the input variables: temperatures and specific humidities, for instance, have different units and ranges, and values from the lower troposphere can hardly be compared with values from the upper stratosphere. We show that sampling on a single variable type is not optimal, because erroneous features can appear in the variables not used for the sampling. Shannon's entropy provides the basis for a sampling technique able to deal with very heterogeneous variables. A dataset of 10 000 situations is built from EUMETSAT satellite atmospheric retrievals; it includes temperature and water‐vapour profiles, four integrated ozone layers and surface temperature. The sampling increases the entropy of the original dataset from 22 to 28 (about 20% increase).
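
The abstract does not spell out the sampling algorithm, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that each variable is discretized into equal-width bins (which makes the entropy independent of physical units), that the dataset entropy is approximated by the sum of the per-variable Shannon entropies, and that rows are drawn with a probability inversely proportional to the occupancy of their bins. The function names, the bin count and the synthetic "temperature" and "humidity" columns are all hypothetical and stand in for real retrievals.

```python
import numpy as np


def marginal_entropy(x, edges):
    """Shannon entropy (in bits) of one variable discretized on fixed bin edges."""
    counts, _ = np.histogram(x, bins=edges)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())


def total_entropy(data, edges_per_var):
    """Sum of marginal entropies (an assumed simplification of the joint entropy)."""
    return sum(marginal_entropy(data[:, j], e) for j, e in enumerate(edges_per_var))


def inverse_occupancy_sample(data, edges_per_var, n_keep, rng):
    """Return indices of n_keep rows, favouring rows that fall in sparsely filled bins."""
    weights = np.zeros(data.shape[0])
    for j, edges in enumerate(edges_per_var):
        counts, _ = np.histogram(data[:, j], bins=edges)
        # Interior edges map every value to a bin index in 0..n_bins-1.
        bin_idx = np.digitize(data[:, j], edges[1:-1])
        weights += 1.0 / counts[bin_idx]
    return rng.choice(data.shape[0], size=n_keep, replace=False,
                      p=weights / weights.sum())


# Synthetic stand-in for a heterogeneous database (assumed values, not EUMETSAT data).
rng = np.random.default_rng(0)
n_bins = 20
full = np.column_stack([
    rng.normal(250.0, 20.0, 5000),    # temperature-like variable, in K
    rng.lognormal(-6.0, 1.0, 5000),   # specific-humidity-like variable, in kg/kg
])
edges = [np.linspace(full[:, j].min(), full[:, j].max(), n_bins + 1)
         for j in range(full.shape[1])]

subset = full[inverse_occupancy_sample(full, edges, 500, rng)]
h_full = total_entropy(full, edges)
h_subset = total_entropy(subset, edges)
print(f"entropy of full database: {h_full:.2f} bits")
print(f"entropy of sampled subset: {h_subset:.2f} bits")
```

Because both variables contribute to the weights through bin occupancy rather than through their raw values, neither the kelvin-scale nor the kg/kg-scale column dominates the selection. The entropy values printed here depend on the assumed binning and are not comparable with the 22 and 28 reported in the abstract.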