Explanation of Variability and Removal of Confounding Factors from Data through Optimal Transport
Author(s) - Tabak Esteban G., Trigila Giulio
Publication year - 2018
Publication title - Communications on Pure and Applied Mathematics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 3.12
H-Index - 115
eISSN - 1097-0312
pISSN - 0010-3640
DOI - 10.1002/cpa.21706
Subject(s) - mathematics, principal component analysis, conditional probability distribution, sampling (signal processing), series (stratigraphy), statistics, simple (philosophy), conditional probability, sample (material), econometrics, algorithm, computer science, paleontology, philosophy, chemistry, filter (signal processing), epistemology, chromatography, computer vision, biology
A methodology based on the theory of optimal transport is developed to attribute variability in data sets to known and unknown factors and to remove such attributable components of the variability from the data. Denoting by x the quantities of interest and by z the explanatory factors, the procedure transforms x into filtered variables y through a z-dependent map, so that the conditional probability distributions ρ(x|z) are pushed forward into a target distribution μ(y), independent of z. Among all maps and target distributions that achieve this goal, the procedure selects the one that minimally distorts the original data: the barycenter of the ρ(x|z). Connections are found to unsupervised learning and to fundamental problems in statistics such as conditional density estimation and sampling. Particularly simple instances of the methodology are shown to be equivalent to k-means and principal component analysis. An application is shown to a time series of ground temperature hourly data across the United States. © 2017 Wiley Periodicals, Inc.
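
The idea of pushing every conditional distribution ρ(x|z) onto a common, z-independent target can be illustrated with a minimal sketch. This is not the authors' algorithm. It assumes a scalar x, a categorical factor z, and that each ρ(x|z) is approximately Gaussian, in which case the Wasserstein-2 barycenter is again Gaussian, with mean and standard deviation equal to the weighted averages of the group means and standard deviations, and the optimal map for each group is affine. The function name `remove_factor_1d` and the synthetic data are hypothetical.

```python
import numpy as np

def remove_factor_1d(x, z):
    """Map each group's samples onto the 1-D Gaussian Wasserstein-2 barycenter.

    Assumptions (not from the paper): x is scalar, z is categorical, each
    conditional distribution is roughly Gaussian, and every group has a
    nonzero standard deviation.
    """
    x = np.asarray(x, dtype=float).ravel()
    z = np.asarray(z).ravel()
    labels, inverse, counts = np.unique(z, return_inverse=True, return_counts=True)
    weights = counts / counts.sum()  # empirical weight of each value of z

    # Per-group mean and standard deviation of x.
    means = np.array([x[inverse == k].mean() for k in range(len(labels))])
    stds = np.array([x[inverse == k].std() for k in range(len(labels))])

    # Barycenter of 1-D Gaussians: average the means and the standard deviations.
    bar_mean = np.sum(weights * means)
    bar_std = np.sum(weights * stds)

    # z-dependent affine map x -> y; afterwards every group shares the same
    # mean and standard deviation.
    return bar_mean + bar_std * (x - means[inverse]) / stds[inverse]

# Synthetic data: three groups with different location and spread.
rng = np.random.default_rng(0)
z = rng.integers(0, 3, size=3000)
x = rng.normal(loc=np.array([-2.0, 0.0, 5.0])[z], scale=np.array([0.5, 1.0, 2.0])[z])
y = remove_factor_1d(x, z)
```

In this sketch the filtered variable y keeps the overall mean of x, while its first two moments no longer depend on z. Restricting the maps to pure translations would only subtract the group means, a simpler setting in the spirit of the connection to k-means that the abstract mentions.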
