Open Access
Optimal Estimation of the Climatological Mean
Author(s) - Balachandrudu Narapusetty, Timothy DelSole, Michael K. Tippett
Publication year - 2009
Publication title - Journal of Climate
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 3.315
H-Index - 287
eISSN - 1520-0442
pISSN - 0894-8755
DOI - 10.1175/2009jcli2944.1
Subject(s) - overfitting , mathematics , diurnal cycle , mean squared error , statistics , climatology
This paper shows theoretically and with examples that climatological means derived from spectral methods predict independent data with less error than climatological means derived from simple averaging. Herein, “spectral methods” indicates a least squares fit to a sum of a small number of sines and cosines that are periodic on annual or diurnal periods, and “simple averaging” refers to averages computed while holding the phase of the annual or diurnal cycle constant. The superiority of spectral methods over simple averaging can be understood as a straightforward consequence of overfitting, provided that one recognizes that simple averaging is a special case of the spectral method. To illustrate these results, the two methods are compared in the context of estimating the climatological mean of sea surface temperature (SST). Cross-validation experiments indicate that about four harmonics of the annual cycle are adequate, which requires estimation of nine independent parameters (a constant plus a sine and a cosine coefficient for each harmonic). In contrast, simple averaging of daily SST requires estimation of 366 parameters, one for each day of the year, which is about 40 times as many. Consistent with the greater number of parameters, simple averaging predicts samples that were not included in the estimation of the climatological mean more poorly than the spectral method does. In addition to being more accurate, the spectral method also accommodates leap years and missing data simply, results in a greater degree of data compression, and automatically produces smooth time series.
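The comparison described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code or data: the series is synthetic (an invented annual cycle plus white noise), leap days are ignored so that every year has 365 days, and the function names `harmonic_design`, `fit_spectral`, and `fit_simple_average` are hypothetical. The spectral estimate is a least squares fit to a constant plus four annual harmonics (nine parameters), while simple averaging estimates one mean per calendar day.

```python
import numpy as np

DAYS_PER_YEAR = 365  # assumption: leap days are ignored in this sketch


def harmonic_design(t, n_harmonics, period=DAYS_PER_YEAR):
    """Design matrix: a constant column plus a cosine/sine pair per harmonic."""
    t = np.asarray(t, dtype=float)
    cols = [np.ones_like(t)]
    for k in range(1, n_harmonics + 1):
        phase = 2.0 * np.pi * k * t / period
        cols.append(np.cos(phase))
        cols.append(np.sin(phase))
    return np.column_stack(cols)


def fit_spectral(t, y, n_harmonics):
    """Least squares fit of the harmonic model; returns 2*n_harmonics + 1 coefficients."""
    coef, *_ = np.linalg.lstsq(harmonic_design(t, n_harmonics), y, rcond=None)
    return coef


def fit_simple_average(t, y):
    """One mean per calendar day (phase of the annual cycle held constant)."""
    day = np.asarray(t) % DAYS_PER_YEAR
    sums = np.bincount(day, weights=y, minlength=DAYS_PER_YEAR)
    counts = np.bincount(day, minlength=DAYS_PER_YEAR)
    return sums / counts


rng = np.random.default_rng(0)


def synthetic_years(n_years):
    """Hypothetical daily series: a smooth annual cycle plus unit-variance noise."""
    t = np.arange(n_years * DAYS_PER_YEAR)
    cycle = (15.0
             + 5.0 * np.cos(2.0 * np.pi * t / DAYS_PER_YEAR)
             + 1.0 * np.sin(4.0 * np.pi * t / DAYS_PER_YEAR))
    return t, cycle + rng.normal(0.0, 1.0, size=t.size)


# Estimate both climatologies on one sample, then score them on an independent one.
t_train, y_train = synthetic_years(10)
t_test, y_test = synthetic_years(10)

coef = fit_spectral(t_train, y_train, n_harmonics=4)
pred_spectral = harmonic_design(t_test, 4) @ coef

climatology_simple = fit_simple_average(t_train, y_train)
pred_simple = climatology_simple[t_test % DAYS_PER_YEAR]

mse_spectral = np.mean((y_test - pred_spectral) ** 2)
mse_simple = np.mean((y_test - pred_simple) ** 2)
print(f"spectral: {coef.size} parameters, out-of-sample MSE {mse_spectral:.3f}")
print(f"simple:   {climatology_simple.size} parameters, out-of-sample MSE {mse_simple:.3f}")
```

With independent noise, each daily mean in this setup is estimated from only ten values, whereas each harmonic coefficient draws on the whole record, so the simple average should carry more estimation error into the independent sample. The size of the gap depends on the invented noise level and record length and says nothing about the SST results reported in the paper.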