Darrell S. Kaufman, David P. Schneider, Nicholas P. McKay, Caspar M. Ammann, Raymond S. Bradley, Keith R. Briffa, Gifford H. Miller, Bette L. Otto-Bliesner, Jonathan T. Overpeck, and Bo M. Vinther (Science 9/4/2009) propose a reconstruction of Arctic summer land temperatures for the last 2000 years, using 23 diverse proxies. Decadal averages of each proxy are normalized to zero mean and unit variance relative to the period 980-1800 AD, when all 23 proxies are available. These are then averaged, as available, to form a 2000-year composite. This composite is converted into temperature anomalies by comparison to the CRUTEM3 summer (JJA) temperature for land north of 60°N latitude, for 1860 – 2000 A.D.
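As a rough illustration of the compositing step just described, the sketch below standardizes each proxy over a reference window and then averages whatever proxies are present in each decade. The function name, the toy data, and the convention that decades are labelled by their start year (so the window is 980 <= decade < 1800) are my own assumptions, not details taken from the paper.

```python
import numpy as np

def standardized_composite(proxies, decades, ref_start=980, ref_end=1800):
    """Standardize each proxy over the reference window, then average as available.

    proxies : array of shape (n_decades, n_proxies), NaN where a proxy is missing.
    decades : array of decade labels (assumed here to be decade start years).
    Returns the composite and the number of proxies contributing to each decade.
    """
    proxies = np.asarray(proxies, dtype=float)
    decades = np.asarray(decades)
    in_ref = (decades >= ref_start) & (decades < ref_end)
    mean = np.nanmean(proxies[in_ref], axis=0)
    std = np.nanstd(proxies[in_ref], axis=0, ddof=1)
    z = (proxies - mean) / std                  # zero mean, unit variance in the window
    composite = np.nanmean(z, axis=1)           # equal-weight mean of available proxies
    n_active = np.sum(~np.isnan(z), axis=1)     # composition changes with this count
    return composite, n_active

# Synthetic toy data (NOT the Kaufman et al. proxies): three random series.
rng = np.random.default_rng(0)
decades = np.arange(0, 2000, 10)
toy = rng.normal(size=(decades.size, 3))
toy[decades < 500, 2] = np.nan      # third toy proxy only begins in 500
toy[decades >= 1810, 1] = np.nan    # second toy proxy is truncated at 1810

composite, n_active = standardized_composite(toy, decades)
```

The `n_active` vector makes the changing composition explicit: any decade where it differs from the calibration-period count is, in effect, a different index.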
Unfortunately, the paper’s calibration of the proxy composite is defective. The 23 proxies used include lake varve thicknesses, varve densities, varve and sediment organic material (OM), sediment biosilica, ice core 18O depletions, and tree ring widths, and are all from different locations. There is no a priori reason to expect these very diverse proxies to all have the same behavior with respect to temperature, even when normalized. Because not all the proxies are available in each decade, the composite does not have constant composition. As its composition changes, it essentially becomes a different index, which must be calibrated separately to temperature. The authors fail to do this, and hence the reconstruction is invalid.
In the online Supplementary Information, the authors give their calibration regression equation as
P = 2.079 T + 0.826 (r² = 0.79, p < 0.01, n = 14).
Presumably they then invert this equation to reconstruct T(t) as a function of P(t) during the reconstruction period 0-1860 AD. In itself, this corresponds to the univariate version of the “Classical Calibration Estimation” discussed by Brown (1982), and advocated on CA by reader UC. In general, it would be more efficient not to weight the individual normalized series equally, but there is no reason such an equally-weighted composite of standardized series cannot be calibrated univariately in this manner.
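The presumed inversion is simple arithmetic: solving P = 2.079 T + 0.826 for T gives T = (P - 0.826)/2.079. A minimal sketch follows; the function name is hypothetical, and only the two coefficients come from the Supplementary Information.

```python
# Coefficients quoted from the SI calibration equation P = 2.079 T + 0.826.
SLOPE = 2.079
INTERCEPT = 0.826

def temperature_from_composite(p, slope=SLOPE, intercept=INTERCEPT):
    """Invert the calibration line to get a temperature anomaly from a composite value."""
    return (p - intercept) / slope

# A composite value equal to the intercept maps to a zero anomaly, and a
# composite one slope-unit above the intercept maps to an anomaly of one.
t_zero = temperature_from_composite(0.826)
t_one = temperature_from_composite(0.826 + 2.079)
```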
Nevertheless, there are three problems with the calibration as performed by Kaufman et al.: First, they estimate their calibration equation on a sub-composite index which includes only those 19 series that extend to 1980. Since the calibration equation is based on a composite that does not include 4 of the series, it is invalid to use it to calibrate the full 23 proxy series, or any sub-composite that includes any of these 4 series. The completely excluded series are #11 (the SFL4 sediment series from W. Greenland) and #19 – 21 (the 3 lake varve series from Finland, including the controversial Tiljander series #20). Since the Tiljander series was (validly) truncated at 1810 to avoid effects of human activities on the lake sediments, there is in fact no way to calibrate it or any composite containing it to post-1860 temperatures. Series #19 is likewise truncated at 1810.
The second problem is that even though the last decade in which the 19 series were observed together was the 1970s (12 decades), the authors use subsets of these 19 to fit their regression equation through 2000 (14 decades). The 13th decade in this regression in fact is based on the average of only 15 of the 19 series, and the 14th decade is based on only 12 of the 19 series. Since these are different composites, they do not necessarily obey the same regression equation, and it is invalid to include these points in the calibration of the 19-proxy sub-composite. (In fact, there were only 9 decades during the calibration period when the 19 series were all truly observed, because series 2 and 8 had a few internal missing observations that the authors interpolated. While interpolation might be an excusable expedient for the reconstruction, it is invalid for the calibration regression, since it does not add a true observation.)
The third problem is that the authors apply their calibration equation to reconstruct temperatures during decades when some, or even several, of the 19 calibration proxies were not available. Even if the reconstruction composite had been restricted to the 19 calibration proxies, and even if all 19 of these had been observed for all 14 decades of the calibration period, it would still be invalid to apply the calibration equation derived from all 19 to any subset of the 19. This in itself makes the reconstruction invalid before 970 A.D., when proxy #9 ceases to be observed. Prior to 460 A.D., only 13 of the 19 calibration proxies are available.
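The common thread in all three problems is that a composite of a different proxy set obeys a different calibration line. The toy example below illustrates this with entirely synthetic, noise-free proxies whose sensitivities to temperature are made-up numbers; it is meant only to show the mechanism, not to estimate the size of the effect in the actual data.

```python
import numpy as np

# Hypothetical setup: 14 decadal temperature anomalies and 4 proxies, each
# responding linearly to temperature with its own (made-up) slope and intercept.
T = np.linspace(-1.0, 1.0, 14)
b = np.array([0.5, 1.0, 2.0, 3.0])       # assumed proxy sensitivities
a = np.array([0.0, 0.2, -0.1, 0.3])      # assumed proxy intercepts
P = a + np.outer(T, b)                   # shape (14, 4), no noise

full = P.mean(axis=1)                    # composite of all 4 proxies
subset = P[:, :2].mean(axis=1)           # composite of only the first 2

# Calibrate each composite on temperature (np.polyfit returns slope, intercept).
slope_full, icpt_full = np.polyfit(T, full, 1)
slope_sub, icpt_sub = np.polyfit(T, subset, 1)

# Applying the full-composite calibration to the subset composite distorts T;
# applying the subset's own calibration recovers it.
T_wrong = (subset - icpt_full) / slope_full
T_right = (subset - icpt_sub) / slope_sub
```

In this toy case the full composite has slope mean(b) = 1.625 while the two-proxy composite has slope 0.75, so the mismatched calibration compresses the reconstructed temperature range by the ratio of the two slopes.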
The irregularity of this data set does not make a valid calibration impossible. So long as the proxies were not “cherry picked” from a broader universe of proxies on the basis of their correlation with instrumental temperatures (a big “if” in the present instance), the following steps would be adequate to perform a valid calibration:
1. Regress the decadal averages of each proxy Pi(t) on the decadal averages T(t) of the instrumental temperature series, using all the observations for that proxy but excluding interpolations, to obtain a regression equation of the form
Pi(t) = ai + bi T(t) + ei(t),
with standard errors appropriately adjusted for serial correlation. Of course, proxies with fewer than 14 decadal observations will tend to have higher standard errors than those with all 14 observations, and the two that were truncated at 1810 cannot be used at all.
2. Compute the covariance matrix of the regression errors for the 21 remaining series, as follows: First, compute the variance of the errors in each regression. Then, for each pair of proxies, compute the correlation between their regression errors, using only the observations that are active for both proxies, and without recentering the errors. Finally, convert these correlations into covariances using the full sample variance for each of the errors. This matrix is not of full rank, but since it does not need to be inverted, this should not be a problem.
3. For each reconstruction period t, compute P(t) = mean(Pi(t)), a(t) = mean(ai), and b(t) = mean(bi), taking the means over the time-t specific subset of proxies, including interpolations if desired. Compute the variance and covariance of a(t) and b(t) and the variance of e(t) = mean(ei(t)) using the covariance matrix computed in step 2, and the assumption that the correlations of the coefficients from different regressions are equal to the correlations of their error terms. Generally, the smaller the active proxy set, the larger the standard errors of the coefficients will be.
4. Invert the t-specific equation
P(t) = a(t) + b(t) T(t) + e(t)
to obtain
T(t) = (P(t) – a(t) – e(t))/b(t).
Setting e(t) = 0 gives the point estimate
T*(t) = (P(t) – a(t) )/b(t).
5. Use the formula for the distribution of the ratio of two correlated normal random variables (D.V. Hinkley, Biometrika 1969, corr. 1970) to compute confidence intervals for T(t). This approach always leads to finite CI bounds, unlike the classical method of inverting upper and lower confidence bounds for P(t) as a function of T(t). It also can be given a Bayesian interpretation, in terms of a diffuse prior for T(t), provided Bayes’ Rule is invoked before the regression information is applied and not after.
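A partial sketch of steps 1, 3, 4 and 5 is given below, under loud assumptions. The data are synthetic and noise-free, and the function names are hypothetical. The serial-correlation adjustment of step 1 and the error covariance matrix of step 2 are omitted entirely. For step 5, the closed-form Hinkley distribution is replaced by a plain Monte Carlo simulation of the ratio of two correlated normals, which approximates the same distribution but is not the formula itself.

```python
import numpy as np

def fit_proxy(p, T):
    """Step 1: OLS of one proxy on T, using only decades where it is observed."""
    ok = ~np.isnan(p)
    b_i, a_i = np.polyfit(T[ok], p[ok], 1)    # polyfit returns (slope, intercept)
    return a_i, b_i

def point_estimate(p_t, a, b):
    """Steps 3-4: average P, a and b over the proxies active at time t, then invert."""
    active = ~np.isnan(p_t)
    return (p_t[active].mean() - a[active].mean()) / b[active].mean()

def ratio_interval(num_mean, den_mean, cov, level=0.95, n_draws=200_000, seed=0):
    """Step 5 stand-in: simulate numerator/denominator as a bivariate normal and
    take quantiles of their ratio (a Monte Carlo substitute for Hinkley's formula)."""
    rng = np.random.default_rng(seed)
    draws = rng.multivariate_normal([num_mean, den_mean], cov, size=n_draws)
    ratio = draws[:, 0] / draws[:, 1]
    lo, hi = np.quantile(ratio, [(1 - level) / 2, (1 + level) / 2])
    return lo, hi

# Synthetic calibration data (NOT the real proxies): 14 decades, 3 proxies.
T_cal = np.linspace(-1.0, 1.0, 14)
true_a = np.array([0.1, -0.2, 0.3])
true_b = np.array([0.5, 1.5, 2.5])
P_cal = true_a + np.outer(T_cal, true_b)
P_cal[-2:, 2] = np.nan                        # third proxy lacks the last two decades

coefs = np.array([fit_proxy(P_cal[:, i], T_cal) for i in range(3)])
a_hat, b_hat = coefs[:, 0], coefs[:, 1]

# A reconstruction decade with true T = 0.4 in which the first proxy is missing.
p_t = true_a + true_b * 0.4
p_t[0] = np.nan
T_star = point_estimate(p_t, a_hat, b_hat)    # recovers 0.4 in this noise-free toy

# Illustrative interval for a ratio with a nearly fixed denominator.
lo, hi = ratio_interval(1.0, 2.0, [[0.01, 0.0], [0.0, 1e-8]])
```

Because the coefficients are averaged over the same time-t subset as the proxies themselves, the inversion stays consistent however the active set changes. In a real application the covariance passed to `ratio_interval` would come from steps 2 and 3 rather than being supplied by hand.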
Since four of the proxies are tree-ring series that may well have been affected by CO2 fertilization in the last 60 years, CO2 should also be controlled for in the tree-ring regressions, and appropriately entered in the reconstruction.
I have not yet attempted to perform these calibrations, but they should be straightforward.
(I have previously noted, in Comments 35 and 37 of the Kaufman and Upside-Down Mann thread, that the correction for serial correlation of regression residuals as described in the Kaufman SI is also incorrect. This is a separate, but also important problem.)
