Channel: Climate Audit » mcculloch

Spline Smoothing

The 2009 Climate Dynamics paper “Unprecedented low twentieth century winter sea ice extent in the Western Nordic Seas since A.D. 1200” by M. Macias Fauria, A. Grinsted, et al., already discussed on the thread “Svalbard’s Lost Decades,” pre-smooths its data with a 5-year cubic spline before running its regressions.

There’s been a lot of discussion of smoothing here on CA, especially as it relates to endpoints. However, splines remain something of a novelty here.

A cubic spline is simply a piecewise cubic function whose third derivative is discontinuous at selected “knot points,” while its lower-order derivatives remain continuous. This curve happens to approximate the shape taken by a mechanical spline, a flexible drafting tool, and minimizes the energy required to force the mechanical spline through selected values at the knot points. The “5-year” spline used by Macias Fauria et al. presumably has knot points spaced 5 years apart.
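
To make the definition concrete, here is a minimal sketch, assuming the spline is written as a cubic polynomial plus one term b_j·max(t - k_j, 0)³ per knot (the truncated-power form spelled out later in the post). The helper `spline_derivs` and the random coefficients are purely illustrative. At a knot, the value and the first two derivatives are the same from the left and from the right, while the third derivative jumps by 6·b_j.

```python
import numpy as np

def spline_derivs(t, a, b, knots, right=True):
    """Return [f, f', f'', f'''] at t for the (illustrative) cubic spline
    f(t) = a0 + a1*t + a2*t**2 + a3*t**3 + sum_j b_j * max(t - k_j, 0)**3.
    `right` selects the right-hand (True) or left-hand (False) limit of f'''
    when t falls exactly on a knot; f, f', f'' do not depend on it."""
    t = float(t)
    a0, a1, a2, a3 = a
    vals = np.array([a0 + a1 * t + a2 * t**2 + a3 * t**3,   # f
                     a1 + 2 * a2 * t + 3 * a3 * t**2,       # f'
                     2 * a2 + 6 * a3 * t,                   # f''
                     6 * a3])                               # f'''
    for bj, kj in zip(b, knots):
        u = max(t - kj, 0.0)
        switched_on = (t >= kj) if right else (t > kj)      # has this knot's cubic kicked in?
        vals += bj * np.array([u**3, 3 * u**2, 6 * u, 6.0 * switched_on])
    return vals

rng = np.random.default_rng(0)
knots = np.arange(5, 100, 5)          # knots every 5 years, as in the post
a = rng.normal(size=4)                # arbitrary cubic-polynomial coefficients
b = rng.normal(size=knots.size)       # arbitrary knot coefficients
k = knots[3]                          # the knot at t = 20
jump = spline_derivs(k, a, b, knots, right=True) - spline_derivs(k, a, b, knots, right=False)
print(jump)  # first three entries are exactly 0; the last is 6 * b[3] up to rounding
```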

While splines can generate nice smooth pictures, they have no magic statistical properties, and they have some special problems of their own. Before anyone performs statistical analysis on spline-smoothed data, William Briggs’ article “Do not smooth time series, you hockey puck!” should be required reading. His admonition,

Unless the data is measured with error, you never, ever, for no reason, under no threat, SMOOTH the series! And if for some bizarre reason you do smooth it, you absolutely on pain of death do NOT use the smoothed series as input for other analyses!

is as valid as ever.

A function s(t) that is a cubic spline in time t with knots at t = k1, k2, … is simply a linear combination of the functions 1, t, t^2, t^3, max(t-k1,0)^3, max(t-k2,0)^3, … Given data on the n+1 values y(0), y(1), …, y(n), the coefficients on these functions may be found by a least-squares regression. The smoothed values z(t) are just the predicted values from this regression, and these in turn are simply weighted averages of the y(t) observations.
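
The regression view translates directly into matrix form. The sketch below is a hypothetical illustration (the helper name `spline_hat_matrix` is ours), assuming integer years t = 0, ..., 100, the truncated-power basis just listed, and interior knots at 5, 10, ..., 95 (a knot at 100 would add a column that is identically zero on the sample). Row t of the hat matrix H = X(X'X)^-1 X' holds the weights that z(t) places on each y(t'). H is computed through a QR decomposition rather than by inverting X'X, because the truncated-power basis is badly conditioned; rescaling t to [0, 1] and the columns to unit length only multiplies each column by a constant, so H is unchanged.

```python
import numpy as np

def spline_hat_matrix(n=100, spacing=5):
    """Design matrix X and hat matrix H for a least-squares cubic regression
    spline on t = 0, 1, ..., n with interior knots every `spacing` years."""
    t = np.arange(n + 1) / n                      # rescaled time; H is unaffected
    knots = np.arange(spacing, n, spacing) / n
    X = np.column_stack([t**p for p in range(4)] +
                        [np.maximum(t - k, 0.0)**3 for k in knots])
    X = X / np.linalg.norm(X, axis=0)             # unit columns: same column space
    Q, _ = np.linalg.qr(X)                        # orthonormal basis for that space
    return X, Q @ Q.T                             # H = X (X'X)^-1 X' = Q Q'

X, H = spline_hat_matrix()
y = np.random.default_rng(1).normal(size=X.shape[0])   # any data series will do
beta, *_ = np.linalg.lstsq(X, y, rcond=None)           # the least-squares regression
z_regression = X @ beta                                # predicted values
z_weighted = H @ y                                     # same values as weighted sums of y
print(np.max(np.abs(z_regression - z_weighted)))       # tiny: rounding error only
print(np.allclose(H.sum(axis=1), 1.0))                 # True: each row of weights sums to 1
print(H.min() < 0)                                     # True: some weights are negative
```

Because the constant 1 is one of the basis functions, each row of weights sums to one; the "weighted averages" here therefore include negative weights.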

Figure 1 below shows the weight each z(t) places on each y(t’) when n = 100, so that the sample runs from 0 to 100, with knots every 5 years at k1 = 5, k2 = 10, etc.:
[Figure 1: Spline weights]


Figure 2 below shows the weights for z(50). Since t = 50 is centrally located between 0 and 100, these weights are precisely symmetrical. However, unlike a simple rectangular 5-year centered filter, the weights extend far in both directions, so that spline smoothing can induce serial correlation at leads and lags far in excess of 5 years.
[Figure 2: Spline weights, t = 50]
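
A minimal sketch of both points, assuming integer years 0 to 100, the truncated-power basis described above, and interior knots at 5, 10, ..., 95 (the helper `spline_hat_matrix` is hypothetical). It checks the symmetry of the z(50) weights, measures how far they reach, and converts them into the serial correlation that spline smoothing would impose on pure white noise: if y is white noise with variance σ², the smoothed series has covariance σ²HH' = σ²H, because the hat matrix H is symmetric and idempotent.

```python
import numpy as np

def spline_hat_matrix(n=100, spacing=5):
    """Hat matrix of a least-squares cubic regression spline on t = 0, ..., n
    (truncated-power basis, interior knots every `spacing` years)."""
    t = np.arange(n + 1) / n                              # rescaled time; H is unaffected
    knots = np.arange(spacing, n, spacing) / n
    X = np.column_stack([t**p for p in range(4)] +
                        [np.maximum(t - k, 0.0)**3 for k in knots])
    Q, _ = np.linalg.qr(X / np.linalg.norm(X, axis=0))    # unit columns for stability
    return Q @ Q.T

H = spline_hat_matrix()
w = H[50]                                    # weights z(50) places on y(0), ..., y(100)
lag = np.arange(w.size) - 50
print(np.max(np.abs(w - w[::-1])))           # ~0: symmetric about t = 50
print(np.max(np.abs(w[np.abs(lag) > 5])))    # clearly nonzero beyond +/- 5 years

# Correlation between smoothed values at t = 50 and t = 50 + j when y is white noise
for j in (5, 10, 15, 20):
    corr = H[50, 50 + j] / np.sqrt(H[50, 50] * H[50 + j, 50 + j])
    print(j, round(corr, 3))
```
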
Figure 3 below shows the frequency response function for the weights of Figure 2. The amplitude is near 0 for cycles with periods under 5.7 years and is 0.50 for 9.0-year cycles. However, unlike the frequency response functions we saw in the discussion of Rahmstorf’s smoother (comments #34, 37, 178 and 203), this one actually magnifies some frequencies above unity, reaching a peak of 1.22 at 11.2 years.
[Figure 3: Spline frequency response, t = 50]
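
The frequency response of any row of weights can be computed directly. This is a sketch under stated assumptions: it builds the hat matrix from the truncated-power basis described above with integer years 0 to 100 and interior knots at 5, ..., 95 (hypothetical helpers `spline_hat_matrix` and `frequency_response`), and evaluates G(f) = Σ_j w_j exp(2πifj), where w_j is the weight z(t) places on y(t + j) and f = 1/period. For the symmetric z(50) weights the imaginary part vanishes, so G is real. Since the knot placement is an assumption, the printed values may differ somewhat from those quoted above.

```python
import numpy as np

def spline_hat_matrix(n=100, spacing=5):
    """Hat matrix of a least-squares cubic regression spline on t = 0, ..., n."""
    t = np.arange(n + 1) / n
    knots = np.arange(spacing, n, spacing) / n
    X = np.column_stack([t**p for p in range(4)] +
                        [np.maximum(t - k, 0.0)**3 for k in knots])
    Q, _ = np.linalg.qr(X / np.linalg.norm(X, axis=0))
    return Q @ Q.T

def frequency_response(H, t, periods):
    """G(f) = sum_j w_j exp(2*pi*i*f*j) for the weights z(t) places on y(t + j),
    at f = 1/period: a cycle exp(2*pi*i*f*s) is passed through as G(f) times itself."""
    lags = np.arange(H.shape[1]) - t
    f = 1.0 / np.asarray(periods, dtype=float)
    return np.exp(2j * np.pi * np.outer(f, lags)) @ H[t]

H = spline_hat_matrix()
periods = np.linspace(2.0, 40.0, 3801)       # Nyquist (2 years) to 40 years, step 0.01
G = frequency_response(H, 50, periods)
print(np.max(np.abs(G.imag)))                # ~0: symmetric weights give a real response
amp = G.real
peak = np.argmax(amp)
print(f"peak amplitude {amp[peak]:.2f} at {periods[peak]:.1f} years")
for p in (5.7, 9.0, 11.2):
    print(p, round(frequency_response(H, 50, [p]).real[0], 2))
```
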
Figure 1 shows that the weights when t is between knot points are somewhat flatter than when t is right at a knot point. The frequency response between knots therefore cuts off at somewhat longer periods than at knot points, so there is no single, unambiguous frequency response, even far from the end points.
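
One rough way to quantify this, under the same assumptions as elsewhere in this post's examples (truncated-power basis, integer years 0 to 100, interior knots at 5, ..., 95; all helper names hypothetical), is to find, for each t, the shortest period above which the amplitude |G| never falls below one half, and to compare t = 50 (a knot) with nearby t between knots. The weights for t = 51, ..., 54 are not symmetric, so the response there is complex and its magnitude is used. The weight z(t) places on its own y(t) is printed alongside as a crude flatness measure.

```python
import numpy as np

def spline_hat_matrix(n=100, spacing=5):
    """Hat matrix of a least-squares cubic regression spline on t = 0, ..., n."""
    t = np.arange(n + 1) / n
    knots = np.arange(spacing, n, spacing) / n
    X = np.column_stack([t**p for p in range(4)] +
                        [np.maximum(t - k, 0.0)**3 for k in knots])
    Q, _ = np.linalg.qr(X / np.linalg.norm(X, axis=0))
    return Q @ Q.T

def frequency_response(H, t, periods):
    """Complex response of the weights z(t) places on y(t + j), at f = 1/period."""
    lags = np.arange(H.shape[1]) - t
    f = 1.0 / np.asarray(periods, dtype=float)
    return np.exp(2j * np.pi * np.outer(f, lags)) @ H[t]

periods = np.linspace(2.0, 40.0, 3801)       # 2 to 40 years in steps of 0.01

def half_amplitude_period(H, t):
    """Shortest grid period P such that |G| >= 0.5 at every grid period from P to 40 years."""
    mag = np.abs(frequency_response(H, t, periods))
    below = np.nonzero(mag < 0.5)[0]
    if below.size == 0:
        return periods[0]
    return periods[min(below[-1] + 1, periods.size - 1)]

H = spline_hat_matrix()
for t in range(50, 56):                      # knots at 50 and 55; 51 to 54 lie between them
    print(t, round(half_amplitude_period(H, t), 2), round(H[t, t], 3))
```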

Figure 1 also shows that the weights for z(t) behave very differently as t approaches the end points. Figure 4 below shows these weights for the very last point, t = 100. Clearly they are highly skewed.
[Figure 4: Spline weights, t = 100]
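
The size of the end-point weight can be bounded exactly. For any function f in the spline space, the diagonal of the hat matrix satisfies H[t, t] ≥ f(t)²/Σ_s f(s)². Taking f(t) = max(t - 95, 0)³, which belongs to the space when there is a knot at 95, gives H[100, 100] ≥ 125²/(1⁶ + 2⁶ + 3⁶ + 4⁶ + 5⁶) = 15625/20515 ≈ 0.76, so z(100) places at least about 76% weight on y(100) alone. The sketch below, assuming the truncated-power basis described above with integer years 0 to 100 and interior knots at 5, ..., 95 (hypothetical helper `spline_hat_matrix`), prints a few of the t = 100 weights and checks the bound.

```python
import numpy as np

def spline_hat_matrix(n=100, spacing=5):
    """Hat matrix of a least-squares cubic regression spline on t = 0, ..., n."""
    t = np.arange(n + 1) / n
    knots = np.arange(spacing, n, spacing) / n
    X = np.column_stack([t**p for p in range(4)] +
                        [np.maximum(t - k, 0.0)**3 for k in knots])
    Q, _ = np.linalg.qr(X / np.linalg.norm(X, axis=0))
    return Q @ Q.T

H = spline_hat_matrix()
w = H[100]                                   # weights z(100) places on y(0), ..., y(100)
for lag in (0, -1, -2, -5, -10, -20):
    print(lag, round(w[100 + lag], 3))
print(round(H[100, 100], 3), round(H[50, 50], 3))   # own-observation weight: end vs. center

# Lower bound from f(t) = max(t - 95, 0)**3, a member of the spline space:
# H[100, 100] >= f(100)**2 / sum_t f(t)**2 = 125**2 / (1**6 + 2**6 + ... + 5**6)
bound = 125**2 / sum(j**6 for j in range(1, 6))
print(round(bound, 3))                       # 0.762
```
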
Because the weights for t = 100 are skewed, the frequency response is a complex-valued function, shown in Figure 5 below. The overall frequency response is given by the magnitude of this function, shown in red. The magnitude is above 0.6 for all periods above the Nyquist period of 2 and amplifies by a factor of about 1.4 at periods of 6 to 8 years, a very strange frequency response indeed. Furthermore, wherever the imaginary part is nonzero, there is also a phase shift.
[Figure 5: Spline frequency response, t = 100]
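
The complex response at the last point can be computed the same way. This sketch assumes the truncated-power basis described above with integer years 0 to 100 and interior knots at 5, ..., 95 (hypothetical helpers `spline_hat_matrix` and `frequency_response`). The magnitude |G| is the overall gain, and the angle of G is the phase, converted here to a shift in years at each period. At the Nyquist period of 2, exp(2πifj) = (-1)^j is real, so the imaginary part vanishes there; at other periods it generally does not.

```python
import numpy as np

def spline_hat_matrix(n=100, spacing=5):
    """Hat matrix of a least-squares cubic regression spline on t = 0, ..., n."""
    t = np.arange(n + 1) / n
    knots = np.arange(spacing, n, spacing) / n
    X = np.column_stack([t**p for p in range(4)] +
                        [np.maximum(t - k, 0.0)**3 for k in knots])
    Q, _ = np.linalg.qr(X / np.linalg.norm(X, axis=0))
    return Q @ Q.T

def frequency_response(H, t, periods):
    """G(f) = sum_j w_j exp(2*pi*i*f*j) for the weights z(t) places on y(t + j)."""
    lags = np.arange(H.shape[1]) - t
    f = 1.0 / np.asarray(periods, dtype=float)
    return np.exp(2j * np.pi * np.outer(f, lags)) @ H[t]

H = spline_hat_matrix()
periods = np.linspace(2.0, 40.0, 3801)
G = frequency_response(H, 100, periods)
gain = np.abs(G)                                  # overall frequency response
shift = np.angle(G) * periods / (2 * np.pi)       # years; > 0 means z(100) runs ahead of the cycle
print(f"min gain {gain.min():.2f}, max gain {gain.max():.2f} at {periods[gain.argmax()]:.1f} years")
for p in (3.0, 5.0, 7.0, 10.0, 20.0):
    i = np.argmin(np.abs(periods - p))
    print(p, round(gain[i], 2), round(shift[i], 2))
```
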
Although a cubic spline produces values of z(t) for all values of t, those near the ends of the observed sample are particularly noisy and erratic unless some additional restriction or restrictions (like the zero second derivative at the end points that defines a “natural” spline) are imposed.
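
One way to impose a natural-spline restriction in this regression setting is sketched below, assuming the truncated-power basis described above with integer years 0 to 100, interior knots at 5, ..., 95, and (a further assumption) that the second derivative is forced to zero exactly at the sample end points t = 0 and t = 100. The coefficients are restricted to the null space of those two linear constraints. The restricted fit is a projection onto a subspace of the unrestricted one, so the weight each z(t) places on its own y(t) can only shrink. The helpers `basis`, `basis_d2` and `hat` are hypothetical.

```python
import numpy as np

n, spacing = 100, 5
t = np.arange(n + 1) / n                          # time rescaled to [0, 1]
knots = np.arange(spacing, n, spacing) / n

def basis(s):
    """Truncated-power basis 1, s, s^2, s^3, max(s - k, 0)^3 evaluated at s."""
    s = np.atleast_1d(s)
    return np.column_stack([s**p for p in range(4)] +
                           [np.maximum(s - k, 0.0)**3 for k in knots])

def basis_d2(s):
    """Second derivatives of the same basis functions at the scalar s."""
    return np.concatenate([[0.0, 0.0, 2.0, 6.0 * s], 6.0 * np.maximum(s - knots, 0.0)])

def hat(X):
    """Hat matrix of the column space of X (columns scaled to unit length first)."""
    Q, _ = np.linalg.qr(X / np.linalg.norm(X, axis=0))
    return Q @ Q.T

X = basis(t)
C = np.vstack([basis_d2(0.0), basis_d2(1.0)])     # s''(0) = 0 and s''(1) = 0, i.e. t = 0 and t = 100
_, _, Vt = np.linalg.svd(C)
N = Vt[C.shape[0]:].T                             # columns span {beta : C @ beta = 0}
H_full, H_natural = hat(X), hat(X @ N)

print(round(np.trace(H_full), 3), round(np.trace(H_natural), 3))  # 23 vs. 21 fitted parameters
print(round(H_full[100, 100], 3), round(H_natural[100, 100], 3))  # weight of y(100) in z(100)
```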


