Trend estimation
When a series of measurements of a process is treated as a time series, trend estimation is the application of statistical techniques to make and justify statements about trends in the data. Assuming the underlying process is a physical system that is incompletely understood, one may thereby construct a model, independent of anything known about the physics of the process, to explain the behaviour of the measurements. In particular, one may wish to know whether the measurements exhibit an increasing or decreasing trend that can be statistically distinguished from random behaviour. For example, take daily average temperatures at a given location, from winter to summer; or the global temperature series over the last 100 years.
Particularly in that latter case, issues of homogeneity (is the series equally reliable throughout its length?) are important. For the moment we shall simplify the discussion and neglect those points. This article does not attempt a full mathematical treatment, merely an exposition.
Fitting a trend: least-squares
Given a set of data, and the desire to produce some kind of "model" of that data (model, in this case, meaning a function fitted through the data) there are a variety of functions that can be chosen for the fit. But if there is no prior understanding of the data, the simplest function to fit is a straight line and thus this is the "default".
Once it has been decided to fit a straight line, there are various ways to do so, but the most usual choice is the least-squares fit, equivalent to minimisation of the L2 norm. See least squares.
Thus, given a set of data points x_i and data values y_i, one chooses a and b so that
\sum_i \left[ (a x_i + b) - y_i \right]^2
is minimised. This can always be done in closed form since this is a case of simple linear regression.
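As a minimal sketch of that closed-form fit, in Python with NumPy (the function name fit_trend and the sample data are illustrative, not from the original text):

```python
import numpy as np

def fit_trend(x, y):
    """Least-squares fit of y = a*x + b; returns (a, b).

    Closed-form simple linear regression: minimises
    sum_i [(a*x_i + b) - y_i]^2.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xm, ym = x.mean(), y.mean()
    a = np.sum((x - xm) * (y - ym)) / np.sum((x - xm) ** 2)
    b = ym - a * xm
    return a, b

# Example: a noiseless straight line is recovered exactly.
x = np.arange(10)
print(fit_trend(x, 3.0 * x + 1.0))  # approximately (3.0, 1.0)
```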
For the rest of this article, "trend" will mean the least-squares trend, since that is by far the most common usage elsewhere.
Now an estimated trend has been computed. But is it significant? And what is meant by significant?
Trends in random data
Before we can consider trends in real data, we need to understand trends in random data.
Figure: red shaded values are greater than 99% of the rest; blue, 95%; green, 90%. In this case, the V value discussed in the text for (one-sided) 95% confidence is seen to be 0.2.
If we take a series which is known to be random – fair dice rolls, or computer-generated random numbers – and fit a trend line through the data, the chances of an exactly zero estimated trend are negligible. But we would expect the trend to be "small". If we take a series with a given degree of noise and a given length (say, 100 points), and generate a large number of such series (say, 100,000 series), we can then calculate the trends from these 100,000 series and empirically establish a distribution of trends that are to be expected from such random data – see diagram. Such a distribution will be normal (by the central limit theorem, except in pathological cases), since – in a slightly non-obvious way of thinking about it – the trend is a linear combination of the y_i; and, if the series genuinely is random, it will be centered on zero. We may now choose a desired level of statistical certainty, S – 95% confidence is typical; 99% would be stricter, 90% rather looser – and ask: what value V must we choose so that S% of trends are within V? (Complication: we may be interested in both positive and negative trends – two-tailed – or may have prior knowledge that only positive, or only negative, trends are of interest.)
In the above discussion the distribution of trends was calculated empirically, from a large number of trials. In simple cases (normally distributed random noise being a classic) the distribution of trends can be calculated exactly.
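A sketch of the empirical procedure, assuming unit-variance Gaussian noise and the series length and trial count quoted above (the resulting V depends on the noise variance and series length, so it will not in general match the 0.2 of the figure):

```python
import numpy as np

rng = np.random.default_rng(0)
n_points, n_series = 100, 100_000
t = np.arange(n_points)

# Generate many independent trendless series of unit-variance noise
# and fit a trend to each (vectorised closed-form slope formula).
y = rng.standard_normal((n_series, n_points))
tc = t - t.mean()                      # centred time axis, sums to zero
slopes = (y @ tc) / np.sum(tc ** 2)    # one least-squares slope per series

# V such that 95% of trends from random data fall below V (one-sided).
V = np.quantile(slopes, 0.95)
print(V)
```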
Suppose we then take another series with approximately the same variance properties as our random series. We do not know in advance whether it "really" has a trend in it, so we calculate the trend, T, and discover that its magnitude is less than V. Then we may say that, at degree of certainty S, any trend in the data cannot be distinguished from random noise.
However, note that whatever value of S we choose, then a given fraction, 1 − S, of truly random series will be declared (falsely, by construction) to have a significant trend. Conversely, a certain fraction of series that "really" have a trend will not be declared to have a trend.
Data as trend plus noise
To analyse a (time) series of data, we assume that it may be represented as trend plus noise:
x_i = a t_i + b + e_i
where a and b are (usually unknown) constants and the e's are independent randomly distributed "errors". Unless something special is known about the e's, they will be assumed to have a normal distribution. It is simplest if the e's all have the same distribution, but if not (if some have higher variance, meaning that those data points are effectively less certain) then this can be taken into account during the least squares fitting, by weighting each point by the inverse of the variance of that point.
In most cases, where only a single time series exists to be analysed, the variance of the e's is estimated by fitting a trend, subtracting a t_i + b to leave the e's as residuals, and calculating the variance of those residuals – this is often the only way of estimating the variance of the e's.
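A sketch of this residual-based variance estimate, on an assumed synthetic series (Python/NumPy; the slope, intercept and noise level are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(100, dtype=float)
y = 0.05 * t + 2.0 + rng.standard_normal(t.size)  # assumed synthetic series

# Fit the trend, remove a*t + b, and estimate the noise variance from
# the residuals (dividing by n - 2 for the two fitted parameters).
a, b = np.polyfit(t, y, 1)
residuals = y - (a * t + b)
noise_var = residuals @ residuals / (t.size - 2)
print(a, b, noise_var)

# If per-point variances sigma_i^2 were known, the weighted fit described
# above would use weights 1/sigma_i (np.polyfit weights the unsquared
# residuals, so inverse standard deviations give inverse-variance weighting):
# a_w, b_w = np.polyfit(t, y, 1, w=1.0 / sigma)
```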
One particular special case of great interest, the (global) temperature time series, is known not to be homogeneous in time: apart from anything else, the number of weather observations has (generally) increased with time, and thus the error associated with estimating the global temperature from a limited set of observations has decreased with time. In fitting a trend to this data, this can be taken into account, as described above.
Once we know the "noise" of the series, we can then assess the significance of the trend by making the null hypothesis that the trend, a, is not significantly different from 0. From the above discussion of trends in random data with known variance, we know the distribution of trends to be expected from random (trendless) data. If the calculated trend, a, is larger than the value V, then the trend is deemed significantly different from zero at significance level S.
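In the i.i.d.-normal setting this amounts to the usual t-test on the fitted slope. A sketch using SciPy's linregress, on an assumed synthetic series:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
t = np.arange(100, dtype=float)
y = 0.02 * t + rng.standard_normal(t.size)  # assumed synthetic series

# Null hypothesis: slope a = 0. Under i.i.d. normal errors, the slope
# estimate divided by its standard error follows a t distribution;
# linregress reports the corresponding (two-sided) p-value.
result = stats.linregress(t, y)
print(result.slope, result.stderr, result.pvalue)
if result.pvalue < 0.05:
    print("trend significantly different from zero at the 95% level")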
Noisy time series, and an example
It is harder to see a trend in a noisy time series. For example, if the true series is 0, 1, 2, 3, …, 49 (a unit-slope line over a sample of length 50), all plus some independent normally distributed "noise" e of standard deviation E, then if E = 0.1 the trend will be obvious; if E = 100 the trend will probably be visible; but if E = 10000 the trend will be buried in the noise.
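A sketch of this effect, simulating the assumed unit-slope series of length 50 at the three noise levels above (Python/SciPy):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
t = np.arange(50, dtype=float)  # true series 0, 1, 2, 3, ...

# The same unit-slope trend under increasing noise: the p-value of the
# fitted slope degrades as E grows, until the trend is undetectable.
for E in (0.1, 100.0, 10000.0):
    y = t + E * rng.standard_normal(t.size)
    r = stats.linregress(t, y)
    print(f"E={E:>7}: slope={r.slope:9.3f}  p-value={r.pvalue:.3g}")
```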
If we consider a concrete example, the global surface temperature record of the past 140 years as presented by the IPCC, then the interannual variation is about 0.2°C and the trend about 0.6°C over 140 years, with 95% confidence limits of 0.2°C (by coincidence, about the same value as the interannual variation). Hence the trend is statistically distinguishable from zero.
Goodness of fit (R-squared) and trend
Figure: illustration of the variation of r2 with filtering whilst the fitted trend remains the same.
The least-squares fitting process produces a value – r-squared (r2), the coefficient of determination – which gives the fraction of the variance of the data explained by the fitted trend line. It does not relate to the statistical significance of the trend line – see graph. A noisy series can have a very low r2 value but a very high significance of fit. Often, filtering a series increases r2 whilst making little difference to the fitted trend or its significance.
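A sketch of this effect on an assumed synthetic series, using a simple moving-average filter: r2 rises after filtering while the fitted slope barely moves. (The naive standard errors reported for the filtered series should not be trusted, since filtering induces autocorrelation.)

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
t = np.arange(200, dtype=float)
y = 0.05 * t + rng.standard_normal(t.size)  # assumed synthetic series

# Raw series versus an 11-point moving average of the same series.
kernel = np.ones(11) / 11
y_smooth = np.convolve(y, kernel, mode="valid")
t_smooth = t[5:-5]  # centres of the 11-point windows

for label, tt, yy in (("raw", t, y), ("filtered", t_smooth, y_smooth)):
    r = stats.linregress(tt, yy)
    print(f"{label:>8}: slope={r.slope:.4f}  r^2={r.rvalue**2:.3f}")
```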
Real data need more complicated models
Thus far the data have been assumed to consist of the trend plus noise, with the noise at each data point being independent, identically distributed random variables with a normal distribution. Real data (for example climate data) may not fulfil these criteria. This is important, as it makes an enormous difference to the ease with which the statistics can be analysed so as to extract maximum information from the data series. The use of least-squares estimation of the trend is still valid, but might be improved upon. Statistical inferences (tests for the presence of trend, confidence intervals for the trend, etc.) are invalid unless departures from the standard assumptions are properly accounted for.
Dependence: autocorrelated time series might be modeled using autoregressive moving average models (see the sketch after this list).
Non-constant variance: in the simplest cases weighted least squares might be used.
Non-normal distribution for errors: in the simplest cases a generalised linear model might be applicable.
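As one illustration of the dependence case, a sketch using the GLSAR class from statsmodels, which fits the trend while iteratively estimating an AR(1) error structure; the synthetic series and all its parameters are assumptions:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 200
t = np.arange(n, dtype=float)

# Assumed synthetic series: linear trend plus AR(1) noise.
e = np.zeros(n)
for i in range(1, n):
    e[i] = 0.6 * e[i - 1] + rng.standard_normal()
y = 0.03 * t + e

X = sm.add_constant(t)

# Naive OLS understates the slope's standard error when the errors
# are autocorrelated...
ols = sm.OLS(y, X).fit()

# ...whereas GLSAR estimates an AR(1) error structure and adjusts
# the inference accordingly.
glsar = sm.GLSAR(y, X, rho=1).iterative_fit(maxiter=10)

print("OLS   slope, stderr:", ols.params[1], ols.bse[1])
print("GLSAR slope, stderr:", glsar.params[1], glsar.bse[1])
```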