(Revised August 26, 2006)
by
Robert Murray, Ph.D.
Omicron Research Institute
(Copyright © 2006 Omicron Research Institute. All rights reserved.)
According to the Random Walk model, stock price returns (the changes in price over a given time period, such as from one closing price to the next) are supposed to be independent, uncorrelated random variables. (More precisely, it is the logarithmic price returns that are usually considered. These are postulated to be Gaussian random variables.) Then the logarithmic prices, which are the sum of these independent price returns, follow a stochastic process called the Random Walk. The main consequence of the Random Walk hypothesis is that future returns are independent of (and hence uncorrelated with) the past prices (or any other financial data, such as fundamental data). So, theoretically no function of past data can be used to predict future price returns. This is a statement of the Efficient Market Hypothesis, of which the Random Walk model is a special case.
If the market were perfectly efficient, then there would be no point to short-term trading. On the average, the expected return from short-term trading would be zero, relative to a buy-and-hold strategy. If the Random Walk process is one with drift, corresponding to the secular upward trend of the stock market, then the buy-and-hold strategy would give an overall average return over a long holding period (equal to the secular trend). This is presumably a reward for the risk inherent in stock investing, which is measured by the variance or standard deviation of the Random Walk over time. But short-term trading would only increase the risk, with no corresponding increase in expected returns over time. Thus it would be just like gambling, except that the expected return (over buy-and-hold) would be zero (rather than a loss, as with most gambling).
However, hardly anybody believes that the market is truly efficient. There are many people interested in short-term trading, and many others who prudently buy and sell securities over longer-term holding periods, as the situation changes and different securities look more promising (based on past information). A simple, rough argument indicates that the market can never be truly efficient. If the market were perfectly efficient, then there would be no reward for short-term trading (or longer-term trading either), so people would stop trading. But it is precisely the trading activities, on all time scales, that keep the market efficient. Hence if the trading stopped, inefficiencies would immediately be created, which would induce people to start trading again because they would then be able to make a profit. So the conclusion is that people trade to the extent that they can still make a profit, so the efficiency of the market is dictated by the ability of the best traders to still be able to make a profit (at the expense of the less knowledgeable traders). So we expect inefficiencies to exist at a level that the most sophisticated traders are just able to find and take advantage of (in a best-case scenario). At the present time, the market is almost efficient, but it can never be perfectly efficient. Profitable trading opportunities will always exist for the most sophisticated traders.
In order to find trading rules that work, we must find certain functions of the past price data (and/or perhaps other financial data such as fundamental data) that have a non-zero correlation with future returns. (See the Appendix for the definition of correlation.) As we have stated, the Random Walk model states that this correlation should be zero. We can construct various functions and measure their correlation with future returns, or more precisely, we can measure the sample correlation. The sample correlation is an estimate of the actual correlation, based on a finite sample of data. The true correlation can only be determined in a hypothetical stochastic system in which there is an infinite amount of data available, and the stochastic process is second-order stationary, meaning that the correlation is constant for the whole data set. And here is a major problem regarding financial data: There is almost never a very large data set to work with, and within this data set it is almost certain that the stochastic process is non-stationary. So the measured correlation within one block of data will (probably) be different from that within other blocks of data in the same data set. Furthermore, within a finite data set, the sample correlation is itself subject to a statistical uncertainty. A totally random data set can yield a measured value of the sample correlation that is non-zero, just because of random statistical fluctuations. The standard error for these fluctuations, for the usual Linear or Pearson’s R correlation, is given by $1/\sqrt{N}$, where N is the number of data points in the set. (The standard error is slightly smaller for the robust correlation methods.) So, for a set of returns 100 days long, the standard error of the sample correlation for these returns is 10%, which would be a very sizable correlation if it existed.
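This $1/\sqrt{N}$ noise floor is easy to see numerically. The following sketch (using NumPy; the series length and trial count are arbitrary choices for illustration) correlates pairs of independent white-noise series, so that any measured correlation is pure statistical fluctuation, and checks that the spread of the sample correlation comes out near $1/\sqrt{N}$:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100           # length of each return series
trials = 20_000   # number of independent experiments

# Correlate two independent white-noise series and record the sample
# correlation; with no true correlation present, its standard error
# should be close to 1/sqrt(N).
corrs = np.empty(trials)
for i in range(trials):
    x = rng.standard_normal(N)
    y = rng.standard_normal(N)
    corrs[i] = np.corrcoef(x, y)[0, 1]

print(corrs.std())   # close to 1/sqrt(100) = 0.10
```

For N = 100 the measured spread lands near 0.10, matching the 10% figure quoted above.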
For a data set 1024 days long, which is the usual length of the data set that we work with, the standard error is 3.125%, which would be a small but non-negligible correlation if real. Furthermore, there are indications that long-range correlations only extend to a maximum of 1024 data days, or four years [Peters (1991, 1994)]. So, the conclusion is that any correlations that exist in the data are likely to be “down in the statistical noise” and of the same order of magnitude as the statistical uncertainty of the sample correlations. Nevertheless, these small correlations, if real, can lead to very sizable returns from short-term trading.
As an example, suppose we find a technical indicator that has a 5% correlation with the 1-day future returns. Suppose the daily volatility is 2% (r.m.s. value of daily returns). Then, setting the daily trading position (trading rules) proportional to the technical indicator, the expected daily gain is the product of the correlation times the volatility, or 0.1%. Assuming 256 trading days per year, this leads to a simple annual gain from short-term trading of 25.6% and a compounded annual gain of 29.2% (over buy-and-hold), which most people would regard as excellent! However, by most standards the 5% correlation, given a standard error of 3.125%, would not even be regarded as statistically significant. The conclusion is that if we want to find trading rules that work, we have to search for correlations that are barely above the statistical “noise” level, and as a result we must also accept that the standard deviation of the gains (from short-term trading) will inevitably be of the same order of magnitude as the gains themselves. Nevertheless, if the short-term trading is done within the setting of an overall portfolio strategy, the standard deviation for short-term trading for the whole portfolio can be reduced while the returns remain the same. In this case the standard deviation of the returns will be reduced roughly by a factor $1/\sqrt{N}$, where now N is the number of securities in the portfolio. Of course, to get this $1/\sqrt{N}$ reduction in the standard deviation, it is necessary to do N times as much work!
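The arithmetic in this example can be checked directly; the numbers below are simply the figures quoted above, not the output of any trading system:

```python
# Back-of-the-envelope check of the gain figures quoted above.
corr = 0.05   # correlation of indicator with 1-day future returns
vol = 0.02    # daily volatility (r.m.s. daily return)
days = 256    # trading days per year

daily_gain = corr * vol                     # expected daily gain
simple = daily_gain * days                  # simple annual gain
compounded = (1 + daily_gain) ** days - 1   # compounded annual gain

print(f"{daily_gain:.2%}")   # 0.10%
print(f"{simple:.1%}")       # 25.6%
print(f"{compounded:.1%}")   # 29.2%
```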
Regarding the statistical significance of the correlation, the usual interpretation is that a correlation greater than two standard errors (from zero) is regarded as significant. A correlation this large, at least 6.25% in the example above, is achieved only 4.6% of the time by pure chance alone. (This corresponds to a 4.6% significance level.) So, we say that this correlation is significant at the 95.4% confidence level, because there is a 95.4% chance that this correlation is not due to chance alone. (We are calling the confidence level that quantity which is 100% minus the significance level.) Theoretically, when estimating the “true” correlation by means of the sample correlation, the measured sample correlation will itself be a random variable with a Gaussian distribution of values. The standard error of this distribution is $1/\sqrt{N}$ as stated above, for a sample size N. Thus, if there is no actual correlation at all, then the measured values of the correlation will be distributed around zero, with a standard error $1/\sqrt{N}$. These values will lie within one standard error of zero 68.3% of the time, within two standard errors of zero 95.4% of the time, and within three standard errors of zero 99.7% of the time [Natenberg (1994)]. So, if the measured correlation is not at least two standard errors away from zero, it is usually regarded as not statistically significant. However, this does not mean that if the measured correlation is within two standard errors of zero, then it is necessarily not a real correlation. All it means is that the measured correlation is consistent with zero correlation (to the 4.6% significance level). Most of the correlations we measure, at the “peaks” in the Correlation Test display in QuanTek, are actually more than two standard errors away from zero, so they can be regarded as significant.
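The one-, two-, and three-standard-error coverage probabilities quoted above are simply properties of the Gaussian distribution, and can be reproduced with the error function from the Python standard library:

```python
import math

# Probability that a Gaussian variable lies within k standard errors
# of its mean: Phi(k) - Phi(-k) = erf(k / sqrt(2)).
for k in (1, 2, 3):
    p = math.erf(k / math.sqrt(2))
    print(f"{k} standard errors: {100 * p:.1f}%")   # 68.3%, 95.4%, 99.7%
```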
However, we prefer the following interpretation, which seems more reasonable: The measured correlation represents the mean or expected value of the actual correlation, and this value is uncertain by an amount given by the standard error, $1/\sqrt{N}$. In this way we are not forced to ignore measured correlations that are within two standard errors of zero, and then “define” them to be zero. We regard the measured correlations to be the most likely value of the actual correlations, subject to a rather wide uncertainty given by $1/\sqrt{N}$. If Edgar Peters (1991, 1994) is correct and the correlations do not persist longer than 1024 days or so, then we cannot reduce this statistical uncertainty any lower than about 3% by taking a larger data set, so there is never any way to conclusively separate the correlations we are seeking from the stochastic uncertainty of the sample correlation measurement. Nevertheless, these correlations, provided they are really there (which they seem to be), can still be used to construct profitable (over the long term) short-term trading rules.
The ultimate point is that there is no “Law of Large Numbers”, or mathematical limit as $N \to \infty$, that we can take in order to prove conclusively the existence of correlation, or measure the sample correlation to arbitrarily high confidence levels. This limit might be approximated by finding some trading rule, and testing it on a whole portfolio of stocks over a long period of time, say 2048 days. In this way, we may finally be able to find an unambiguous signal for a highly statistically significant correlation, and the portfolio ensemble then plays the role of the very large statistical ensemble. But such a calculation might take hours or days to perform, and I have not yet attempted such long calculations. In the meantime, it is still necessary to apply a certain amount of intuition in deciding which are meaningful correlations and which are just stochastic noise in the measured correlations (when constructing technical indicators using QuanTek). So it is still a creative process, requiring some thinking and judgment, to choose an effective set of trading rules using the statistical tools in the QuanTek program. Short-term trading cannot and should not be reduced to a mere mechanical operation, at least in my view. (Having said this, I should add that the QuanTek program yields some rather clear signals for correlations between certain technical indicators and future returns, which certainly do not look like stochastic noise. But there is no statistical test that can prove conclusively that they are real correlations. Without an infinite data set, or at least a very large one, it is impossible to prove anything conclusively from Statistics.)
The usual definition of a technical indicator is some function of the past price data, which “signals” a buy or sell point. As a prototype, one of the most commonly used technical indicators is a combination of two (exponential, say) moving averages, one with a longer time scale than the other. When the shorter MA crosses the longer MA moving upward, this is a buy signal, and when the shorter MA crosses the longer MA moving downward, this is a sell signal. The expectation is that as long as the shorter MA is above the longer MA, the prices will be in an up-trend, and as long as the shorter MA is below the longer MA, the prices will be in a down-trend [Pring (1991)]. (Evidently there is an assumption here that the prices will be in one of two modes, either bull or bear market, and that these modes will last much longer than the time scale of the moving averages themselves.) Equivalently, we can form an oscillator from the two MA’s, by subtracting the longer one from the shorter one (assuming logarithmic price data). This is a logarithmic version of an oscillator called the Moving Average Convergence-Divergence (MACD), in which the ratio of two exponential MA’s of the price data is taken [Pring (1991)]. Then the buy/sell points are marked by the points at which this MACD crosses the zero line, moving up or down respectively.
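As a concrete sketch of this prototype indicator, the following code (NumPy only; the 12- and 26-day spans and the synthetic random-walk prices are illustrative assumptions, not QuanTek's actual parameters) builds the logarithmic MACD oscillator from two exponential MAs of the log prices and marks its zero crossings as buy/sell points:

```python
import numpy as np

def ema(x, alpha):
    """Exponential moving average with smoothing constant alpha."""
    out = np.empty(len(x), dtype=float)
    out[0] = x[0]
    for i in range(1, len(x)):
        out[i] = alpha * x[i] + (1 - alpha) * out[i - 1]
    return out

# Synthetic log-price series (random walk with drift), purely for illustration.
rng = np.random.default_rng(1)
log_price = np.cumsum(0.0005 + 0.02 * rng.standard_normal(512))

fast = ema(log_price, alpha=2 / (12 + 1))   # shorter time scale
slow = ema(log_price, alpha=2 / (26 + 1))   # longer time scale

# Logarithmic MACD oscillator: difference (not ratio) of the two EMAs,
# since the input is already logarithmic.
macd = fast - slow

# Buy where the oscillator crosses zero moving up, sell where it
# crosses zero moving down.
sign = np.sign(macd)
buys = np.where((sign[1:] > 0) & (sign[:-1] <= 0))[0] + 1
sells = np.where((sign[1:] < 0) & (sign[:-1] >= 0))[0] + 1
print(len(buys), len(sells))
```

On a pure random walk such as this synthetic series, the crossings of course carry no predictive content; the sketch only shows the mechanics of the indicator.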
I would like to remark here that, in my opinion, most of the traditional rules of Technical Analysis are probably obsolete. They probably worked well in decades past, when there were far fewer players in the market and the rate of information exchange was much slower and the amount of information available much less. The markets are undoubtedly much more efficient now than they were when these traditional rules were first formulated [Edwards & Magee (1992)]. In particular, the ability to signal a long-term trend change by the crossing of two MA’s of much shorter time scale seems “too good to be true”, as do the other methods of signaling a trend change by means of technical patterns of short time duration. Probably in today’s market the predictive power of any technical indicator formed from price data over a certain time scale is only of the order of that time scale.
I would now like to make a slight generalization of the concept of technical indicator, and regard a technical indicator as any function of the past prices (and possibly other data), which is supposed to be correlated with future returns. So, for example, the implication is that the oscillator formed from two moving averages will be above zero when the (intermediate or long-term) future returns are positive, and below zero when they are negative. In other words, there is expected to be a positive correlation between this oscillator and the future returns over some time interval N. It is possible to form a whole variety of technical indicators of this sort, and measure their correlation with N-day future returns to determine their effectiveness. Then, either a linear trading rule can be used in which the position in the security is adjusted to linearly follow the value of the indicator, or a non-linear trading rule can be used in which the position is long by a fixed amount when the indicator is positive and short by a fixed amount when the indicator is negative. (This latter trading rule, of course, requires far fewer trades.) Likewise, the indicator itself can be a linear function of the past returns, such as MA’s or sums and differences of MA’s, or it can be a non-linear function of past data, such as polynomials or the hyperbolic tangent function or the error function. By using non-linear functions of the data and measuring their linear correlation with future returns, we are actually capturing some of the higher-order statistics of the data, which is probably important for financial data. However, for the time being we will confine the discussion to various linear combinations of various types of smoothings of the past data. However, our method can be extended to non-linear functions of the past data simply by defining and using such functions instead of linear ones. 
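In this generalized sense, testing any such indicator reduces to a single computation: its sample correlation with the N-day future returns. A minimal sketch follows (the indicator here is deliberately pure noise against a synthetic random walk, so the measured correlation should land within a few standard errors of zero; real indicators and data would be substituted in practice):

```python
import numpy as np

def future_return(log_price, horizon):
    """N-day future log return at each date (the array is shortened
    by `horizon` days at the end, where no future data exist)."""
    return log_price[horizon:] - log_price[:-horizon]

def indicator_correlation(indicator, log_price, horizon):
    """Sample correlation of an indicator with the N-day future return."""
    fut = future_return(log_price, horizon)
    ind = np.asarray(indicator)[:len(fut)]
    return np.corrcoef(ind, fut)[0, 1]

# Illustration on synthetic data: white-noise returns and a white-noise
# indicator, so the measured correlation is pure sampling fluctuation.
rng = np.random.default_rng(2)
log_price = np.cumsum(0.02 * rng.standard_normal(1024))
oscillator = rng.standard_normal(1024)   # a hypothetical indicator
c = indicator_correlation(oscillator, log_price, horizon=5)
print(c)
```

Note that overlapping N-day returns are serially correlated, which inflates the standard error of this sample correlation somewhat above the $1/\sqrt{N}$ figure for non-overlapping data.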
Evidently, some of the traditional technical indicators themselves may be regarded as very complicated non-linear functions of the past price data. Examples of this would be support/resistance levels, head and shoulders tops and bottoms, triangles, rectangles, flags, and so forth, and even trend lines for bull and bear trends. However, once again I question the validity of some of these patterns in today’s market.
There appear to be two basic categories of technical indicators, corresponding to two basic categories of correlation. The most basic correlation is what is known as return to the mean. This implies that there is some mean or “correct” price, which the security returns to if the security becomes mis-priced. So, if the price is below some average level, it can be expected to move higher, and if it is above the average level, it can be expected to move lower. So the technical indicator consists of the current price relative to some longer-term average or smoothed price. The future returns are then expected to be anti-correlated with this indicator some number of days in the past (or correlated with the negative of the indicator). Since the security becomes mis-priced in the first place after some up- or down- move, the presence of a return to the mean mechanism also shows up in the anti-correlation of past returns with future returns. There is a rather pronounced anti-correlation in daily returns (for some securities) up to about three days in the past, with the future one-day returns, and this can be explained by the return to the mean mechanism acting over these very short time intervals. It also appears to act over much longer time intervals as well. It should be noted that this mechanism is nothing other than the famous “Buy Low – Sell High” strategy.
A second correlation is known as trend persistence. This correlation corresponds to the tendency of the market to remain in either a bull or bear market. In other words, if returns are positive or negative in the past, they are the same in the future, so that there is a positive correlation between past and future returns. This mechanism would seem to be at variance with the return to the mean mechanism, which implies negative correlation. However, these two mechanisms can be reconciled by supposing that the “mean” is some smooth, slowly varying function of past prices and economic data. The trend, corresponding to a bull or bear market, is persistent and is related to the (usually) slowly varying rate of change of this price mean. (Or it can be thought of as the mean value of the returns.) Then, the shorter-term fluctuations about this mean price level are anti-persistent, and correspond to the return-to-the-mean mechanism. So, given any time scale, we may smooth the price data on this time scale, and then suppose that the smoothed long-term trend is persistent, and the short-term fluctuations about this trend are anti-persistent. Evidently in an efficient market, these two mechanisms “cancel out” on all time scales, leading to zero correlation and neither persistence nor anti-persistence of returns. But when inefficiencies exist, they do not cancel out, and correlation may exist on certain time scales. Evidently the true situation is much more complicated than this, and what has just been said should be regarded as merely an oversimplified “sketch” of the true picture. To our knowledge, nobody has yet formulated a complete theory of stock price correlations, although steps in this direction are outlined in The Econometrics of Financial Markets [CLM (1997)].
A possible third correlation does not really have a name, but we will call it the presence of turning points or trend reversal mechanism. According to this idea, if we can identify the turning points or changes of trend of the price data, then this will be correlated with a future positive or negative trend. In other words, if we can identify a point where the trend seems to change from negative to positive, then this should be correlated with a future positive trend, and a point where the trend seems to change from positive to negative should be correlated with a future negative trend. Examples of these change-of-trend indicators in traditional Technical Analysis are identification of top and bottom formations such as head-and-shoulders. However, we may also construct an oscillator-type indicator by taking the rate-of-change of the returns, which is itself a rate-of-change of the log prices. In other words, the returns are the first derivative (velocity) of the log prices, while the rate-of-change of returns is the second derivative (acceleration) of the log prices. The hypothesis is then that this turning point indicator is correlated with future returns, at some point in the future. However, this indicator may be less reliable than the first two, because it tends to emphasize the higher frequency modes, while most of the correlation seems to exist in the low frequency modes.
The Random Walk model may be thought of as a model in which each price movement is an independent random shock. Such a random shock is presumably the result of some business development or news input regarding the security, at least for the larger shocks. However, probably a more realistic model of stock price behavior is that it is due to random shocks occurring at infrequent intervals, and in between the shocks the price action is due to investor reaction to these shocks. This investor reaction is not instantaneous, in the real world, so the market is not perfectly efficient. The investors react to the shocks and the present state of the market with some finite time delay, which is of the order of the investment horizon of that investor. Also, many investors do not know how to properly interpret the present condition of the market, so they over-react and cause prices to swing above or below their “fair value”. This combination of inefficiencies should cause some sort of dynamical behavior of asset prices in response to the shocks due to external influences, such as the state of the company itself or of the overall economy, or political events. So, we have a set of shocks, with large shocks occurring at infrequent intervals, and smaller shocks occurring more frequently, according to some power spectrum, say, and a dynamical reaction to the shocks, which is delayed in time according to the spectrum of time horizons of all the investors. So the result is a spectrum of unpredictable random shocks, and of predictable dynamical responses to those shocks. It is these dynamical responses that Technical Analysis hopes to capitalize on by means of various indicators. But the point is that, due to the finite response time of investors, the deterministic part of the price patterns is, to some extent, smooth and slowly varying. At least, that is our hypothesis, assuming the use of end-of-day data.
There is additional correlation in intra-day tick data for time scales shorter than, say, 20 minutes, but we are only making use of end-of-day data here. (For a partial theory of correlation in price tick data, see The Econometrics of Financial Markets [CLM (1997)].)
Hence we may postulate a model for stock price action. It consists of a deterministic part, which can be predicted (in principle, if not in practice), which is smooth and slowly varying, and hence consists of the lower-frequency Fourier components of the returns process. To this is added a random part, which may be modeled as stochastic white noise, with a constant spectrum. Thus most of the high-frequency variation of prices is random, stochastic noise with very little predictive power. (However, an exception to this is the apparent anti-correlation of returns over time intervals of a few days.) In order to uncover the predictable, deterministic part, it is necessary to employ smoothing to filter out the high-frequency components. Otherwise, the small correlations in the low-frequency deterministic part are completely drowned out in the high-frequency noise and cannot be seen. This is probably why it has been found so many times that the stock price data are statistically a Random Walk, and no clear deviations from the Random Walk can be seen by the classical statistical tests. After smoothing the data, however, we do find some clear indications of usable correlations, although it should be emphasized that these are hardly ever very far above the level of the stochastic noise.
The main type of smoothing used in the QuanTek program is called the Savitzky-Golay smoothing filter. This is a state-of-the-art digital smoothing filter, which has the property that it preserves the first and second moments of the price data. (In other words, if there are peaks in the data, the smoothing preserves the positions of the peaks and also their widths.) This filter uses Fourier methods to compute the smoothing, and it turns out that it can also be used to make an extrapolation into the future based on the past data. Essentially, the filter decomposes the data set into its Fourier components, filters out the high frequency components, and then the extrapolation consists of extending the low-frequency components that are left forward in time, preserving their phase relationships. It appears that this extrapolation itself has predictive power, indicating that these low-frequency components persist in time, at least out to one wavelength or so for each component. The Savitzky-Golay smoothing filter itself comes in two variants, the acausal and causal filters. The acausal filter smooths over a time window consisting of a number of days in the past and future around the given day, equal to the smoothing time period. This acausal filter has the advantage that it preserves the phase relationships of the various Fourier components (zero-phase filter). The causal smoothing filter smooths over a time window equal to two smoothing periods in the past. Hence there is an inherent time delay of (approximately) one smoothing period with the causal filter. This causal filter will not preserve phase relationships, which is a disadvantage. A third type of smoothing is the ordinary exponential Moving Average (MA). This type of smoothing filter is also causal, in that it does not make use of any data in the future relative to the given day. 
As is well known, the exponential MA also introduces a time delay of the order of one smoothing period (for a time scale of smoothing of two time periods). The exponential MA could itself be used to make a future extrapolation, because technically it is a digital filter just like the Savitzky-Golay smoothing filter. In fact, the exponential MA is actually the optimal Linear Prediction filter for the MA(1) process (Moving Average process with an autocorrelation sequence of 1 time unit in length) [Harvey (1989), p.22].
The Savitzky-Golay smoothing filter also has the capability of computing the smoothed first and second (and higher) derivatives of the price data. Thus using this filter we may directly compute the smoothed velocity and acceleration indicators mentioned above. Using the acausal filter, these smoothed velocities and accelerations should exhibit no time delay, and be in phase with the price data. Using the causal filter, on the other hand, will introduce a time delay and the various Fourier components will be out of phase to some extent, which depends on the frequency and phase response of the filter. This is why I prefer the acausal filtering to identify the buy/sell points. The time delay of the causal filter depends on the order of the filter which is chosen to be four, with the zero order causal SG filter being equivalent to an ordinary (not exponential) MA. The higher order causal SG filters still are causal and use only the past data, but they do not exhibit as much time delay, because it is compensated for in the way the data are smoothed. In other words, within a block of data 2N+1 units long, the usual MA and zero order SG filter fit the data to a constant average value within this block, but higher order SG filters fit the data to a straight line, parabola, and so forth. (The fourth order SG filter fits the data to a fourth order polynomial.) This has the effect of essentially eliminating the time delay, although this time delay is eliminated through a polynomial fit to the data in each block. However, the phases of the various Fourier components are still not preserved using the causal filter, unlike the case of acausal filtering.
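A freely available implementation of the acausal (centered-window) Savitzky-Golay filter is `scipy.signal.savgol_filter`, whose `deriv` argument returns the smoothed derivatives directly. The window length, polynomial order, and synthetic data below are illustrative choices on my part, not QuanTek's internals:

```python
import numpy as np
from scipy.signal import savgol_filter

# Synthetic log-price series (random walk with drift), for illustration.
rng = np.random.default_rng(3)
log_price = np.cumsum(0.0003 + 0.02 * rng.standard_normal(512))

window, order = 21, 4   # centered (acausal) window, 4th-order polynomial fit

smooth = savgol_filter(log_price, window, order)             # smoothed log price
velocity = savgol_filter(log_price, window, order, deriv=1)  # 1st derivative
accel = savgol_filter(log_price, window, order, deriv=2)     # 2nd derivative

# The smoothed series tracks the original with the high-frequency noise
# suppressed; velocity > 0 marks up-trends, and sign changes of accel
# mark candidate turning points, in the sense described above.
print(np.std(log_price - smooth))
```

Because the window is centered, these derivative estimates exhibit no time delay within the interior of the data set, matching the zero-phase property discussed above.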
The Price Projection in QuanTek is computed using a standard Linear Prediction filter. This LP filter assumes stationarity of the stochastic process, even though in reality the financial time series is no doubt a non-stationary stochastic process. However, evidently stationarity is a reasonable assumption over a period of, say, 1024 days (about 4 years). The design of a filter for non-stationary time series is a much more complex and difficult problem, but it is one worth working on.
Given a stationary stochastic process, the Linear Prediction filter is derived making use of the second-order correlation in the (logarithmic returns) data. This correlation is assumed constant in time, due to stationarity. On a theoretical level, the filter parameters are given by the Yule-Walker equations. To derive these equations, we need to pretend as if there is an infinite statistical ensemble of realizations of a given stochastic process, or in other words, the stochastic process specifies the statistical properties of the time series, and we can generate an unlimited number of actual time series with these statistical properties, starting with a different set of random numbers as input for each time series. Then we can define, theoretically, an expectation value over this statistical ensemble of realizations of the stochastic process, as an average of some given quantity over the whole ensemble. In practice, these expectation values may be approximated by various sums over the available data set, such as for example the sample mean and sample covariance.
Now, using the notation of Haykin (2002), suppose we are given a financial time series of length N. The N-by-1 observation vector is defined by [Haykin (2002)]:
$$\mathbf{u}(n) = \left[\, u(n),\ u(n-1),\ \ldots,\ u(n-N+1) \,\right]^{T}$$
The range of the indexes is left unspecified, but in our work we normally take the index to increase moving forward in time, with the index of the present time (last day of the past data set) denoted as 0, and the past indexes as negative integers and the future indexes as positive integers.
The covariance matrix is then defined by the following expectation value with respect to a statistical ensemble of time series:
$$\mathbf{R} = E\left[\, \mathbf{u}(n)\, \mathbf{u}^{T}(n) \,\right]$$
Correspondingly the autocovariance function or autocovariance sequence is defined as the expectation:
$$r(k) = E\left[\, u(n)\, u(n-k) \,\right]$$
In terms of the autocovariance sequence r(k), the covariance matrix is given by:
$$\mathbf{R} = \begin{bmatrix} r(0) & r(1) & \cdots & r(N-1) \\ r(1) & r(0) & \cdots & r(N-2) \\ \vdots & \vdots & \ddots & \vdots \\ r(N-1) & r(N-2) & \cdots & r(0) \end{bmatrix}$$
Note that due to stationarity, the coefficients r(k) of the autocovariance sequence or matrix depend only on the time lag between the two elements u(n) and u(n-k) of the time series, not on the index n of the series. This is fortunate, for otherwise we would not be able to approximate these quantities by the corresponding sample expectation values.
The sample autocovariance sequence is then given by the following sum over the time series values, separated by the time lag k:

$$\hat{r}(k) = \frac{1}{N} \sum_{n=0}^{N-1} u(n)\, u(n-k)$$
Of course, for a data set of length N, this sum runs outside the bounds of the data set. So the above definition should be interpreted as applying to an infinitely long time series, from which we extract a sum over N terms. For a data set of finite length, this definition can be modified appropriately.
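One conventional modification for a finite data set is to sum only the N − k available lagged products but still divide by N (the "biased" estimate), which has the convenient property that the Toeplitz covariance matrix built from it is positive semi-definite. A sketch of this estimator (NumPy; the synthetic white-noise returns are only for illustration):

```python
import numpy as np

def sample_autocovariance(u, max_lag):
    """Biased sample autocovariance r_hat(k), k = 0..max_lag:
    sum of the N-k available products u(n)u(n-k), divided by N."""
    u = np.asarray(u, dtype=float)
    u = u - u.mean()           # work with deviations from the sample mean
    N = len(u)
    return np.array([np.dot(u[k:], u[:N - k]) / N
                     for k in range(max_lag + 1)])

# Illustration: white-noise returns with 2% daily volatility.
rng = np.random.default_rng(4)
returns = 0.02 * rng.standard_normal(1024)
r_hat = sample_autocovariance(returns, max_lag=5)
print(r_hat[0])   # close to the variance, 0.02**2 = 4e-4
```

For white noise, r_hat(0) estimates the variance and the lagged values scatter around zero at the 1/sqrt(N) noise level discussed earlier.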
Now we suppose that the future value of the time series can be (partially) predicted as a linear sum over past values of the time series, plus a random white noise term. (The Random Walk model corresponds to the white noise term alone.) A model in which the future value of a variable in a time series is determined as a linear function of the past values, plus additive white noise, is called an autoregressive process (AR) of order M. This process satisfies the following difference equation [Haykin (2002)]:
$$u(n) = \sum_{k=1}^{M} w_k\, u(n-k) + v(n)$$
The additive white noise $v(n)$ is assumed to be a Gaussian random variable of zero mean and constant variance, uncorrelated at different times:
$$E\!\left[\,v(n)\,v(m)\,\right] = \sigma_v^2\,\delta_{nm}$$
Here, $\sigma_v^2$ is the noise variance.
Next we multiply both sides of the autoregressive difference equation by $u(n-k)$, for $k \geq 1$, and take the expectation value. Making use of the fact that the white noise term $v(n)$ is uncorrelated with all past values of the time series, we arrive at the following result:
$$E\!\left[\,u(n)\,u(n-k)\,\right] = \sum_{m=1}^{M} w_m\, E\!\left[\,u(n-m)\,u(n-k)\,\right], \qquad k = 1, \ldots, M$$
Using the definition of the autocovariance sequence, we may then rewrite this in terms of the autocovariance sequence itself:
$$r(k) = \sum_{m=1}^{M} w_m\, r(k-m), \qquad k = 1, \ldots, M$$
This may then be written in explicit matrix form as follows, noting that due to stationarity we have $r(-k) = r(k)$:
$$\begin{bmatrix} r(0) & r(1) & \cdots & r(M-1) \\ r(1) & r(0) & \cdots & r(M-2) \\ \vdots & \vdots & \ddots & \vdots \\ r(M-1) & r(M-2) & \cdots & r(0) \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_M \end{bmatrix} = \begin{bmatrix} r(1) \\ r(2) \\ \vdots \\ r(M) \end{bmatrix}$$
This set of equations is called the Yule-Walker equations. They may be written compactly in matrix form as $\mathbf{R}\,\mathbf{w} = \mathbf{r}$. Then, assuming the covariance matrix $\mathbf{R}$ is nonsingular, it may be inverted and we may solve for the Linear Prediction (LP) coefficients as follows:
$$\mathbf{w} = \mathbf{R}^{-1}\,\mathbf{r}$$
Hence the coefficients of the stationary autoregressive process may be obtained from the covariance matrix, provided that it is not singular, by a simple matrix inversion. These are the basic equations for the Linear Prediction filter. The only remaining issue is how the covariance matrix is to be estimated. It turns out that there are a variety of ways of doing this.
It should be noted from the definition of the covariance matrix for a stationary process that the matrix is symmetric and all the elements on each (major) diagonal are equal. This type of matrix is called a Toeplitz matrix. The matrix inversion process described above for the computation of the LP coefficients can be problematic, in general, for a large matrix. But for a Toeplitz matrix there exist fast routines that can be used for their inversion numerically. Thus, once the autocovariance sequence is estimated, and from it the covariance matrix is formed, the covariance matrix can be inverted using this routine to obtain the LP coefficients from the product of the inverted covariance matrix and the autocovariance sequence. The LP filters that use this Toeplitz matrix inversion are denoted the Toeplitz LP filters in QuanTek.
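As an illustration of this step, the Yule-Walker system can be solved with a Levinson-type Toeplitz solver such as the one in SciPy (a sketch in our own notation, not the QuanTek routine). The autocovariance sequence r(k) = a^k used below is the exact sequence of an AR(1) process with coefficient a, so the recovered LP coefficients should be a followed by zeros.

```python
import numpy as np
from scipy.linalg import solve_toeplitz

a = 0.5                      # AR(1) coefficient (illustrative value)
M = 4                        # filter order
r = a ** np.arange(M + 1)    # exact autocovariance r(k) = a**k of an AR(1) process

# Yule-Walker: R w = r, where R is the symmetric Toeplitz matrix
# built from r(0), ..., r(M-1); solve_toeplitz uses a fast Levinson recursion
w = solve_toeplitz(r[:M], r[1:M + 1])
# w ~ [0.5, 0, 0, 0]: only the first LP coefficient is nonzero
```

The fast solver matters in practice: direct inversion of an N-by-N matrix costs O(N³) operations, while the Levinson recursion exploits the Toeplitz structure to solve the system in O(N²).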
In order to estimate the covariance matrix, there are three methods utilized in the current version of QuanTek. Probably the simplest method is to just compute the sample autocovariance using the formula given above. More generally, the sample mean and covariance are given by the following formulas, for a data sequence $x_1, \ldots, x_N$ of sample size N. These yield a non-negative definite covariance matrix [Brockwell & Davis (1991)]:
$$\bar{x} = \frac{1}{N} \sum_{t=1}^{N} x_t, \qquad \hat{r}(k) = \frac{1}{N} \sum_{t=1}^{N-|k|} \left(x_t - \bar{x}\right)\left(x_{t+|k|} - \bar{x}\right), \qquad |k| \le N-1$$
Note that the element $\hat{r}(0)$ of the covariance sequence given above is called the variance. This is a measure of the total noise power in the time series. Even though the autocovariance sequence $\hat{r}(k)$ defined above is defined for all values of k such that $|k| \le N-1$, in practice only the first half of the values are used, because it can be seen that the second half of the values use an increasingly smaller number of the time series elements in their definition. In fact, the last element of the covariance sequence is just the product of the first and last elements of the time series (minus the mean), according to the above formula. Hence, a covariance sequence half the length of the data set, or 512 days in length, is used in QuanTek, and correspondingly there are 512 LP coefficients used, for a data length of 1024 days (4 years). However, the above method can be used with a data set of any length, and in particular works well for short data sets.
An alternative method for estimating the covariance matrix is by means of the spectrum. The spectrum can be estimated either using the Fast Fourier Transform (FFT) or the Discrete Wavelet Transform (DWT). The FFT is a transformation in which the data set of daily returns, as a function of time, is decomposed into its component frequencies in an interval of fixed length, which we take to be 1024 days for the filter routines. The output of the FFT routine, for a data length of 1024 days of returns, yields an amplitude for each frequency interval between zero frequency and a frequency of 1 cycle per 2 days (the Nyquist frequency), which is the maximum frequency possible for a daily time series. There are 512 evenly spaced frequency intervals in this range for a data set of 1024 days. The other half of the output consists of 512 values of the phase, which are not used. The spectrum is then obtained by squaring the amplitude, which yields the spectral power at each frequency. The FFT thus represents the time series as a sum of sine waves of constant amplitude over the 1024-day range. The DWT, on the other hand, works a little differently. In this case, the data set of 1024 days of returns data is decomposed in the wavelet basis, which is a set of waves localized in both frequency and time, of finite extent in each. In this wavelet basis, the returns data are represented as amplitudes as a function of time, decomposed into a set of 8 frequency octaves. These amplitudes are then squared to get the spectral power corresponding to each wavelet component. For the filter routine these components of the power spectrum are then averaged over time, yielding just 8 power spectral values, one for each frequency octave. So the DWT power spectrum contains only 8 values, versus the 512 values of the FFT power spectrum. In averaging over the time in each octave, the goal is to eliminate the stochastic noise while preserving whatever “signal” is present.
Presumably most of this signal occurs in the lowest frequency parts of the spectrum, so the octave decomposition of the DWT is perfect for capturing these low-frequency components. The DWT representation would also be perfect for a filter designed to capture non-stationary correlation, which is still under development. The DWT power spectrum is displayed in the Hybrid Filter dialog, for reference when setting the filter parameters. There is also a separate dialog box display for both the FFT spectrum and the DWT spectrum.
The point of computing the spectrum is to detect whether there is any correlation present. If there is no correlation, and the time series is just random white noise, then the true power spectrum will be flat or constant. However, the measured power spectrum, from the FFT or DWT, is not generally constant due to stochastic noise. In fact, it is shown in standard textbooks on time series [Brockwell & Davis (1991)] that for the case of the FFT, the variance of each value of the power spectrum, for each discrete frequency, is 100%; in other words, it is totally uncertain. In order to uncover the true spectrum, therefore, the FFT spectrum must be smoothed, or averaged over a number of frequency values. But by observing the smoothed spectrum, especially at the low-frequency end, it may be possible to discern a deviation from a constant white noise spectrum, which indicates the presence of correlation in the returns time series. This connection between a variation of the spectrum and the presence of correlation in the time series is made explicit by the Wiener-Khinchin Theorem, a nice (short) description of which is given in Numerical Recipes [NR (1992)]. This theorem simply states that the power spectrum is the Fourier Transform of the autocovariance function, and vice versa. Then, since the covariance matrix is built from the autocovariance function as described earlier, by taking the Fourier Transform of the power spectrum we arrive at the autocovariance function and hence the covariance matrix. This covariance matrix is then used to derive the LP coefficients as described above. QuanTek uses two different filters that utilize this technique, one employing the DWT spectrum and the other employing the FFT spectrum. Both of these filters require at least 1024 days of data, because that is the data length used by both the FFT and DWT transforms.
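The Wiener-Khinchin Theorem is easy to verify numerically: the inverse FFT of the periodogram reproduces the (circular) sample autocovariance. A minimal sketch, assuming NumPy and a synthetic stand-in for a returns series:

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.standard_normal(1024)          # synthetic stand-in for a returns series

# Power spectrum (unnormalized periodogram)
U = np.fft.fft(u)
power = np.abs(U) ** 2

# Wiener-Khinchin: inverse FFT of the power spectrum gives the
# circular autocovariance sequence (divided by N to match the sample mean)
acv_from_spectrum = np.fft.ifft(power).real / len(u)

# Direct circular autocovariance at lag 1, for comparison
acv_direct = np.mean(u * np.roll(u, 1))
# acv_from_spectrum[1] and acv_direct agree to machine precision
```

Note that the FFT route imposes circular (wrap-around) boundary conditions; in practice the data are usually zero-padded or windowed to suppress the wrap-around terms.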
The Price Projection is generated for each security by a Linear Prediction filter, which can be specified separately for each security. That is because the statistical properties of each security, over the past 1024 or 2048 days, may be different, and a different filter, or the same filter type with different parameters, might be appropriate for each security. The LP filter type and parameter settings, for each security, are set and saved for each security data file using the Hybrid Filter dialog box. The filter types available at present are the three Toeplitz filters described above. The default filter, which is the first in the list, is the Toeplitz DWT filter, and next in the list is the Toeplitz FFT filter. Both of these filters require at least 1024 days of returns data for their proper operation, otherwise they may not give accurate results. The third filter on the list is the Toeplitz ACV (“AutoCoVariance”) filter, which is the default filter if the data set is shorter than 1024 days. This filter uses the sample autocovariance described above, not spectral methods. The next three filters are modifications of the generic Linear Prediction filter, which uses the Levinson-Durbin algorithm. These also work with any length data set. The last filter in the list is the generic LP filter from Numerical Recipes [NR (1992)].
In the Hybrid Filter dialog box, in addition to the selection of filter type, there is also a setting for the approximation order, and one for the fractal dimension. The approximation order setting is for the purpose of trying to “smooth” the spectral response of the filter to eliminate stochastic noise. This uses a method of Chebyshev polynomials to approximate the spectral response of the filter by a sum of N (modified) Chebyshev polynomials, where you can set N anywhere from 0 to 512. It turns out, however, that this polynomial approximation order N is nearly identical in its effect to simply truncating the number of LP coefficients to N coefficients. So it is the same as approximating the stochastic process by an autoregressive process of order N, AR(N). This should cut down on the stochastic noise if most of the real correlation exists at short time lags (a “short-memory process”), corresponding to small values of N, while at longer time lags the autocorrelation is just random stochastic noise.
However, the Hybrid LP Filter also has the capability to model a long-memory process. This is done by means of the fractal dimension, or fractional difference parameter setting. This fractal dimension is a parameter that ranges from –0.50 to +0.50, although in the dialog box it is represented as the range –50 to +50, in integer steps. This parameter has the effect of setting the low-frequency response of the filter. If the FD parameter is positive, then the low-frequency response of the Hybrid LP filter goes to infinity, theoretically, as the frequency goes to zero. This corresponds to the case of long-range trend persistence. If the FD parameter is negative, the low-frequency response goes to zero as the frequency goes to zero. This then should correspond to the case of long-range trend anti-persistence or a return to the mean mechanism. It appears that most of the effectiveness of the Hybrid LP filter depends on a correct estimate or selection of the low-frequency spectral response of the filter, so this setting of the fractal dimension is important. It should be estimated and set independently for each security, in order to achieve good correlation of the Price Projection of the filter with the future returns.
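The low-frequency behavior being modeled here can be made explicit. A pure fractionally differenced process with fractional difference parameter d has a power spectrum proportional to |2 sin(ω/2)|^(−2d), which diverges as ω → 0 for d > 0 (trend persistence) and vanishes for d < 0 (anti-persistence). A brief sketch of this standard formula (our own illustration, not the QuanTek filter code):

```python
import numpy as np

def fd_power_response(omega, d):
    """Power spectral shape |2 sin(omega/2)|**(-2d) of a pure fractionally
    differenced process, up to an overall constant factor."""
    return np.abs(2.0 * np.sin(omega / 2.0)) ** (-2.0 * d)

low, high = 0.01, 1.0   # two frequencies in radians per day (arbitrary values)

# d > 0: response grows without bound as frequency -> 0 (trend persistence)
persistent = fd_power_response(low, 0.3) > fd_power_response(high, 0.3)
# d < 0: response shrinks toward zero as frequency -> 0 (anti-persistence)
antipersistent = fd_power_response(low, -0.3) < fd_power_response(high, -0.3)
```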
In the Hybrid Filter dialog, there are two spectrum displays. The top display is the ordinary DWT spectrum, using the latest 2048 days of data. As in the case of the other spectrum displays, only the lower half of the spectrum is displayed, so the highest frequency octave is not displayed. (See the sections of this article on the Periodogram spectrum and Wavelet spectrum.) This display is used for reference for setting the filter characteristics. The bottom display is the Filter Spectrum using the Maximum Entropy Method – in other words, the spectral response of the filter itself, as derived from its set of LP coefficients. By viewing this bottom display, you can clearly see the effect on the filter spectrum of changing the Filter Approximation Order and Fractal Dimension settings of the filter. The relationship between these two needs to be explained. In calculating the Chebyshev polynomial approximation, first the factor due to a non-zero fractal dimension is divided out from the filter spectral response. Then the Chebyshev approximation is applied to what is left, which should be the spectrum of an ordinary autoregressive or “short-memory” process. Then the factor corresponding to the non-zero fractal dimension or “long-memory” process is multiplied back in. The result is that for low orders of the filter approximation, the fractal dimension factor must be included by hand, otherwise it will be eliminated by the low-order Chebyshev approximation. On the other hand, for high orders of the filter approximation, factoring out the fractal dimension part, doing the Chebyshev approximation, and then factoring back in the fractal dimension part, has practically no effect. So the Fractal Dimension setting has very little effect for high orders of the Filter Approximation Order. 
But for low orders of the filter approximation, it is possible to see in the bottom pane the theoretical spectral response due to a low order AR filter with various settings of the fractal dimension. In particular, when the Filter Approximation Order is set to zero, what you get is the spectrum of a pure fractionally differenced process, corresponding to the given setting of the Fractal Dimension.
In many cases, it is possible to see quite clearly from the DWT spectrum the low-frequency behavior corresponding to a fractionally differenced process. So if this low-frequency behavior is modeled by the correct settings of the Fractal Dimension and Filter Approximation Order, then the filter should have good predictive power. Generally there will be some particular settings of these parameters that give optimum performance for each security over the past 2048 day interval. These settings can be tested from within the Hybrid Filter dialog by calling up the Correlation Test – Filters dialog to test the correlation between the filter output and the future returns. In this way it can be verified that the Price Projection is really working correctly, giving a projection of the future returns that has positive correlation with the actual future returns (over the past 2048 days, and presumably persisting 100 days into the future).
QuanTek has a variety of statistical tests and displays, designed to measure quantities of interest in Econometrics and Time Series Analysis. These tests are for the most part completely standard, and explained in standard textbooks on Time Series Analysis and Signal Processing. All of these tests are designed to detect the presence of correlation in the financial returns data. This is not an easy task, because invariably this correlation will be buried in the stochastic noise and will be hard to spot. However, if it can be correctly separated from the stochastic noise and utilized, this correlation can potentially lead to significant gains in short-term trading. This, of course, depends on the correlation being both real and persistent. The reality is verified by the significant correlation revealed by the two Correlation Tests (Filters and Indicators), even assuming stationary statistics over the 2048-day time period. As far as the persistence goes, since the correlation and spectrum tests extend over data sets of 2048 days, it is not implausible to suppose that any correlation they do detect will be persistent over the next 100 or so days in the future. However, we know of no way at present to prove the persistence of the correlation, if any.
The Periodogram is a standard display of the spectrum of a time series, as explained in most textbooks on Time Series Analysis [Brockwell & Davis (1991)]. The Periodogram Spectrum depends on the Fast Fourier Transform (FFT). The Periodogram Spectrum display in QuanTek makes use of 2048 days (8 years) of returns data, and may not be completely accurate if the data set is shorter than this. The Fourier Transform works by expressing the time series, of length 2048, as a sum of sine waves with frequencies distributed evenly over the interval from 0 to $f_N = 1/2$ cycle per day (1 cycle every 2 days, the Nyquist frequency), in frequency intervals of $\Delta f = 1/2048$ cycles per day. So there are 1024 sine waves, each with an amplitude and a phase, to make a total of 2048 quantities, the same as the original time series. The phase information is not utilized. The amplitude of each sine wave of a given frequency is squared to give the spectral power at that frequency, and the result is displayed as a graph of power versus frequency, to make the Periodogram. It should be noted that only the bottom half of the spectrum is displayed, corresponding to 512 values of the power spectrum out of the 1024 total. This is because we feel that the upper half of the spectrum is not very significant, since it corresponds to cycles with periods between 2 and 4 days, and with daily data these are probably just stochastic noise. In fact, filtering out just this narrow range of frequencies eliminates half of the noise power spectrum, so we feel this is a worthwhile noise reduction strategy.
The Periodogram display as computed directly has an extremely jagged appearance. It is shown in standard textbooks on Time Series Analysis that the value of the Periodogram at each frequency has a relative standard deviation of unity, meaning that it is totally uncertain. In order to deduce anything from the Periodogram, it must be smoothed. If the smoothing is over N frequency intervals, it can be shown that the standard deviation is approximately $1/\sqrt{N}$, so the more the Periodogram is smoothed, the less the statistical uncertainty. The QuanTek Periodogram display has smoothing that is selectable over a range of values. When smoothed, any deviation from a constant spectrum might be visible. Due to the Wiener-Khinchin Theorem, the Periodogram is the Fourier Transform of the autocovariance function. This means that if the original time series is completely random “white noise”, then the smoothed Periodogram should be constant. Any deviation from a constant spectrum is an indicator of correlation in the time series. This is the main purpose of studying the spectrum of the time series – to detect possible correlation. Incidentally, there is a statistical test called the Kolmogorov Test, included in the Periodogram Spectrum dialog, which tests the spectrum for the probability of a deviation from a white noise spectrum. This probability is expressed as what we call a confidence level that the spectrum is non-random, which is 100% minus the significance level for non-randomness.
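This smoothing effect is simple to demonstrate: averaging the periodogram of white noise over W adjacent frequencies reduces its relative standard deviation from about 1 to roughly 1/√W. A sketch assuming NumPy (the window size and random seed are arbitrary; the moving-average "Daniell" window is only one of several smoothing choices):

```python
import numpy as np

rng = np.random.default_rng(42)
u = rng.standard_normal(2048)                      # white-noise "returns"

power = np.abs(np.fft.rfft(u)) ** 2 / len(u)       # raw periodogram
power = power[1:1025]                              # drop the zero-frequency term

# Daniell smoothing: average over a window of W adjacent frequencies,
# which cuts the relative standard deviation by roughly 1/sqrt(W)
W = 32
kernel = np.ones(W) / W
smoothed = np.convolve(power, kernel, mode='valid')

rel_std_raw = power.std() / power.mean()           # near 1 (100% uncertainty)
rel_std_smooth = smoothed.std() / smoothed.mean()  # much smaller after smoothing
```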
An alternative method of computing the spectrum is shown in the bottom pane of the Periodogram Spectrum dialog box. This is commonly called the Maximum Entropy Spectrum [NR (1992)]. This method is closely related to the Periodogram. The Linear Prediction coefficients are computed by means of the Levinson-Durbin algorithm, and then from these the Maximum Entropy Spectrum is computed directly. Note that this is the reverse process of the method of computing one of the LP filters used in QuanTek, namely to compute the LP coefficients from the spectrum, as explained elsewhere in this article. So the Periodogram Spectrum is important because it gives a direct indication of the presence of correlation in the time series, and is used directly to compute the Linear Prediction filter to utilize this correlation for the Price Projection.
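The Maximum Entropy Spectrum computed from a set of LP coefficients $w_k$ has the standard AR form $P(f) = \sigma_v^2 / |1 - \sum_k w_k e^{-2\pi i f k}|^2$. A sketch evaluating this formula for a one-coefficient (AR(1)) example (our own illustration, not the QuanTek or Numerical Recipes code):

```python
import numpy as np

def max_entropy_spectrum(w, sigma2, freqs):
    """AR ("maximum entropy") power spectrum from LP coefficients w,
    evaluated at frequencies given in cycles per sample."""
    k = np.arange(1, len(w) + 1)
    # denominator 1 - sum_k w_k * exp(-2*pi*i*f*k), for each frequency f
    denom = 1.0 - np.exp(-2j * np.pi * np.outer(freqs, k)) @ w
    return sigma2 / np.abs(denom) ** 2

w = np.array([0.5])              # single LP coefficient: a persistent AR(1)
freqs = np.array([0.0, 0.5])     # zero frequency and the Nyquist frequency
spec = max_entropy_spectrum(w, 1.0, freqs)
# Power is concentrated at low frequency for a persistent process:
# spec[0] = 1/(1-0.5)^2 = 4.0, while spec[1] = 1/(1+0.5)^2 ~ 0.444
```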
The Wavelet Spectrum is an alternative method for computing the spectrum. Instead of decomposing the time series (of length 2048) as a sum of sine waves, the time series is decomposed as a sum of wavelets, which are waves localized in both frequency and time. So the decomposition involves an amplitude belonging to a given frequency octave and time interval. This transformation is called the Discrete Wavelet Transform (DWT). The amplitude corresponding to each frequency octave and time interval is then squared to give the power spectrum as a function of frequency and time. However, unlike the Periodogram, in the Wavelet Spectrum display the power spectrum is averaged over time in each frequency octave, so there are only 9 values (for 2048 data values) of the power spectrum, one per octave. This seems appropriate for financial data for two reasons. First of all, the octave decomposition is appropriate because it is the low-frequency components of the spectrum that are of greatest importance. Secondly, the high frequency components are time-averaged, thereby hopefully eliminating much of the stochastic noise. As in the case of the Periodogram, only the bottom half of the spectrum is displayed, so that the top octave is left out. Again, this corresponds to cycles with periods between 2 and 4 days, and these are probably mostly stochastic noise. This DWT spectrum is used in one of the LP filters to compute the covariance matrix, and from this the LP coefficients, as explained elsewhere in this article (using only 1024 data values, however).
The bottom pane of the Wavelet Spectrum dialog contains a time average of the spectrum over the past N days, instead of all 2048 days, where the number N is selectable. This will be of interest for a wavelet-based filter that utilizes non-stationary correlation in the data. This bottom spectrum is based on the MODWT (Maximal Overlap Discrete Wavelet Transform) spectrum rather than the DWT spectrum, the MODWT being a variation of the DWT. This just means that there is a separate spectral component for each of the 2048 days of the time series, for each of 9 octaves, although these are not all independent since there are only 2048 independent quantities. Each octave is adjusted for the proper time relationship to the other octaves, then an average is taken over the latest N days instead of all 2048 days as in the top pane of the dialog.
Also included in the Wavelet Spectrum dialog is a Chi-Square Test for deviations from randomness of the spectral power in the 9 frequency octaves. This is expressed as a confidence level that the spectrum is non-random, which is 100% minus the significance level for non-randomness. This Chi-Square Test for the 9 “bins” or octaves of data generally indicates that the spectral power is close to random. However, there is another Chi-Square test for all 2048 values of the DWT spectrum, before time averaging. This test gives an amazing number of standard deviations away from randomness. It is not exactly clear how to interpret this, except that it means that the returns series is not Gaussian white noise. (There is a Gaussian white noise time series generated in the dialog for purposes of comparison.) Either it means that the returns have “fat tails” and deviate from a Gaussian distribution, or else it indicates correlation in the time series over short time intervals. Perhaps in doing the time averaging, we are averaging out a lot of short time scale correlation after all! This will have to wait for the future development of a wavelet filter capable of dealing with non-stationary correlation.
The correlation displays in QuanTek, such as the Correlation – Indicators test and the Correlation – Filters test, are of an unusual type. They give the correlation as a function of the time lag of the indicator or filter output. It is important to explain what this means. Let us consider first the output of the Linear Prediction filter, and explain how the data in QuanTek is arranged. The past logarithmic price data are labeled by an index which is 0 for the present day, or most recent day of daily data. Then the index becomes more negative going backward in time, and more positive going forward in time. The index labeling the future Price Projection, on the other hand, is positive. The indexes for the past and future price data may be illustrated in the following diagram:
… | -6 | -5 | -4 | -3 | -2 | -1 | 0 | +1 | +2 | +3 | …
The past data indexes are shown in black, while the future data indexes are shown in blue. This is analogous to the price graph in QuanTek, where (with the white background) the past data are shown in black and the future projection is in blue. The returns are the changes in log price from one (close) log price to the next day’s (close) log price. The returns are also labeled by the above indexes, with the index corresponding to the later day of the price difference. (The return with index 0 is the log price for day 0 minus the log price for day –1.)
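The indexing convention for the returns can be made concrete with a few lines of Python (hypothetical closing prices; the dictionary layout is our own illustration, not QuanTek's internal storage):

```python
import math

# Hypothetical closing prices indexed ..., -2, -1, 0 (index 0 = most recent day)
close = {-3: 100.0, -2: 102.0, -1: 101.0, 0: 103.0}

# The return for day k is the log price of day k minus the log price of day k-1,
# so each return carries the index of the LATER day of the price difference
returns = {k: math.log(close[k]) - math.log(close[k - 1])
           for k in close if k - 1 in close}
# returns[0] is the change from day -1 to day 0
```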
Now, the Price Projection is supposed to be some (linear) function of the past returns data, which is supposed to be a prediction of the future returns data (and hence prices). Any function at all of the past price data which is correlated with the future returns can constitute a valid Price Projection, so long as it makes use only of past data – no future data allowed! Then, for an N-day time horizon, we wish to compute the correlation between the N-day average of the Price Projection with the N-day average future returns, averaged over days +1 through +N. In order to do this, we need to use the formula for the sample covariance given above, and divide the covariance by the variance to get the correlation. (The correlation always ranges between –1 and +1.) So we need to go back each day, so that each day in the past is “day 0”, the present day “relative” to this past day, and compute a separate Price Projection using only data to the past of (and including) this previous “day 0”. If (n) denotes the n’th day in the past, (with n a positive number or zero) then “day 0” relative to this n’th day in the past will be denoted by 0(n). Similarly, the day with index k relative to this n’th day in the past will be denoted by k(n). So the indexes in the above table should be denoted by k(0). We may similarly extend the table going back, say, 1024 days in the past, as follows:
… | -4(0) | -3(0) | -2(0) | -1(0) | 0(0) | +1(0) | +2(0) | +3(0) | …
… | -4(1) | -3(1) | -2(1) | -1(1) | 0(1) | +1(1) | +2(1) | +3(1) | …
… | -4(2) | -3(2) | -2(2) | -1(2) | 0(2) | +1(2) | +2(2) | +3(2) | …
… | -4(3) | -3(3) | -2(3) | -1(3) | 0(3) | +1(3) | +2(3) | +3(3) | …
… | …     | …     | …     | …     | …    | …     | …     | …     | …
So, since QuanTek typically uses 2048 days of data (8 years), and the Price Projection uses 1024 days of data, we can go back 1024 days and compute a Price Projection for each of the 1024 days in the past, each using the previous 1024 days of data. So the table above will have 1024 rows. It will have 1024 columns of past data (black), and since the Price Projection is computed 100 days into the future, it will have 100 columns of future Price Projection (blue). As a reminder, in the above table, day 0(0) is actually day 0, day 0(1) is actually day –1, day 0(2) is actually day –2, day 0(3) is actually day –3, and so forth, going back 1024 days in the past. Each Price Projection relative to n days in the past is computed using only past data relative to n days in the past. In the Correlation Test, when you compute the Price Projection 1024 times, what you are doing is filling in the blue areas of this table using data relative to n days in the past, for n ranging from 0 to 1023.
Now the sample correlation can be computed between the Price Projection and the future returns. To start, suppose we want to compute the correlation between the 1-day Price Projection (of the returns) and the actual 1-day future returns. This sample correlation will be the sum over the past 1024 days, of the product of the projected 1-day return with the actual 1-day return. We don’t know the future 1-day return relative to day 0(0), because it hasn’t happened yet. But relative to 1 day in the past, the 1-day future return relative to day 0(1) is the return 0(0). Similarly, the 1-day future return relative to day 0(2) is the return 0(1), and that relative to day 0(3) is the return 0(2). The projected return relative to day 0(1) is the projection +1(1), that relative to day 0(2) is the projection +1(2), and that relative to day 0(3) is the projection +1(3). So the sample correlation is the sum of the products of these 1-day projected returns with the corresponding actual returns, going back 1024 days in the past. Similarly, we can also compute the correlation with an N-day time horizon, by taking the average of the first N days of the Price Projection, and computing the correlation with the corresponding N-day average of the future returns. (This is the ordinary, Pearson’s R definition of correlation. The two robust methods of calculating correlation can be calculated using the same data arrangement.)
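The 1-day computation described above amounts to a Pearson's R between the sequence of projections +1(n) and the realized returns 0(n−1). A minimal sketch with made-up numbers (the function and the data are ours, purely for illustration):

```python
import math

def pearson_r(x, y):
    """Ordinary Pearson's R between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

# projection[n] is the 1-day projection +1(n), made using only data up to
# day 0(n); realized[n] is the return 0(n-1) that actually followed it
projection = [0.2, -0.1, 0.4, 0.0, -0.3]
realized   = [0.1, -0.2, 0.3, 0.1, -0.2]
r = pearson_r(projection, realized)   # close to +1 for a useful projection
```

The real test sums over roughly 1024 such pairs rather than five, but the arithmetic is the same.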
However, this method is even more general than described so far. We may not be sure whether the phase of the Price Projection is correct. Also, we may want to compute the correlation between past returns and future returns. So instead of just computing the correlation between the 1-day Price Projection and 1-day future returns, we can compute the correlation between any day’s past data or future projection, and the 1-day future returns. In other words, instead of just taking a sum of products of the projection with index +1(n) with the returns with index 0(n–1), we take a sum of products of the data for any k(n) with the returns with index 0(n–1), summed over all n from 1 to 1023. The difference between the index k and the value +1 we denote as the time lag or lead time. A separate correlation is computed for a range of lead times, and this correlation as a function of lead time is displayed as a graph. The correlation corresponding to zero lead time appears under the vertical line marked ZERO on the graph. The graph can then be shifted back and forth, so that different lead times appear under the vertical line, using the Lead Time control. In this way, the entire 100 days’ worth of filter output of the Price Projection can be tested for correlation with the 1-day future return, as can all the past returns (back to –100 days or so). By taking N-day averages as described previously, the correlation can also be measured for an N-day time horizon, for any desired lead time. Each choice of lead time really amounts to a separate technical indicator, so in this way a tremendous number of distinct indicators can be tested all at once. In fact, some surprising correlations between the past N-day returns, for various time lags, and the future N-day returns, can be uncovered in this way using the Correlation – Filters test.
In this way, a large number of possible technical indicators can be tested, one for each possible setting of the Lead Time control, not only the one indicator (the Price Projection itself) corresponding to a Lead Time of zero.
The Correlation – Indicators test works in a similar way, except that instead of testing the future Price Projection and past returns directly for correlation with future returns, more general functions of the past returns and future Price Projection can be tested. These functions of the past returns and future Price Projection are what we call technical indicators. These indicators are all functions of the past data, of course, since the Price Projections are themselves functions of the past data (no future data allowed!). Since any function of past data can constitute an acceptable technical indicator, we are safe as long as we use only the data, together with the corresponding Price Projection, in each row of the above table corresponding to n days in the past, for the value of the technical indicator n days in the past. We can then compute the correlation between the value of this indicator and the N-day future returns, in exactly the same way as described above. In addition to the time horizon and time lag or lead time adjustments, there is also a variety of smoothings of the data that can be applied, using the Savitzky-Golay smoothing filter described above. Actually, the technical indicators start with the log price data, not returns. But the results of the Correlation – Filters test described above, using returns data, can be reproduced in the Correlation – Indicators test by setting the smoothing time scale to 0 days and using the velocity type of smoothing. Other indicators can be constructed using the relative price smoothing of the prices, or the acceleration type of smoothing. The time scale of smoothing can also be adjusted over a wide range. It is also possible to take a difference between two smoothings of the same type, with different time scales, and to reverse the sign of the indicator. 
Together, all these settings cover a very wide range of possible technical indicators, essentially all that can be constructed as linear functions of the log prices alone. (Later, more types of indicators such as nonlinear functions of prices, volume indicators, and so forth, may be included.)
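As a sketch of how such a test works, the correlation between an indicator (any function of past data only) and the N-day future returns can be computed by pairing each day's indicator value with the log return over the following N days. The function below is an illustration, not the program's actual implementation; for a pure random walk the measured correlation should be near zero:

```python
import numpy as np

def indicator_future_correlation(log_price, indicator, n):
    """Correlate a technical indicator (a function of past data only)
    with the n-day future log returns, one pair per day."""
    log_price = np.asarray(log_price, float)
    indicator = np.asarray(indicator, float)
    future_ret = log_price[n:] - log_price[:-n]   # n-day return after day i
    return np.corrcoef(indicator[:-n], future_ret)[0, 1]

# Synthetic random-walk log prices: any indicator should show ~zero correlation.
rng = np.random.default_rng(5)
lp = np.cumsum(0.01 * rng.standard_normal(2048))
velocity = np.diff(lp, prepend=lp[0])             # crude one-day 'velocity'
c = indicator_future_correlation(lp, velocity, n=5)
```

A statistically significant nonzero value of `c` on real data would be evidence of the kind of inefficiency these tests are designed to find.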
Another statistical display in QuanTek is the Correlation – Returns dialog, which is a scatter graph of the returns of two different securities. This is intended to display, graphically, the correlation or anti-correlation of the returns of the two securities. If the returns are correlated, then the dots of the scatter graph will tend to line up along the diagonal line of the graph, while if they are anti-correlated, they will tend to line up along the opposite diagonal. If the returns are uncorrelated, the display is designed so that the dots should be evenly distributed over the whole square area of the display (assuming a double exponential distribution of the returns). Connected with this display is a measure of the correlation, using three different methods. First there is the ordinary Pearson’s R method of measuring correlation, which measures linear correlation [NR (1992)]. But there are also two different nonparametric or rank-order correlation methods, which are also called robust methods of measuring the correlation. These are the Spearman Rank-Order Correlation and Kendall’s Tau [NR (1992)]. These robust methods do not depend on the random variables belonging to a Gaussian distribution (which they really do not – the distribution has “fat tails”), and hence they are less likely to indicate spurious correlation where no correlation really exists.
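All three correlation measures are standard and available in `scipy.stats`; the following minimal sketch (synthetic data and variable names are for illustration only) computes them for two return series that are positively correlated by construction:

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr, kendalltau

rng = np.random.default_rng(0)

# Two synthetic return series sharing a common component, so they are
# positively correlated by construction (illustration only).
common = rng.standard_normal(500)
returns_a = common + 0.5 * rng.standard_normal(500)
returns_b = common + 0.5 * rng.standard_normal(500)

r, _ = pearsonr(returns_a, returns_b)       # linear (Pearson's R)
rho_s, _ = spearmanr(returns_a, returns_b)  # rank-order (robust)
tau, _ = kendalltau(returns_a, returns_b)   # rank concordance (robust)
```

Note that Kendall's Tau is typically smaller in magnitude than the other two measures for the same data, so the three numbers are not directly comparable to each other.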
Also available within the Correlation – Returns dialog is another graph, which is a display of the correlation sequence between the two securities, as a function of time lag. There are actually two displays, one for positive lag and one for negative lag. If the two securities are the same, then these graphs show the autocorrelation between the returns of the security, and both the displays for positive and negative lag are the same. The time horizon may also be set for N days, where the returns are averaged over N days and their correlations computed for the N-day averages.
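A sketch of the lagged-correlation display, assuming the simple definition in which the returns of one security are correlated against the lagged returns of the other (an illustration, not QuanTek's internal computation); when the two series are the same, this yields the autocorrelation:

```python
import numpy as np

def lagged_correlation(x, y, max_lag):
    """Correlation of x[t] with y[t + lag], for lag = 0 .. max_lag."""
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    corrs = []
    for lag in range(max_lag + 1):
        a = x[: len(x) - lag] if lag > 0 else x
        b = y[lag:]
        corrs.append(np.corrcoef(a, b)[0, 1])
    return np.array(corrs)

rng = np.random.default_rng(1)
ret = rng.standard_normal(1000)
acf = lagged_correlation(ret, ret, max_lag=5)  # autocorrelation of one series
```

For uncorrelated returns, the lag-0 value is 1 and all other lags fluctuate around zero with magnitude of order one over the square root of the sample size.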
This Correlation – Returns dialog shows some rather strong correlations for certain pairs of stocks, especially those in the same sector of the economy. This correlation is useful to know from the standpoint of reducing risk in an optimal portfolio. If some of the securities in the portfolio are strongly correlated, then this increases the market risk because if one security loses value, all the securities that are correlated with it also tend to lose value. To reduce risk to a minimum, it is desirable to choose securities that are uncorrelated, or even anti-correlated, so that fluctuations in the value of one security will be hedged or compensated by fluctuations in the other securities in the portfolio. This correlation measurement could also be useful in a future version of QuanTek, which uses a multivariate Price Projection filter.
There are three main types of technical indicators used by the QuanTek program. These correspond to the three main types of correlation mentioned above, namely return to the mean and trend persistence, plus turning point or trend reversal. These three types of indicators are implemented by means of the Savitzky-Golay smoothing filter, using the filter directly on the price data to obtain the Relative Price indicator, taking the first derivative to obtain the Velocity indicator, or taking the second derivative to obtain the Acceleration indicator. These three types of indicators may then be used to construct three Momentum indicators for each stock or security. The sum of the three Momentum indicators, with adjustable weights, forms the Trading Rules indicator. The smoothing time scale of the indicators may be adjusted over a wide range, and the phase or time lag of each indicator may be adjusted to achieve maximum correlation with the N-day future returns. (Note: The number N is the time horizon, which is distinct from the smoothing time scale.) The correlation is maximized between the indicator and the returns over the past data, over a range of 2048 days, and if significant correlation is measured over this time interval, the presumption is that it has a good chance of persisting over at least the next 100 days or so in the future. However, we don’t know of any way to prove this persistence at present.
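The three indicator types can be sketched with `scipy.signal.savgol_filter`, which supports both smoothing and smoothed derivatives directly. The window lengths and polynomial order below are illustrative assumptions, not the program's actual settings:

```python
import numpy as np
from scipy.signal import savgol_filter

rng = np.random.default_rng(2)
log_price = np.cumsum(0.01 * rng.standard_normal(512))   # synthetic log prices

# Window lengths play the role of the smoothing time scale (must be odd).
smooth_short = savgol_filter(log_price, 21, 3)           # short-scale smoothing
smooth_long = savgol_filter(log_price, 63, 3)            # long-scale smoothing

relative_price = smooth_short - smooth_long              # return to the mean
velocity = savgol_filter(log_price, 21, 3, deriv=1)      # trend persistence
acceleration = savgol_filter(log_price, 21, 3, deriv=2)  # turning points
```

The difference of two smoothings with different time scales, and the sign reversal mentioned above, are simple array operations on these outputs.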
In the description of the indicators given below, the phase relationships between the Relative Price, Velocity, and Acceleration indicators with respect to the returns are given in a theoretical way. These descriptions would be accurate if the returns followed a pure sine wave of a single frequency, and the time horizon and smoothing time scale were set according to this frequency (the period of the wave equal to 2N days). These are the phase relationships assumed in computing the buy/sell points from the Harmonic Oscillator indicators. However, when the correlation of the Momentum indicators derived from these three indicators with the future returns is computed, it is found that these phase relationships often do not hold. The indicators have a different phase relationship to the N-day future returns than they theoretically should. This is the purpose of the lead time or time lag controls – to adjust the phase of the indicators so that the Momentum indicator is in phase with the returns. The origin of this phase discrepancy requires further study, although it probably has to do with the fact that the Fourier Transform of the security returns consists of a sum of waves of all different frequencies, not just of one frequency, and the phase relationships between these different waves can be complex and rapidly varying. A rapidly varying phase shift with frequency of the component waves could well account for the discrepancy between the theoretical and actual phase of the Momentum indicators, relative to the returns.
We define a Momentum indicator as any function of the past prices that is supposed to show a positive correlation with future returns. The value of the Momentum indicator for each day may then be interpreted as our estimate, based on past price data, of the future return for the corresponding day. To estimate the N-day future returns, we need to take an N-day forward moving average of the Momentum indicator. The value of this forward averaged Momentum indicator is then supposed to be our estimate of the future returns over the next N days. This then translates directly into Trading Rules. The N-day position should be varied according to the returns to be expected over the next N-day time horizon, so the position to be established should be positive when the N-day forward averaged Momentum indicator is positive, and should be negative when the N-day forward averaged Momentum indicator is negative. This position is, of course, supposed to be held for the duration of the N-day time horizon.
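The N-day forward moving average at the heart of this construction can be sketched as follows (a minimal illustration; the NaN tail marks days that have no complete future window):

```python
import numpy as np

def forward_moving_average(x, n):
    """n-day forward moving average: out[i] = mean(x[i], ..., x[i+n-1]).
    The last n-1 entries have no complete future window and are NaN."""
    x = np.asarray(x, float)
    csum = np.concatenate(([0.0], np.cumsum(x)))
    out = np.full(len(x), np.nan)
    out[: len(x) - n + 1] = (csum[n:] - csum[:-n]) / n
    return out

momentum = np.sin(np.arange(60) * 2 * np.pi / 20)  # toy momentum indicator
fwd = forward_moving_average(momentum, 5)
position = np.sign(fwd)  # +1 = long, -1 = short over the coming horizon
```

Taking the sign of the forward average gives the simplest version of the trading rule described above; a proportional position would use the value itself.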
In the QuanTek splitter windows for the Momentum Indicators and the Trading Rules Indicator, what is displayed is the N-day forward moving average of the indicators and of the returns, for comparison. This means that the Momentum indicators with the N-day forward moving average, along with the N-day forward averaged returns, should be ahead of the corresponding N-day smoothed returns by approximately N/2 days, or what we call one time period. So the buy point should be indicated by a peak in the forward averaged Momentum indicator, and a sell point should be indicated by a trough in this indicator. The purpose of this is so that you will buy and sell before the returns actually reach their maximum and minimum values, rather than always being too late on the buy/sell. The idea is that you buy at the buy point, then hold approximately N days until the sell point, so that the holding period itself is timed to coincide with the expected peak in the future returns over the next N-day time horizon (and conversely for short sales).
The first type of indicator that might be used to form a Momentum indicator is called the Relative Price. This is a difference of the (logarithmic) price levels with smoothings on two different time scales, the shorter time scale minus the longer time scale. This is similar to the MACD oscillator consisting of the difference of two exponential MAs mentioned previously (except without the time lag). This type of indicator is a measure of the return to the mean mechanism, with the longer time period smoothed price level playing the role of the mean level. When the shorter time period smoothed price level is below the longer period one, the future prices are expected to rise, and when it is above, the future prices are expected to fall. There is a certain time delay here, of the order of the shorter smoothing time period: a trough or peak of this indicator now implies that the future returns will be positive or negative later, roughly by this time delay, which corresponds to one time period of N/2 days, with N being the shorter smoothing time scale. So there is a phase difference between the Relative Price indicator and the future returns that it is supposed to predict. The negative of this indicator leads the expected future returns by approximately one time period of N/2 days (or 90 degrees), so if it is lagged by one time period, it qualifies as a Momentum indicator.
Due to this phase relationship, we expect the buy points to pass through the minima [min] of the Relative Price indicator, and the sell points to pass through the maxima [max] of this indicator, for an N-day time horizon. This is, of course, nothing other than the Buy Low – Sell High mechanism at work. In the QuanTek program, a Relative Price indicator is displayed as part of the Harmonic Oscillator indicator in a splitter window. The buy/sell points that are displayed as vertical green/red lines are defined by the minima/maxima of the Relative Price indicator (along with the rest of the Harmonic Oscillator). These buy/sell points are intended as reference markers, in the past data, for the other indicators, in particular the Momentum indicators. In fact, the Relative Price indicator should be roughly 180 degrees out of phase with the N-day forward moving average of the Momentum and Trading Rules indicators. Or, the negative of the Relative Price indicator should be in phase with the N-day forward moving average of the Momentum and Trading Rules indicators. (Note: The time horizon N is the smoothing time scale for the Harmonic Oscillator indicators, and also the time scale for the N-day forward moving average of the Momentum and Trading Rules indicators, and the forward averaged returns and volatility. On the other hand, the smoothing time scale setting in the definition of the Momentum indicators is a separate setting.)
The second type of indicator that might be used to form a Momentum indicator is called the Velocity. It is the smoothed first derivative of the log prices. (This is the kind of indicator that is normally called a Momentum indicator, but we reserve the term Momentum indicator for any indicator that is supposed to be positively correlated with returns, not just the Velocity.) This indicator is a measure of trend persistence. It is clear that if the trend is persistent, then the smoothed Velocity of the log prices should be correlated with the returns. So the Velocity indicator is in phase with the returns.
The buy points will thus correspond to the points where the returns start to become positive, in other words the zero-crossing points from negative to positive [Z+]. Likewise, the sell points will correspond to the points where the returns start to become negative, which are the zero-crossing points from positive to negative [Z–]. Since the negative of the Relative Price indicator leads the returns, the trough of the Relative Price [min] corresponds to the positive zero-crossing point [Z+] of the Velocity indicator, which is a buy point. Likewise, the peak of the Relative Price [max] corresponds to the negative zero-crossing point [Z–] of the Velocity indicator, which is a sell point. So the green/red vertical lines denoting buy/sell points pass through these points of the Relative Price and Velocity indicators. The Velocity indicator, since it is supposed to be correlated with returns, should be in phase with the Momentum indicators. Hence the Velocity indicator will lag the N-day forward moving average of the Momentum and Trading Rules indicators by approximately one time period (N/2 days, or 90 degrees).
We may also construct a third type of technical indicator that might be used to form a Momentum indicator, which is called the Acceleration. It is the smoothed second derivative of the log prices. This indicator may be interpreted as an indicator of turning points, because the second derivative is positive when the prices are at a minimum (positive or upward curvature) and is negative when they are at a maximum (negative or downward curvature). Hence it can be seen that this Acceleration indicator will be positive when the Relative Price is negative, and vice-versa. Thus the Acceleration indicator is exactly out of phase with the Relative Price. However, the Acceleration differs from the Relative Price in that, with each successive derivative, the high-frequency components are emphasized more and more. Hence the Acceleration indicator contains much more of the high-frequency components than the Relative Price and hence is much less smooth. The Acceleration, since it indicates a turning point, should lead the returns by approximately one time period (N/2 days, or 90 degrees). A positive Acceleration peak, indicating a turning point from negative to positive returns, should be followed approximately N/2 days later by a positive peak in the returns, and likewise a negative Acceleration trough, indicating a turning point from positive to negative returns, should be followed approximately N/2 days later by a negative trough in the returns. Hence the buy points will correspond to positive peaks [max] in the Acceleration, and the sell points will correspond to negative troughs [min]. So the Acceleration is 180 degrees out of phase with the Relative Price, and leads the Velocity by 90 degrees or one time period. The green and red vertical lines denoting the buy/sell points indeed pass, approximately, through the peaks/troughs of the Acceleration indicator.
Also it can be seen that the Acceleration indicator should be in phase with the N-day forward moving average Momentum and Trading Rules indicators.
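These idealized phase relationships are easy to verify numerically for a pure sine wave. Here the log price is a single-frequency cycle, and numerical derivatives stand in for the smoothed Velocity and Acceleration (the price about its mean serves as a stand-in for the Relative Price):

```python
import numpy as np

t = np.linspace(0, 4 * np.pi, 2000)
log_price = np.sin(t)                    # idealized single-frequency cycle
velocity = np.gradient(log_price, t)     # ~cos(t): leads the price by 90 deg
acceleration = np.gradient(velocity, t)  # ~-sin(t): 180 deg from the price

# Acceleration vs. price (stand-in for Relative Price about its mean):
corr = np.corrcoef(log_price, acceleration)[0, 1]  # close to -1
```

The correlation near -1 confirms that the Acceleration is exactly out of phase with the (Relative) Price for a pure cycle, with the Velocity a quarter cycle between them.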
It is implicit in the above definitions that the Momentum indicators reach a positive peak when the expected return is maximum. In other words, the Momentum indicators are supposed to be surrogates for the expected returns at each point in time. The optimum trading rules are to be long when the returns are positive, and short when the returns are negative. Thus, the optimum buy point is just when the Momentum indicators are crossing zero from negative to positive [Z+], and the optimum sell point is when the Momentum indicators are crossing zero from positive to negative [Z–]. More precisely, an optimal strategy would be to vary the position daily so that the position is proportional to the value of the Momentum indicator.
However, for N-day trading, presumably purchases and sales are only made at N-day buy/sell points, which are separated by roughly N days. At these points, the Momentum indicators are generally crossing the zero line as described above. But for an N-day trading strategy, a purchase would be made at the buy point, in proportion to the N-day expected return, according to the N-day future moving average of the Momentum indicator. So the quantity of interest is this N-day future moving average of the Momentum indicator, at the buy/sell points. Then this is supposed to be correlated with the N-day future moving average of the future returns. Accordingly, this N-day future moving average of the Momentum indicators and Trading Rules indicator is displayed in the QuanTek splitter windows. The peaks of this forward moving average Trading Rules indicator are expected to coincide approximately with the buy points, and the troughs with the sell points. This is because the N-day future moving average indicators are roughly ahead by N/2 days from the corresponding un-averaged Momentum and Trading Rules indicators (which are not displayed in the splitter windows). Since the N-day future returns are an N-day forward moving average of the returns, they should lead the returns by N/2 days. So the buy point should be (approximately) N/2 days past the upward zero crossing (Z+) of the N-day future returns and forward averaged Trading Rules indicator, and likewise the sell point should be (approximately) N/2 days past the downward zero crossing (Z–) of the N-day future returns and forward averaged Trading Rules indicator. So a buy/sell point for short-term trading should be indicated when the N-day forward averaged Trading Rules indicator is near its positive/negative peak (max or min, respectively). Note that the Trading Rules indicator is normalized so that the absolute maximum and minimum values are roughly +158% and –158%, respectively. 
This percentage denotes the relative proportion of the short-term trading equity allocated to this security.
One of the most basic examples of an oscillator that (reputedly) forms a Momentum indicator is the ordinary MACD – Moving Average Convergence Divergence [Pring (1991)], or difference of two MAs of (logarithmic) prices. Moving averages (let us say exponential ones) are used as technical indicators by superposing two MAs with different time scales of averaging. When the shorter time scale MA crosses the longer time scale MA moving upward, this is taken as a buy signal, and when it crosses moving downward, this is a sell signal. Hence, forming an oscillator consisting of the difference of the two MAs, the buy/sell points are marked by the zero crossings upward/downward, respectively. If there were no time lag with moving averages, then the oscillator so formed would be classified as a Relative Price indicator, because it is the difference of two smoothed prices, and the buy points would be the minima of this indicator, and the sell points the maxima. However, there is a time lag of roughly one smoothing time unit, where the smoothing time scale N is two such units, and the dominant cycle hence has a period of roughly four time units. Thus there is a time lag of roughly 90 degrees, or one-quarter cycle, due to the moving average. Since the Relative Price indicator already lags the Velocity indicator by one time unit (with acausal smoothing), this means that the MACD will lag the Velocity by two time units. Hence it should be out of phase with the Velocity indicator (and hence the Momentum indicator) by 180 degrees. This means that, for trading on cycles with a period of four time units, the MACD is exactly out of phase with the correct trading signals. As a check, we wish to buy at price minima and sell at price maxima. The MACD corresponds to these minima and maxima, except delayed by approximately one time period due to the moving average.
Due to the delay of one quarter cycle, the downward zero crossing of the MACD (which is ahead of the price minimum) will be lined up with the actual price minimum, implying a buy point, and the upward zero crossing of the MACD (which is ahead of the price maximum) will be lined up with the actual price maximum, implying a sell point. But this is exactly opposite to the trading signals that we are supposed to use! Evidently, the traditional use of the MACD is confined to long-term trends that are much longer than the shorter smoothing time period of the MAs, with the presumption that the turning point takes place on a time scale much shorter than the trend itself, so that this indicator will be approximately in phase for these long-term cycles. But this illustrates that perhaps traditional Technical Analysis has been too cavalier about preserving the correct phase relationships between technical indicators and actual price moves!
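A minimal MACD sketch, using the conventional 12- and 26-day exponential spans (an assumption; any two spans illustrate the point), with the traditional zero-crossing signals that the text argues are a quarter cycle out of phase for short cycles:

```python
import numpy as np

def ema(x, span):
    """Exponential moving average with smoothing factor alpha = 2/(span+1)."""
    alpha = 2.0 / (span + 1)
    out = np.empty(len(x))
    out[0] = x[0]
    for i in range(1, len(x)):
        out[i] = alpha * x[i] + (1 - alpha) * out[i - 1]
    return out

rng = np.random.default_rng(3)
log_price = np.cumsum(0.01 * rng.standard_normal(300))

macd = ema(log_price, 12) - ema(log_price, 26)  # conventional 12/26-day spans

# Traditional signals: buy on upward zero crossing, sell on downward.
buy = np.flatnonzero((macd[:-1] <= 0) & (macd[1:] > 0)) + 1
sell = np.flatnonzero((macd[:-1] >= 0) & (macd[1:] < 0)) + 1
```

The causal exponential averages here have exactly the time lag discussed above, which is why these signals can end up opposite to the correct ones on cycles of about four time units.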
The standard definition of linear correlation of two random variables, called Pearson’s R, is given by [NR (1992)]:
$$r = \frac{\sum_{i}\left(x_{i}-\bar{x}\right)\left(y_{i}-\bar{y}\right)}{\sqrt{\sum_{i}\left(x_{i}-\bar{x}\right)^{2}}\;\sqrt{\sum_{i}\left(y_{i}-\bar{y}\right)^{2}}}$$
Here, $\bar{x}$ and $\bar{y}$ are the mean values of the two random variables. There are two other types of correlation, which are called robust correlations, which are the Spearman Rank-Order and Kendall's Tau. These are called nonparametric methods of computing correlation because, unlike the linear or Pearson's R correlation, they do not presuppose a Gaussian distribution of the random variables. The Spearman Rank-Order correlation is the linear correlation of the ranks of the values, rather than of the values themselves. To compute the ranks, the values are arranged in increasing order, and the order of each value is its rank. Kendall's Tau uses the correlation of the numerical order of the ranks (greater than, less than, or the same), as opposed to the difference in value of the ranks as in Spearman Rank-Order. These two robust methods are more reliable when the distribution of the random variables is non-Gaussian, and in particular when the distribution has "fat tails" as is the case with most financial data.
However, for our purposes a modified definition of correlation is more suitable. The problem with the above definition is that it breaks down when the buy-and-hold strategy is considered. To be specific, one of the above random variables will represent the future returns, and the other will represent the trading rules, or amount to be invested in a short-term trading strategy. If $s_{i}$ is the number of shares held on day $i$, and $d_{i}$ is the actual (not logarithmic) return (change in price per share), then the expected (simple) gain $g$, in dollars, is given by (summed over the trading days in a given time interval):
$$g = \sum_{i} s_{i}\,d_{i}$$
For y in the correlation formula we may use the logarithmic returns as a conservative estimate for the actual returns:
$$y_{i} = \ln p_{i} - \ln p_{i-1} = \ln\!\left(1+\frac{d_{i}}{p_{i-1}}\right) \le \frac{d_{i}}{p_{i-1}}$$
where $p_{i}$ is the price per share on day $i$.
The amount invested, in dollars, at time i is given by:
$$D_{i} = s_{i}\,p_{i-1}$$
Thus we have:
$$g = \sum_{i} s_{i}\,d_{i} = \sum_{i} D_{i}\,\frac{d_{i}}{p_{i-1}} \ge \sum_{i} D_{i}\,y_{i}$$
For the annualized simple gain we sum over the number of trading days in a year, assuming we are dealing with daily returns, which may be taken to be 256 days.
The trading rules variable x is defined as the dollar amount invested at any given time, relative to the average amount of equity invested over the time period. This average equity can be either long or short, so the average equity invested is given by the average absolute value of the dollar amount invested over the time interval:
$$x_{i} = \frac{D_{i}}{\langle |D| \rangle}$$
Here we define the average absolute value of the dollar amount invested over the time interval of $n$ days by:
$$\langle |D| \rangle = \frac{1}{n}\sum_{i=1}^{n} |D_{i}|$$
The average absolute value of the equity invested, as a percentage of the total equity available to invest, is called the average margin leverage. To compare measured correlations to measured returns from trading rules, we normalize the average margin leverage to 100%. In other words, the normalized gain, denoted $G$, will be given by the annualized simple gain divided by the average absolute value of the dollar amount invested.
However, the correlation is expressed in terms of the root mean square of the trading rules, not the average absolute value of the trading rules (which is defined to be unity). We need to convert between one and the other. This is straightforward if we assume the random variables are distributed according to a Gaussian distribution. Denoting a Gaussian random variable by $z$, with standard deviation $\sigma$, it is well known that the Gaussian distribution (assuming $\bar{z}=0$) is normalized as follows:
$$\frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{+\infty} e^{-z^{2}/2\sigma^{2}}\,dz = 1$$
The r.m.s. value of $z$ is then given as the square root of the mean value of $z^{2}$, which is defined to be the variance:
$$z_{\mathrm{rms}} = \sqrt{\langle z^{2} \rangle} = \left[\frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{+\infty} z^{2}\,e^{-z^{2}/2\sigma^{2}}\,dz\right]^{1/2} = \sigma$$
The average absolute value of z, on the other hand, is given as follows:
$$\langle |z| \rangle = \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{+\infty} |z|\,e^{-z^{2}/2\sigma^{2}}\,dz = \sqrt{\frac{2}{\pi}}\,\sigma$$
Thus we have the following general relationship between the average absolute value of a Gaussian variable and its standard deviation (root mean square value):
$$\langle |z| \rangle = \sqrt{\frac{2}{\pi}}\,\sigma \qquad\Longleftrightarrow\qquad \sigma = \sqrt{\frac{\pi}{2}}\,\langle |z| \rangle \approx 1.25\,\langle |z| \rangle$$
Thus when any quantity is normalized to unit average absolute deviation (dividing by the average absolute deviation), it will be about 25% greater than when it is normalized to unit standard deviation (dividing by the standard deviation).
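This factor of $\sqrt{\pi/2} \approx 1.25$ is easy to check by Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(4)
z = rng.standard_normal(1_000_000)  # Gaussian sample with sigma = 1

mean_abs = np.abs(z).mean()  # converges to sqrt(2/pi) ~ 0.798
ratio = z.std() / mean_abs   # converges to sqrt(pi/2) ~ 1.253
```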
Thus the annualized gain, normalized to unit margin (average absolute amount of dollars invested) will be given by:
$$G = \frac{g}{\langle |D| \rangle}$$
This may be rewritten using the definition of the gain given above, expressing the dollar amount invested $D_{i}$ relative to its average absolute value:
$$G = \sum_{i}\frac{D_{i}}{\langle |D| \rangle}\,\frac{d_{i}}{p_{i-1}} = \sum_{i} x_{i}\,\frac{d_{i}}{p_{i-1}}$$
We may now use the inequality given above to rewrite this in terms of the logarithmic returns y_{i}:
$$G \ge \sum_{i} x_{i}\,y_{i}$$
Taking into account that there are 256 trading days in a year, we find:
$$G \ge 256\,\langle x\,y \rangle\,, \qquad \langle x\,y \rangle \equiv \frac{1}{256}\sum_{i} x_{i}\,y_{i}$$
Let us denote the average volatility, by which we mean the r.m.s. value of the logarithmic returns, by $\sigma$:
$$\sigma = \sqrt{\langle y^{2} \rangle} = \left[\frac{1}{256}\sum_{i} y_{i}^{2}\right]^{1/2}$$
We may then define our modified correlation as follows:
$$\rho = \frac{\sum_{i} x_{i}\,y_{i}}{\sqrt{\sum_{i} x_{i}^{2}}\;\sqrt{\sum_{i} y_{i}^{2}}}$$
In other words, the modified correlation is the regular correlation with the mean values of the variables not subtracted off.
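A minimal implementation of this modified (uncentered) correlation, illustrating the constant-investment case discussed below:

```python
import numpy as np

def modified_correlation(x, y):
    """Correlation computed as if the means were zero (means not subtracted)."""
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    return np.sum(x * y) / (np.sqrt(np.sum(x * x)) * np.sqrt(np.sum(y * y)))

# A constant long position against constant positive returns: the modified
# correlation is 100%, while Pearson's R would be 0/0 (indeterminate).
rules = np.ones(256)
returns = np.full(256, 0.001)
rho_mod = modified_correlation(rules, returns)
```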
The annualized gain, normalized to unit margin, is the expected dollar gain divided by the average absolute amount of dollars invested. It is thus given in terms of the quantities defined above by:
$$G \;\ge\; 256\,\sqrt{\frac{\pi}{2}}\;\rho\,\sigma \;\approx\; 321\,\rho\,\sigma$$
Thus the expected annualized simple gain, normalized to unit margin (unit average absolute amount of equity invested) is approximately given by the modified correlation multiplied by the average (r.m.s.) daily volatility of returns, times the number of trading days in a year and a numerical factor.
Thus we see that the meaningful quantity for the estimation of trading returns is this modified correlation, computed as if the mean values of the variables were zero, rather than the standard definition of correlation. In the ideal case of daily returns that are constant, the trading rules would be simply a constant amount invested, and then the modified correlation between the trading rules and the returns would be 100%. On the other hand, according to the usual definition of correlation, the correlation would be indeterminate because the variance of both the trading rules and returns would be zero; both of these would be equal to their mean values, so there would be zero in both the numerator and denominator. If, as often happens, the trading rules are nearly constant, then there would be very small quantities in both the numerator and denominator, and the computed correlation would be dependent on minute variations in the trading rules, which has very little to do with actual investment gains or losses. The modified correlation, on the other hand, would register the gain or loss to be incurred from the nearly constant investment, so it is the appropriate measure of correlation to be employed here.
The usual routines for measuring correlation [NR (1992)] use the data with the means subtracted off, so these routines must be modified to eliminate this subtraction of the means, resulting in the formula for the modified correlation given above. The theoretical return is then computed as above, multiplying this modified correlation by the r.m.s. (logarithmic) volatility, times the number of trading days in a year and a numerical factor, which results in a number which is approximately the actual gain, for small values of the daily returns, and is always less than or equal to the actual gain (so it is a conservative estimate).
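Putting the pieces together, the conservative gain estimate reduces to the number of trading days times the average of the product $x\,y$ (equivalently, the modified correlation times the r.m.s. volatility and the normalization factor). A sketch, assuming the trading rules $x$ are already normalized to unit average absolute value:

```python
import numpy as np

def estimated_annual_gain(x, y, trading_days=256):
    """Conservative annualized gain estimate: trading_days * <x y>,
    equivalently the modified correlation times the r.m.s. volatility
    times trading_days * x_rms.
    x: trading rules, normalized to unit average absolute value.
    y: daily logarithmic returns."""
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    return trading_days * np.mean(x * y)

# Constant unit-margin long position, steady 0.1% daily log return:
gain = estimated_annual_gain(np.ones(256), np.full(256, 0.001))
```

Because the logarithmic returns underestimate the simple returns, this estimate is always less than or equal to the actual gain, as stated above.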
Peter J. Brockwell & Richard A. Davis, Time Series: Theory and Methods, 2nd ed., Springer-Verlag, New York (1991)
John Y. Campbell, Andrew W. Lo, & A. Craig MacKinlay (CLM), The Econometrics of Financial Markets, Princeton University Press, Princeton, NJ (1997)
Robert D. Edwards & John Magee, Technical Analysis of Stock Trends, 6th ed., John Magee Inc., Boston, MA (1992)
Andrew C. Harvey, Forecasting, Structural Time Series Models and the Kalman Filter, Cambridge University Press, Cambridge, UK (1989)
Simon Haykin, Adaptive Filter Theory, 4th ed., Prentice Hall, Upper Saddle River, NJ (2002)
Sheldon Natenberg, Option Volatility & Pricing, McGraw-Hill, Inc., New York, NY (1994)
Donald B. Percival & Andrew T. Walden, Wavelet Methods for Time Series Analysis, Cambridge University Press, Cambridge, UK (2000)
Edgar E. Peters, Chaos and Order in the Capital Markets, John Wiley & Sons, Inc., New York, NY (1991)
Edgar E. Peters, Fractal Market Analysis, John Wiley & Sons, Inc., New York, NY (1994)
William H. Press, Saul A. Teukolsky, William T. Vetterling, & Brian P. Flannery (NR), Numerical Recipes in C: The Art of Scientific Computing, 2nd ed., Cambridge University Press, Cambridge, UK (1992)
Martin J. Pring, Technical Analysis Explained, 3rd ed., McGraw-Hill, Inc., New York, NY (1991)
Tonis Vaga, Profiting From Chaos, McGraw-Hill, Inc., New York, NY (1994)