# III Statistical models

## DVAR model

The variability of stock prices is fascinating to both academic researchers and investment practitioners. How to describe this variability is of great interest, and many time series modeling techniques have been proposed.

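The first and most important modeling technique for univariate time series is the ARMA model. An ARMA model of order (p, q), denoted ARMA(p, q), is given by

$$r_t = c + \sum_{i=1}^{p} \alpha_i r_{t-i} + \sum_{j=1}^{q} \beta_j \varepsilon_{t-j} + \varepsilon_t,$$

where $c$ is a constant term, $r_t$ is the stock return, $\varepsilon_t$ is white noise, and $\alpha_i$ and $\beta_j$ are parameters.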

The ARMA model is entirely data driven and can capture the linear dependence among the $r_t$. However, it cannot explain volatility clustering, a well-documented fact in empirical finance. To capture the linear dependence and the volatility clustering in returns simultaneously, the following ARMA-GARCH(p, q) model can be used:

where $z_t$ is a standard normal noise term and $h_t$ is the variance of the stock return. To ensure positivity and stationarity of the return variance, it is required that

The intertemporal capital asset pricing model (ICAPM) of Merton (1973) suggests that the conditional expected excess return on the stock market should vary positively with the market conditional variance. To capture the risk-return tradeoff, the ARMA-GARCH model can be further generalized to the following GARCH-in-Mean model

where $\gamma$ is the coefficient of relative risk aversion, reflecting the risk-return tradeoff.
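The risk-return tradeoff described above can be illustrated with a minimal simulation sketch. It assumes a GARCH(1,1) variance and a mean equation of the form $r_t = c + \gamma h_t + \sqrt{h_t} z_t$ with no ARMA terms, which is a simplification of the model in the text. The function name, the parameter names (`omega`, `alpha`, `beta`) and all default values are hypothetical choices for illustration. The constraint check uses the standard GARCH(1,1) conditions for a positive, covariance-stationary variance.

```python
import math
import random


def simulate_garch_in_mean(n, c=0.0, gamma=2.0, omega=1e-5,
                           alpha=0.08, beta=0.90, seed=0):
    """Simulate r_t = c + gamma * h_t + sqrt(h_t) * z_t with a GARCH(1,1) variance.

    All default values are illustrative assumptions, not estimates.
    """
    # Standard GARCH(1,1) conditions for a positive, stationary variance.
    if not (omega > 0 and alpha >= 0 and beta >= 0 and alpha + beta < 1):
        raise ValueError("need omega > 0, alpha >= 0, beta >= 0, alpha + beta < 1")
    rng = random.Random(seed)
    h = omega / (1.0 - alpha - beta)  # start at the unconditional variance
    returns, variances = [], []
    for _ in range(n):
        eps = math.sqrt(h) * rng.gauss(0.0, 1.0)   # shock with variance h_t
        returns.append(c + gamma * h + eps)        # mean rises with h_t
        variances.append(h)
        h = omega + alpha * eps * eps + beta * h   # next-period variance
    return returns, variances


returns, variances = simulate_garch_in_mean(1000)
print(len(returns), min(variances) > 0)
```

With `gamma` set to zero the sketch reduces to a plain GARCH(1,1) return process, so the in-mean term can be switched on and off to see its effect on the average return.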

The modeling techniques mentioned above are all based on the closing price and pay no attention to other available price information, such as the high and low price extremes (see Note 1). Recent academic literature shows that high and low prices have a significant effect on stock returns because of limited attention (Hirshleifer and Teoh, 2003; Hirshleifer et al., 2011). George and Hwang (2004) found that nearness to the 52-week high dominates and improves upon the forecasting power of past returns (both individual and industry returns) for future returns. Huddart et al. (2009) found that past price extremes (around a stock’s 52-week highs and lows) influence investors’ trading decisions. Li and Yu (2012) found that nearness to the 52-week high positively predicts future aggregate market returns, while nearness to the historical high negatively predicts future market returns. Xie and Wang (2018) found that high and low prices are informative for return forecasting.

This chapter shows how price extremes can be used for stock return modeling through the range decomposition technique presented in Chapter 4. It is organized as follows. Section 1 proposes a decomposition-based vector autoregressive (DVAR) model for return modeling. Section 2 presents the statistical foundations of the DVAR model. Sections 3 and 4 scrutinize the results through simulations and empirical studies. Section 5 summarizes.

### 5.1 The model

In Chapter 4, we demonstrated that the closing price can be approximated by

$$C_t \approx \frac{TR_t}{PR_t},$$

where $C_t$, $TR_t$, and $PR_t$ are, respectively, the closing price, the technical range and the Parkinson range.

Denoting the error resulting from this approximation as $W_t$, we can rewrite the log closing price as

$$\ln C_t = \ln(TR_t) - \ln(PR_t) + W_t,$$

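where

$$TR_t = H_t - L_t$$

and

$$PR_t = \ln H_t - \ln L_t,$$

with $H_t$ and $L_t$ denoting the high and low prices.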

Denoting $\ln(TR_t) - \ln(PR_t)$ as $R_t$, the log closing price can be rewritten as

$$\ln C_t = R_t + W_t. \tag{5.6}$$

Taking first differences on both sides of Eq. (5.6), we obtain

$$\Delta \ln C_t = \Delta R_t + \Delta W_t, \tag{5.7}$$

where $\Delta$ is the difference operator. Eq. (5.7) indicates that the stock return can be decomposed into two components. Therefore, modeling the stock return is equivalent to modeling $\Delta R_t$ and $\Delta W_t$.
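The decomposition of the return into two components can be checked numerically. The sketch below assumes that the technical range is the high minus the low and that the Parkinson range is the log high minus the log low; these definitions, the function names and the price figures are assumptions made for illustration. Because $W_t$ is defined as the residual $\ln C_t - R_t$, the identity in Eq. (5.7) holds by construction whatever the prices are, provided the high strictly exceeds the low.

```python
import math


def decompose(high, low, close):
    """Return (R, W) with R_t = ln(TR_t) - ln(PR_t) and W_t = ln(C_t) - R_t."""
    R, W = [], []
    for h, l, c in zip(high, low, close):
        tr = h - l                       # technical range (assumed definition)
        pr = math.log(h) - math.log(l)   # Parkinson range (assumed definition)
        r = math.log(tr) - math.log(pr)
        R.append(r)
        W.append(math.log(c) - r)        # approximation error
    return R, W


def diff(x):
    """First difference of a sequence."""
    return [b - a for a, b in zip(x, x[1:])]


# Made-up prices for four periods.
high = [102.0, 104.5, 103.2, 106.0]
low = [99.0, 101.0, 100.1, 102.4]
close = [101.0, 103.8, 100.9, 105.5]

R, W = decompose(high, low, close)
dR, dW = diff(R), diff(W)
log_returns = diff([math.log(c) for c in close])

# The log return equals the sum of the two components in every period.
for r, a, b in zip(log_returns, dR, dW):
    assert abs(r - (a + b)) < 1e-12
```

Under these assumed definitions the ratio of the two ranges is the logarithmic mean of the high and the low, so it always lies between them, which is why the error term stays small relative to the price level.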

Instead of modeling $\Delta R_t$ and $\Delta W_t$ one by one, we treat $\Delta R_t$ and $\Delta W_t$ as a system and propose a vector autoregressive (VAR) model to model them simultaneously.

The VAR model was popularized by Sims (1980) as an atheoretical forecasting technique. The technique is underpinned by statistical methodology and is not tied to contemporary macroeconomic theory. A VAR model of order $p$, VAR($p$), is given by

where $y_t$ is a $k \times 1$ vector and $\varepsilon_t$ is a $k \times 1$ vector of error terms. In this book, $y_t = (\Delta R_t, \Delta W_t)'$. Because this VAR model is based on price decomposition, we call it the decomposition-based vector autoregressive model, or DVAR for short.
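For concreteness, a common way to write the VAR($p$) model, and the first-order bivariate case it implies for the DVAR, is sketched below. The intercept vector $c$, the coefficient matrices $A_i$ and the scalar names $c_1$, $c_2$, $a_{11}, \dots, a_{22}$ are notation assumed here for illustration and may differ from the notation of the original equation.

```latex
% General VAR(p), with assumed intercept c and coefficient matrices A_i
\[
  y_t = c + \sum_{i=1}^{p} A_i \, y_{t-i} + \varepsilon_t
\]

% DVAR(1): the bivariate case with y_t = (\Delta R_t, \Delta W_t)'
\[
  \begin{pmatrix} \Delta R_t \\ \Delta W_t \end{pmatrix}
  =
  \begin{pmatrix} c_1 \\ c_2 \end{pmatrix}
  +
  \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}
  \begin{pmatrix} \Delta R_{t-1} \\ \Delta W_{t-1} \end{pmatrix}
  +
  \begin{pmatrix} \varepsilon_{1,t} \\ \varepsilon_{2,t} \end{pmatrix}
\]
```

Under this sketch, Eq. (5.7) implies that a one-step return forecast is the sum of the two component forecasts, and nonzero off-diagonal coefficients $a_{12}$ and $a_{21}$ correspond to the cross effects between $\Delta R_t$ and $\Delta W_t$.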

### 5.2 Statistical foundation

In the absence of prior information, the VAR assumes that every series interacts linearly with both its own past values and those of every other included series. In this section, we show, both theoretically and empirically, that $\Delta R_t$ and $\Delta W_t$ interact linearly with each other. The tool we use to detect this linear interaction is the Granger causality test.

Before presenting the proof of bidirectional Granger causality between $\Delta R_t$ and $\Delta W_t$, some symbols need to be clarified. The symbols used in this section are

$\Omega_t^{hl}$: information set released from $H_t$ to $L_t$;

$\Omega_{t+1}^{cl}$: information set released from $C_t$ to $L_{t+1}$;

$\Omega_{t+1}^{hc}$: information set released from $H_{t+1}$ to $C_{t+1}$.

According to neoclassical finance, asset prices change because new information is reflected in them. Therefore, price changes are functions of information sets. To be specific,

where $\ln(TR_t)$ and $\ln(PR_t)$ are specified by different functions because the two indicators are assumed to process information in different ways, even though $TR_t$ and $PR_t$ share the same information set. To make the above notation easier to understand, we present a random walk simulation. The simulation is produced as follows:

Step 1: An i.i.d. sample of size 2000 is generated from the normal distribution N(0, 0.1);

Step 2: The first 1000 observations are used as the trading process on day t, and the remaining observations are used as the trading process on day t+1;

Step 3: Each day, the largest observation is used as the high price and the smallest observation is used as the low price.

Figure 5.1 presents the simulation results.
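The three steps can be sketched in a few lines. The sketch rests on two assumptions that the text leaves open: the draws are cumulated into a random walk path before the extremes are taken, and the 0.1 in N(0, 0.1) is read as a standard deviation. The function name, the seed and the returned field names are hypothetical.

```python
import random


def two_day_extremes(n_per_day=1000, sigma=0.1, seed=0):
    """Simulate two trading days and report the high, low and close of each."""
    rng = random.Random(seed)
    level, path = 0.0, []
    for _ in range(2 * n_per_day):           # Step 1: 2000 normal draws
        level += rng.gauss(0.0, sigma)       # cumulated into a random walk (assumption)
        path.append(level)
    days = {}
    for name, start in (("t", 0), ("t+1", n_per_day)):   # Step 2: split into two days
        day = path[start:start + n_per_day]
        days[name] = {                       # Step 3: extremes within each day
            "high": max(day),
            "low": min(day),
            "close": day[-1],
            "high_index": day.index(max(day)),
            "low_index": day.index(min(day)),
        }
    return days


for name, info in two_day_extremes().items():
    print(name, info)
```

The two index fields show whether the high or the low arrives first within a day, which is the ordering that determines how the information sets between price points line up.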

Neoclassical financial economics argues that price movements are entirely due to new information being reflected. According to Eq. (5.9), $\Delta R_t$ and $\Delta W_t$ can be represented by the following functions.

Figure 5.1 Information sets and price changes

Figure 5.2 Granger causality: an illustration

and

Before proceeding to the rigorous mathematical statement, we present an intuitive explanation of the bidirectional Granger causality between $\Delta R_{t+1}$ and $\Delta W_{t+1}$. That $\Delta W_t$ Granger causes $\Delta R_{t+1}$ follows from Equations (5.12) and (5.15), since $\Delta W_t$ and $\Delta R_{t+1}$ contain the same information set $\Omega_t^{hl}$ (see Note 2). The same reasoning explains why $\Delta R_t$ Granger causes $\Delta W_{t+1}$.

The intuitive explanation above hints at how to prove the bidirectional Granger causality between $\Delta R_t$ and $\Delta W_t$ in a mathematical sense. The proof consists of three stages:

Stage 1: $\Delta R_t$ and $\Delta W_t$ contribute to forecasting $\Delta R_{t+1}$;

Stage 2: $\Delta R_t$ and $\Delta W_t$ are not independent;

Stage 3: $\Delta R_t$ and $\Delta W_t$ are not perfectly linearly correlated.

The proof of Stages 1 and 2 is immediate given that $\Delta R_{t+1}$, $\Delta R_t$ and $\Delta W_t$ share the information set $\Omega_t^{hl}$. Stage 3 is proved indirectly. Suppose that $\Delta R_t$ is perfectly linearly correlated with $\Delta W_t$; then the following equality holds:

Substituting the information-set representations $p(\cdot)$ and $h(\cdot) - p(\cdot)$ for $\Delta R_t$ and $\Delta W_t$, respectively, yields

Equation (5.17) indicates perfect linear correlation between $p(\cdot)$ and $h(\cdot)$, which seems impossible given that these two random variables are determined by different information sets.

In a similar way, it can be proved that $\Delta R_t$ Granger causes $\Delta W_{t+1}$.

The relationships among $\Delta R_{t+1}$, $\Delta W_{t+1}$, $\Delta R_t$ and $\Delta W_t$ can be seen clearly in Figure 5.2. The residual term in the figure denotes the part that cannot be explained by $\Delta R_t$ and $\Delta W_t$.

### 5.3 Simulations

Section 2 demonstrated that the bidirectional Granger causality is due to overlapping information, regardless of the data-generating process of the stock price. To confirm this, this section resorts to simulations. The simulation proceeds as follows:

Step 1: An i.i.d. sample of size 100000 is generated from the normal distribution N(0, 0.01);

Step 2: From these 100000 observations, a geometric Brownian motion without a drift term is obtained;

Step 3: The geometric Brownian motion points are divided into 1000 groups of 100 points each. Within each group, the last observation is used as the closing price, the maximum as the high price and the minimum as the low price;

Step 4: From the prices in Step 3, $\Delta R_t$ and $\Delta W_t$ are calculated.
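The four steps above can be sketched as follows. Several details are assumptions rather than statements from the text: the 0.01 in N(0, 0.01) is read as a standard deviation, the path is built by exponentiating the cumulated normal draws, the starting price `s0` is arbitrary, and the technical and Parkinson ranges are taken to be the high minus the low and the log high minus the log low. The function name and the seed are hypothetical.

```python
import math
import random


def simulate_dvar_inputs(n_groups=1000, group_size=100, sigma=0.01,
                         s0=100.0, seed=0):
    """Return (dR, dW) computed from a simulated driftless price path."""
    rng = random.Random(seed)
    log_price = math.log(s0)
    R, W = [], []
    for _ in range(n_groups):
        group = []
        for _ in range(group_size):
            log_price += rng.gauss(0.0, sigma)    # Steps 1-2: cumulate normal draws
            group.append(math.exp(log_price))     # price level of the simulated path
        # Step 3: high, low and close within the group
        high, low, close = max(group), min(group), group[-1]
        # Step 4: R_t = ln(TR_t) - ln(PR_t), W_t = ln(C_t) - R_t (assumed definitions)
        r = math.log(high - low) - math.log(math.log(high) - math.log(low))
        R.append(r)
        W.append(math.log(close) - r)
    dR = [b - a for a, b in zip(R, R[1:])]
    dW = [b - a for a, b in zip(W, W[1:])]
    return dR, dW


dR, dW = simulate_dvar_inputs()
print(len(dR), len(dW))
```

The two returned series are the inputs that a Granger causality test would take; changing `seed` gives an independent replication of the experiment.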

Table 5.1 reports the Granger causality test results between $\Delta R_t$ and $\Delta W_t$. The null hypothesis is that there is no Granger causality from $\Delta R_t$ ($\Delta W_t$) to $\Delta W_t$ ($\Delta R_t$). Since the Granger causality test is very sensitive to the lag length, different lags are selected to check the robustness of the results. Feige and Pearce (1979), Christiano and Ljungqvist (1988) and Stock and Watson (1989) study the sensitivity of Granger causality to lag selection. The F-statistics indicate significant evidence of bidirectional Granger causality between $\Delta R_t$ and $\Delta W_t$.

Table 5.1 Granger causality tests between $\Delta R_t$ and $\Delta W_t$: simulation results

| Null hypothesis | F-statistic (2 lags) | F-statistic (4 lags) | F-statistic (6 lags) |
| --- | --- | --- | --- |
| $\Delta W_t$ doesn’t cause $\Delta R_t$ | 332.841*** | 187.013*** | 134.598*** |
| $\Delta R_t$ doesn’t cause $\Delta W_t$ | 125.390*** | 44.387*** | 23.134*** |

Note: *** denotes significance at the 1% level.

Table 5.2 Granger causality tests between $\Delta R_t$ and $\Delta W_t$: empirical results

| Index | Null hypothesis | F-statistic (2 lags) | F-statistic (4 lags) | F-statistic (6 lags) |
| --- | --- | --- | --- | --- |
| S&P500 | $\Delta W_t$ doesn’t cause $\Delta R_t$ | 188.607*** | 99.070*** | 70.318*** |
| S&P500 | $\Delta R_t$ doesn’t cause $\Delta W_t$ | 88.234*** | 26.650*** | 13.805*** |
| F100 | $\Delta W_t$ doesn’t cause $\Delta R_t$ | 69.399*** | 36.840*** | 25.854*** |
| F100 | $\Delta R_t$ doesn’t cause $\Delta W_t$ | 35.266*** | 12.139*** | 6.757*** |
| NK225 | $\Delta W_t$ doesn’t cause $\Delta R_t$ | 85.549*** | 43.435*** | 31.767*** |
| NK225 | $\Delta R_t$ doesn’t cause $\Delta W_t$ | 32.994*** | 9.093*** | 4.044*** |

Note: *** denotes significance at the 1% level.

### 5.4 Empirical results

The real data generating process is more complicated than the simulations. Hence, empirical studies are needed to further scrutinize the results obtained in Section 3.

To perform the empirical studies, we collected monthly index data from different stock markets: the Standard and Poor’s 500 (S&P500) in the U.S. for the sample period from January 1950 to December 2008 with 708 observations, the FTSE100 (F100) in Great Britain for the sample period from April 1984 to December 2008 with 297 observations, and the Japanese NIKKEI225 stock index (NK225) for the sample period from January 1984 to December 2008 with 300 observations. For each month, four prices are reported: opening, high, low and closing. The data set was downloaded from the finance subdirectory of the website www.finance.yahoo.com.

Table 5.2 reports the Granger causality test results. Consistent with both the simulation results and the theoretical ones, the F-statistics show that the null hypothesis of no Granger causality between $\Delta R_t$ and $\Delta W_t$ is rejected. The results are also quite robust to the choice of lags.

### 5.5 Summary

Traditionally, stock return modeling is closing price-based. This modeling technique, though simple in application, fails to incorporate the other price information, such as the high and low price extremes.

Based on the range decomposition technique, this chapter proposes the DVAR model for return modeling. The DVAR model makes full use of the high, low and closing prices, and thus uses information more efficiently than the classic return modeling technique. The DVAR model provides a new framework for return modeling and forecasting.

The statistical foundations of the DVAR model are also presented in this chapter. We use theoretical explanations, simulations, and empirical evidence to confirm the statistical foundations.

### Notes

• 1 High-low price extremes have been widely used in financial econometrics. Parkinson (1980) proposed the high-low price range as a volatility estimator. Rather than using only two data points, Garman and Klass (1980) extended the range estimator by incorporating the high, low, opening and closing prices into a volatility estimator. Rogers and Satchell (1991) and Rogers et al. (1994) proposed an alternative estimator that is drift-independent. Other references on the range include Beckers (1983), Wiggins (1991), Kunitomo (1992), and more recently Yang and Zhang (2000). Corwin and Schultz (2012) used the high-low range to estimate bid-ask spreads. Another notable effort to use high-low price information to model financial markets is Han et al. (2008), who suggested “interval” time series, instead of “point” time series, to describe financial markets.
• 2 Information sets $\Omega_t^{hl}$ and $\Omega_t^{lh}$ contain the same information, since both denote the information released over the time span between $H_t$ and $L_t$.
