PSTAT174 – Time Series Analysis of $SPY (Solution)

By Alex Wako (awako@ucsb.edu)
Abstract
The $SPY ETF, or the Standard and Poor’s 500 ETF, is described by State Street Global Advisors as an ETF composed of “selected stocks from five hundred (500) issuers, all of which are listed on national stock exchanges and spans over approximately 24 separate industry groups.” The price of $SPY tracks the price and yield of the S&P 500 Index. It is one of the most popular securities in the market and trades daily at consistently high volume.
The goal of the project is to statistically analyze the price of $SPY using time series models and, if possible, forecast future prices. The methods used are a Seasonal ARIMA (SARIMA) model and a GARCH model. The results from both models showed that high volatility and the influence of unquantifiable variables in the stock market made it difficult to build a model with a good fit.
Introduction
Stock traders constantly use models to understand which direction a stock will move. Time series modeling is a tool used by many large institutions to gain an advantage over other groups. Having an advantage means having a stronger understanding of the market and partial control of the stock; in the long run, this should result in more profit. The goal of the project is to understand how complicated a model must be to benefit the trader, assess the effectiveness of simpler models such as SARIMA, and possibly predict future prices of $SPY.
This project examines the historical weekly closing price of $SPY and applies time series models using the R programming language. The R libraries used are quantmod, for pulling $SPY data from Yahoo Finance, and astsa, xts, and fGarch, for time series analysis. These tools are open for anybody to use, so if they were effective at predicting future stock prices, almost anybody could become an elite trader. In reality, the stock market and these programming tools have existed for a long time, so anything better than a suboptimal result would be unexpected. Nevertheless, the main purpose of this project is to understand why a simple model in R does not work with volatile stock data and to identify what factors a model has to consider when analyzing the stock market.
Data
Methodology
SARIMA – The SARIMA model, or Seasonal Autoregressive Integrated Moving Average model, is a time series model that extends the ARIMA model to include seasonality. It keeps all components of the ARIMA model and adds a seasonal counterpart for each. The orders of a SARIMA model are p, d, q, P, D, and Q, where the uppercase letters denote the seasonal components, together with the seasonal period (52 for the weekly data in this project). The autoregressive orders p and P describe how the current value depends on past values; they are chosen from the number of lags that correlate with the current value. The integrated orders d and D are the numbers of regular and seasonal differences taken to remove trend and/or seasonality from the data. The moving average orders q and Q describe how the current value depends on past error terms, which enter the model as a linear combination. In this project, the Akaike Information Criterion (AIC) is used to determine the best model for the data.
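For reference, the orders described above can be written compactly with the backshift operator $B$, where $B x_t = x_{t-1}$. The notation below is the usual textbook form and is an assumption about presentation, not a formula taken from the project: $w_t$ is white noise, $s$ is the seasonal period, $L$ is the maximized likelihood, $k$ is the number of estimated parameters, and a constant term is omitted for brevity.

```latex
\begin{aligned}
\Phi_P(B^s)\,\phi_p(B)\,(1 - B^s)^D (1 - B)^d\, x_t &= \Theta_Q(B^s)\,\theta_q(B)\, w_t \\
\phi_p(B) &= 1 - \phi_1 B - \dots - \phi_p B^p \\
\theta_q(B) &= 1 + \theta_1 B + \dots + \theta_q B^q \\
\Phi_P(B^s) &= 1 - \Phi_1 B^{s} - \dots - \Phi_P B^{Ps} \\
\Theta_Q(B^s) &= 1 + \Theta_1 B^{s} + \dots + \Theta_Q B^{Qs} \\
\mathrm{AIC} &= -2 \log L + 2k
\end{aligned}
```

When P = D = Q = 0, the seasonal factors all equal 1 and the model reduces to an ordinary ARIMA(p, d, q) model. A lower AIC indicates a better trade-off between fit and the number of parameters.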
GARCH – The GARCH model, or Generalized Autoregressive Conditional Heteroskedasticity model, is applied to data on the assumption that the variance of the series is not constant over time. The model combines an ARMA component for the mean with an equation for the conditional variance. GARCH models extend ARCH models by adding lagged conditional variances to the variance equation. The GARCH model is commonly used on financial data sets because it attempts to capture the volatility present in the stock market and carry its effect into future values.
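As a sketch of what such a model looks like, a GARCH(1, 1) model with an AR(1) mean equation can be written as below. The symbols ($\mu$, $\phi$, $\alpha_0$, $\alpha_1$, $\beta_1$, $\epsilon_t$) follow a common textbook convention and are assumptions, not notation from the project.

```latex
\begin{aligned}
x_t &= \mu + \phi\,(x_{t-1} - \mu) + r_t \\
r_t &= \sigma_t\,\epsilon_t, \qquad \epsilon_t \sim \text{iid}(0, 1) \\
\sigma_t^2 &= \alpha_0 + \alpha_1 r_{t-1}^2 + \beta_1 \sigma_{t-1}^2
\end{aligned}
```

Setting $\beta_1 = 0$ removes the lagged conditional variance and leaves an ARCH(1) model, which is what a GARCH(1, 0) specification amounts to.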
Result
SARIMA Model
The original plot of the data showed an uneven upward trend: the price of $SPY increased over time, but not at a steady rate.

The original plot shows no visible seasonality in the data. Transforming the data by log differencing removes the trend and can reveal possible seasonality or other patterns.

After log differencing, the ACF and PACF of the transformed data suggest that the series is stationary, with possible AR(1) and MA(2) structure. No seasonality shows in the plots, so seasonality is ignored when choosing the best model.
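To make the transformation concrete, here is a minimal sketch of log differencing on a short vector of made-up prices (the numbers are hypothetical, not $SPY data). It shows that each log difference is close to the simple percentage return for the same week, and that the log differences add up to the log of the total price change.

```r
# Hypothetical weekly closing prices, used only to illustrate the transform
price <- c(100, 102, 101, 105, 110)

# Log differencing: difference of consecutive log prices
log_return <- diff(log(price))

# Simple percentage returns over the same weeks, for comparison
simple_return <- diff(price) / head(price, -1)

print(round(cbind(log_return, simple_return), 4))

# Log differences sum to the log of the overall price ratio
total_log_change <- sum(log_return)
print(all.equal(total_log_change, log(110 / 100)))
```

On a plain numeric vector, diff() returns one fewer element than its input. On an xts object, diff() pads the first position with NA by default, which is presumably why the project code drops the first element after differencing.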
Multiple SARIMA models are then fit to the data, starting with SARIMA(1, 0, 0) x (0, 0, 0)52 and SARIMA(0, 0, 2) x (0, 0, 0)52. Using the AIC, the best model is found to be SARIMA(2, 0, 2) x (0, 0, 0)52. The diagnostics show that the model is stationary and that all of its coefficients are statistically significant. However, the Q-Q plot does not support normality, as it shows that the residuals are left skewed, and the Ljung-Box test fails to reject the null hypothesis at some lags but not at all of them. This shows that the SARIMA model does not work well with the $SPY data and that another modeling approach had to be applied to get better results.
The forecast supports this conclusion: the SARIMA model's predictions are less than ideal for gaining an advantage over other traders.
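The selection and diagnostic steps in this section can be sketched with base R alone. The example below is an illustration under stated assumptions, not the project's code: it uses a simulated AR(1) series as a stand-in for the log-differenced data, stats::arima() instead of astsa's sarima(), and a small hypothetical grid of candidate orders.

```r
set.seed(174)

# Simulated stand-in for the log-differenced series (not real SPY data)
x <- arima.sim(model = list(ar = 0.5), n = 500)

# Hypothetical grid of candidate (p, d, q) orders
orders <- list(c(1, 0, 0), c(0, 0, 2), c(2, 0, 0), c(1, 0, 1))

# Fit each candidate by maximum likelihood and collect its AIC
fits <- lapply(orders, function(o) arima(x, order = o, method = "ML"))
aic <- sapply(fits, AIC)
names(aic) <- sapply(orders, paste, collapse = ",")
print(aic)

# Keep the candidate with the lowest AIC
best <- fits[[which.min(aic)]]

# Ljung-Box test on the residuals; fitdf discounts the estimated
# AR and MA coefficients (p + q) from the degrees of freedom
n_arma <- sum(best$arma[1:2])
lb <- Box.test(residuals(best), lag = 20, type = "Ljung-Box", fitdf = n_arma)
print(lb$p.value)
```

A small Ljung-Box p-value would indicate that autocorrelation remains in the residuals, which is the kind of diagnostic failure described above.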

GARCH Model
The SARIMA diagnostics suggest that the data is heteroskedastic, which motivates the use of a GARCH model.
Using the results from the SARIMA model, two ARMA components are considered when constructing the GARCH model: ARMA(1, 0), that is AR(1), and ARMA(0, 2), that is MA(2).

The diagnostics of the two ARMA models are also consistent with heteroskedasticity: the residual plot, Q-Q plot, and Ljung-Box test all show departures from normality.
Multiple GARCH models are then fit to the data, using either GARCH(1, 1) or GARCH(1, 0) with the AR or MA components found earlier. Using the BIC as the main criterion, as proposed by Javed (2011), the AR(1)-GARCH(1, 1) model gave the best result. The model summary showed that all coefficients are statistically significant and that the model passes all of the tests used.
The prediction for 12 weeks ahead plots as a horizontal line. As with the SARIMA model, the results are less than ideal for predicting future prices of $SPY, which shows that one time series model is not enough to capture something as complicated as the price of $SPY.
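A flat forecast is what the mean equation of such a model would be expected to produce. As a sketch, assuming a stationary AR(1) mean equation with mean $\mu$ and coefficient $\phi$ (symbols chosen here for illustration, not taken from the project), the $h$-step-ahead forecast is:

```latex
\hat{x}_{t+h} = \mu + \phi^{h}\,(x_t - \mu),
\qquad |\phi| < 1 \;\Longrightarrow\; \hat{x}_{t+h} \to \mu \quad \text{as } h \to \infty
```

When $\phi$ is small, as is often the case for weekly log returns, the forecast is already very close to $\mu$ after one or two steps, so a 12-week forecast plots as a nearly horizontal line at the estimated mean.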

Conclusion
The results of the time series analysis do not provide significant information for predicting the future price of $SPY. In fact, the results show that volatility and outside factors are difficult to capture with just one model.
Additionally, weekly closing prices make it difficult to understand the in-depth structure of the stock market and $SPY. $SPY tracks an index consisting of multiple stocks selected by a group of people. The stocks making up $SPY change quarterly, and the percentage holding of each stock can change daily. If one of the larger holdings were to crash or surge while another holding moved in the opposite direction and offset the effect on the price of $SPY, the weekly $SPY data would not reflect the change, and a model built on it could not explain why the price ended the week where it did.
The results of the project showed that a single model could not predict the price of a security as complicated as $SPY. A more sophisticated model that accounts for the effect of the 500 holdings on $SPY would likely predict future prices more accurately. Volatility is also difficult to capture with one time series model and is likely the biggest factor in the outcome of the project.
References
Shumway, R. H., & Stoffer, D. S. (2017). Time series analysis and its applications: With R examples. Springer.
library(quantmod)
library(astsa)
library(xts)

# Download SPY from Yahoo Finance and convert the adjusted prices to weekly bars
getSymbols("SPY", from = "1993-01-31")
SPY <- to.weekly(SPY$SPY.Adjusted)
tsSPY1 <- SPY$`SPY$SPY.Adjusted.Close`

# Plotting data (the plot title is cut off in the source and is left as found)
plot(tsSPY1, ylab = "Closing price of SPY",
     main = "SPY Weekly Closing Price from February 1993 to Prese")

# Transformation
dlSPY1 = diff(log(tsSPY1))[-1]

# Spotting trends
lag1.plot(dlSPY1, 20)
acf2(dlSPY1, 60)
try(decompose(dlSPY1)) # Returns an error that the time series has no or less than 2 periods

# A seasonal trend does not seem to exist and the ACF and PACF suggest stationarity
# The ACF and PACF suggest the model likely has AR(1) and/or MA(2) for the first set of data
sarima(dlSPY1, 1, 0, 0, 0, 0, 0, 52)
sarima(dlSPY1, 0, 0, 2, 0, 0, 0, 52)
sarima(dlSPY1, 2, 0, 0, 0, 0, 0, 52)
sarima(dlSPY1, 2, 0, 2, 0, 0, 0, 52)
sarima(dlSPY1, 0, 0, 1, 0, 0, 0, 52)
sarima(dlSPY1, 1, 1, 0, 0, 0, 0, 52)
# The Seasonal ARIMA (2, 0, 2) x (0, 0, 0) shows the best results

# Forecasting 12 weeks ahead
forecast <- sarima.for(dlSPY1, 12, 2, 0, 2, 0, 0, 0, 52)

library(fGarch)
# Heteroskedasticity is seen in the time series data so a GARCH model is deployed
GARCH1 <- sarima(dlSPY1, 1, 0, 0)
GARCH2 <- sarima(dlSPY1, 0, 0, 2)

summary(m1AR <- garchFit(~arma(1, 0)+garch(1, 1), dlSPY1))
summary(m2AR <- garchFit(~arma(1, 0)+garch(1, 0), dlSPY1))
summary(m1MA <- garchFit(~arma(0, 2)+garch(1, 1), dlSPY1))
summary(m2MA <- garchFit(~arma(0, 2)+garch(1, 0), dlSPY1))
summary(m1ARMA <- garchFit(~arma(1, 2)+garch(1, 1), dlSPY1))
summary(m2ARMA <- garchFit(~arma(1, 2)+garch(1, 0), dlSPY1))
summary(m1 <- garchFit(~garch(1, 1), dlSPY1))
summary(m2 <- garchFit(~garch(1, 0), dlSPY1))

# Predicting 12 weeks ahead with the AR(1)-GARCH(1, 1) model
predict(m1AR, 12, plot = TRUE)
