Alex Wako
1. The dataset trees contains measurements of Girth (tree diameter) in inches, Height in feet, and Volume of timber (in cubic feet) of a sample of 31 felled black cherry trees. The following commands can be used to read the data into R.
# the data set "trees" is contained in the R package "datasets"
require(datasets)
head(trees)
## Girth Height Volume
## 1 8.3 70 10.3
## 2 8.6 65 10.3
## 3 8.8 63 10.2
## 4 10.5 72 16.4
## 5 10.7 81 18.8
## 6 10.8 83 19.7
(a) (1pt) Briefly describe the data set trees, i.e., how many observations (rows) and how many variables (columns) are there in the data set? What are the variable names?
## [1] 31 3
The trees data set has 31 rows and 3 variables. The names of the variables are Girth, Height, and Volume.
(b) (2pts) Use the pairs function to construct a scatter plot matrix of the logarithms of Girth, Height and Volume.
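No figure is reproduced here; a minimal sketch of the call for the logged variables (note that the Appendix code plots the unlogged data):
# scatter plot matrix of log(Girth), log(Height), log(Volume)
pairs(log(trees))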
(c) (2pts) Use the cor function to determine the correlation matrix for the three (logged) variables.
## Girth Height Volume
## Girth 1.0000 0.5193 0.9671
## Height 0.5193 1.0000 0.5982
## Volume 0.9671 0.5982 1.0000
(d) (2pts) Are there missing values?
## [1] 0
There are no missing values.
(e) (2pts) Use the lm function in R to fit the multiple regression model:
log(Volume_i) = β0 + β1 log(Girth_i) + β2 log(Height_i) + ε_i
and print out the summary of the model fit.
##
## Call:
## lm(formula = log(y) ~ log(x1) + log(x2))
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.16856 -0.04849 0.00243 0.06364 0.12922
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -6.632 0.800 -8.29 5.1e-09 ***
## log(x1) 1.983 0.075 26.43 < 2e-16 ***
## log(x2) 1.117 0.204 5.46 7.8e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.0814 on 28 degrees of freedom
## Multiple R-squared: 0.978, Adjusted R-squared: 0.976
## F-statistic: 613 on 2 and 28 DF, p-value: <2e-16
(f) (3pts) Create the design matrix (i.e., the matrix of predictor variables), X, for the model in (e), and verify that the least squares coefficient estimates in the summary output are given by the least squares formula: β̂ = (XᵀX)⁻¹Xᵀy.
## (Intercept) log(x1) log(x2)
## 1 1 2.116 4.248
## 2 1 2.152 4.174
## 3 1 2.175 4.143
## 4 1 2.351 4.277
## 5 1 2.370 4.394
## 6 1 2.380 4.419
## 7 1 2.398 4.190
## 8 1 2.398 4.317
## 9 1 2.407 4.382
## 10 1 2.416 4.317
## 11 1 2.425 4.369
## 12 1 2.434 4.331
## 13 1 2.434 4.331
## 14 1 2.460 4.234
## 15 1 2.485 4.317
## 16 1 2.557 4.304
## 17 1 2.557 4.443
## 18 1 2.588 4.454
## 19 1 2.617 4.263
## 20 1 2.625 4.159
## 21 1 2.639 4.357
## 22 1 2.653 4.382
## 23 1 2.674 4.304
## 24 1 2.773 4.277
## 25 1 2.791 4.344
## 26 1 2.851 4.394
## 27 1 2.862 4.407
## 28 1 2.885 4.382
## 29 1 2.890 4.382
## 30 1 2.890 4.382
## 31 1 3.025 4.466
## attr(,"assign")
## [1] 0 1 2
## [,1]
## [1,] -6.632
## [2,] 1.983
## [3,] 1.117
The least squares coefficients given in the summary output match the estimates computed directly from β̂ = (XᵀX)⁻¹Xᵀy.
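A minimal sketch of this check, assuming lm_tree is the fit from part (e) as in the Appendix; solve() is used here in place of the Appendix's ginv():
# design matrix and closed-form least squares estimates
X <- model.matrix(lm_tree)
beta_hat <- solve(t(X) %*% X) %*% t(X) %*% log(trees$Volume)
# compare with the coefficients reported by lm
cbind(beta_hat, coef(lm_tree))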
(g) (3pts) Compute the predicted response values from the fitted regression model, the residuals, and an estimate of the error variance Var(ε) = σ².
Predicted response values: 2.3103, 2.2979, 2.3085, 2.8079, 2.9769, 3.0226, 2.8029, 2.9457, 3.0358, 2.9815,
3.0571, 3.0313, 3.0313, 2.9749, 3.1182, 3.2466, 3.4015, 3.4751, 3.3197, 3.2182, 3.4677, 3.5241, 3.4785, 3.643, 3.7549, 3.9295, 3.966, 3.9832, 3.9942, 3.9942, 4.3554
The residuals: 0.0219, 0.0343, 0.0138, -0.0106, -0.043, -0.042, -0.0557, -0.0443, 0.0822, 0.0093, 0.1292, 0.0132,
0.032, 0.0838, -0.1686, -0.1465, 0.119, -0.1645, -0.0732, -0.0033, 0.0733, -0.0678, 0.1134, 0.0024, -0.003, 0.0851,
0.054, 0.0824, -0.0527, -0.0624, -0.0116
Estimate of error variance: 0.0066
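These quantities can also be extracted directly from the fitted model; a minimal sketch, assuming lm_tree from part (e):
# predicted responses (on the log scale), residuals, and error variance estimate
y_hat <- fitted(lm_tree)
res <- resid(lm_tree)
sigma2_hat <- sum(res^2) / df.residual(lm_tree) # SSR / (n - p) = SSR / 28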
2. Consider the simple linear regression model:
y_i = β0 + β1 x_i + ε_i
Part 1: β0 = 0
(a) (3pts) Assume β0 = 0. What is the interpretation of this assumption? What is the implication on the regression line? What does the regression line plot look like?
When β0 = 0, the model has no intercept. The errors are still unobservable random variables with mean 0 and variance σ², so the mean of y_i is β1 x_i, the variance of y_i is σ², and the covariance between distinct observations is 0. The regression line is therefore forced to pass through the origin: the plot is a straight line through (0, 0) with slope β1, and the model reduces to y_i = β1 x_i + ε_i.
(b) (4pts) Derive the LS estimate of β1 when β0 = 0.
With β0 = 0, the sum of squared residuals is
SSR(β1) = Σ (y_i − β1 x_i)²,
where the sum runs over i = 1, …, n. Setting the derivative with respect to β1 to zero:
∂SSR/∂β1 = −2 Σ x_i (y_i − β1 x_i) = 0
⇒ Σ x_i y_i − β1 Σ x_i² = 0
⇒ Σ x_i y_i = β1 Σ x_i²
⇒ β̂1 = Σ x_i y_i / Σ x_i²
(c) (3pts) How can we introduce this assumption within the lm function?
We can introduce the assumption within the lm function by adding 0 (or equivalently -1) to the model formula, which removes the intercept term, as in the sketch below.
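A minimal sketch, with x and y standing for illustrative predictor and response vectors:
# removing the intercept forces the fitted line through the origin
fit0 <- lm(y ~ 0 + x) # equivalently: lm(y ~ x - 1)
coef(fit0) # matches the closed form sum(x * y) / sum(x^2)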
Part 2: β1 = 0
(d) (3pts) For the same model, assume β1 = 0. What is the interpretation of this assumption? What is the implication on the regression line? What does the regression line plot look like?
When β1 = 0, the model has no slope. The errors are still unobservable random variables with mean 0 and variance σ², so the mean of y_i is β0, the variance of y_i is still σ², and the covariance between distinct observations is still 0. The regression line is a horizontal line at height β0, and the model reduces to y_i = β0 + ε_i.
(e) (4pts) Derive the LS estimate of β0 when β1 = 0.
With β1 = 0, the sum of squared residuals is
SSR(β0) = Σ (y_i − β0)²,
where the sum runs over i = 1, …, n. Setting the derivative with respect to β0 to zero:
∂SSR/∂β0 = −2 Σ (y_i − β0) = 0
⇒ Σ y_i − n β0 = 0
⇒ β̂0 = (1/n) Σ y_i = ȳ
(f) (3pts) How can we introduce this assumption within the lm function?
We can introduce the assumption within the lm function by regressing y on the intercept only, using the formula y ~ 1, as in the sketch below.
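A minimal sketch, again with y an illustrative response vector:
# intercept-only model
fit1 <- lm(y ~ 1)
coef(fit1) # equals mean(y), the estimate derived in (e)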
3. Consider the simple linear regression model:
y_i = β0 + β1 x_i + ε_i
(a) (10pts) Use the LS estimation general result β̂ = (XᵀX)⁻¹Xᵀy to find the explicit estimates for β0 and β1.
β̂ = (XᵀX)⁻¹Xᵀy
Let x̄ and ȳ denote the sample means, and define SSx = Σ(x_i − x̄)² and SSxy = Σ(x_i − x̄)(y_i − ȳ), with all sums over i = 1, …, n. The design matrix X has a column of ones and a column of the x_i, so
XᵀX = [ n     Σx_i
        Σx_i  Σx_i² ]
Its determinant is n Σx_i² − (Σx_i)² = n Σ(x_i − x̄)² = n·SSx, hence
(XᵀX)⁻¹ = (1 / (n·SSx)) [ Σx_i²  −Σx_i
                          −Σx_i   n    ]
and
Xᵀy = [ Σy_i
        Σx_i y_i ]
Multiplying out:
β̂ = (XᵀX)⁻¹Xᵀy = (1 / (n·SSx)) [ Σx_i² Σy_i − Σx_i Σx_i y_i
                                  n Σx_i y_i − Σx_i Σy_i   ]
For the slope, n Σx_i y_i − Σx_i Σy_i = n Σ(x_i − x̄)(y_i − ȳ) = n·SSxy, so
β̂1 = SSxy / SSx
For the intercept, using Σx_i² = SSx + n x̄² and Σx_i y_i = SSxy + n x̄ ȳ:
Σx_i² Σy_i − Σx_i Σx_i y_i = n ȳ (SSx + n x̄²) − n x̄ (SSxy + n x̄ ȳ) = n (ȳ·SSx − x̄·SSxy)
so
β̂0 = ȳ − x̄ (SSxy / SSx) = ȳ − β̂1 x̄
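As a quick numerical check of these formulas, a sketch that reuses the logged trees variables from problem 1 as an example simple regression:
# closed-form simple regression estimates
x <- log(trees$Girth)
y <- log(trees$Volume)
SSx <- sum((x - mean(x))^2)
SSxy <- sum((x - mean(x)) * (y - mean(y)))
b1 <- SSxy / SSx
b0 <- mean(y) - b1 * mean(x)
c(b0, b1) # should match coef(lm(y ~ x))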
(b) (5pts) Show that the LS estimates β̂0 and β̂1 are unbiased estimates for β0 and β1 respectively.
Bias[β̂1] = E[β̂1] − β1
E[β̂1] = E[ Σ(x_i − x̄)(Y_i − Ȳ) / Σ(x_i − x̄)² ]
= E[ Σ(x_i − x̄)Y_i − Ȳ Σ(x_i − x̄) ] / Σ(x_i − x̄)²
= Σ(x_i − x̄)E[Y_i] / Σ(x_i − x̄)²   (since Σ(x_i − x̄) = 0)
= Σ(x_i − x̄)(β0 + β1 x_i) / Σ(x_i − x̄)²
= [ β0 Σ(x_i − x̄) + β1 Σ(x_i − x̄)x_i ] / Σ(x_i − x̄)²
= β1 Σ(x_i − x̄)(x_i − x̄ + x̄) / Σ(x_i − x̄)²
= β1 Σ(x_i − x̄)² / Σ(x_i − x̄)²
= β1
Bias[β̂1] = β1 − β1 = 0
Bias[β̂0] = E[β̂0] − β0
E[β̂0] = E[Ȳ − β̂1 x̄]
= (1/n) Σ E[Y_i] − E[β̂1] x̄
= (1/n) Σ (β0 + β1 x_i) − β1 x̄
= β0 + β1 x̄ − β1 x̄
= β0
Bias[β̂0] = β0 − β0 = 0
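Unbiasedness can also be illustrated by simulation; a minimal sketch with illustrative true values β0 = 1, β1 = 2, and σ = 0.5:
# average the LS estimates over many simulated data sets
set.seed(1)
x <- runif(50)
estimates <- replicate(2000, {
  y <- 1 + 2 * x + rnorm(50, sd = 0.5)
  coef(lm(y ~ x))
})
rowMeans(estimates) # should be close to the true values c(1, 2)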
Appendix
library(knitr)
library(MASS)
# set global chunk options: images will be 7×5 inches
knitr::opts_chunk$set(fig.width=7, fig.height=5)
options(digits = 4)
# the data set "trees" is contained in the R package "datasets"
require(datasets)
head(trees)
# The dimensions of the tree data set
(dim(trees))
# Scatter plot matrix of the three variables
pairs(trees)
# Correlation matrix of the three variables
cor(trees)
# Number of NA values
sum(is.na(trees))
# Creating variables representing y, x1, and x2
y <- trees$Volume
x1 <- trees$Girth
x2 <- trees$Height
# Fitting a linear model to the tree data using the given formula
lm_tree <- lm(log(y) ~ log(x1) + log(x2))
summary(lm_tree)
# Design matrix of the fitted model
model_matrix <- model.matrix(lm_tree)
model_matrix
# Least squares estimates from the closed-form formula
beta_hat <- ginv(t(model_matrix) %*% model_matrix) %*% t(model_matrix) %*% log(y)
beta_hat
# Coefficient estimates taken from the summary output
beta0_hat <- -6.631617
beta1_hat <- 1.982650
beta2_hat <- 1.117123
# Predicted responses and residuals
y_hat <- beta0_hat + beta1_hat * log(x1) + beta2_hat * log(x2)
residual <- log(y) - y_hat
# Estimate of the error variance (n - p = 31 - 3 = 28 degrees of freedom)
estimate_of_error <- sum(residual^2) / 28