Description
Q7. From Chapter 6, page 261 (use R or RStudio).
(a) (3 marks) Show that the ridge regression optimization problem in this setting (or the quantity in equation (6.5) in Chapter 6 in this setting) is
$2(y_1 - (\beta_1 + \beta_2)x_{11})^2 + \lambda(\beta_1^2 + \beta_2^2)$.
(b) (5 marks) Show that in the setting of (a), the ridge coefficient estimates satisfy $\hat\beta_1 = \hat\beta_2$ (a derivation sketch follows part (d) below).
(c) (3 marks) Show that the lasso regression optimization problem in this setting (or the quantity in equation (6.7) in Chapter 6 in this setting) is
$2(y_1 - (\beta_1 + \beta_2)x_{11})^2 + \lambda(|\beta_1| + |\beta_2|)$.
(d) (5 marks) Show that in the setting of (c), the lasso coefficient estimates $\hat\beta_1$ and $\hat\beta_2$ are not unique; in other words, there are many possible solutions to the optimization problem (see the second sketch below).
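For orientation only (not part of the question text), here is a sketch of the usual argument for part (b). It assumes the standard setting of this exercise (ISLR Chapter 6, Exercise 5: n = 2, x11 = x12, x21 = x22 = -x11, y1 = -y2, no intercept), which the question references but does not restate:

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Ridge objective from part (a):
% f(\beta_1,\beta_2) = 2(y_1 - (\beta_1+\beta_2)x_{11})^2
%                      + \lambda(\beta_1^2 + \beta_2^2)
\begin{align*}
\partial f/\partial\beta_1 &= -4x_{11}\,\bigl(y_1 - (\beta_1+\beta_2)x_{11}\bigr)
                              + 2\lambda\beta_1 = 0,\\
\partial f/\partial\beta_2 &= -4x_{11}\,\bigl(y_1 - (\beta_1+\beta_2)x_{11}\bigr)
                              + 2\lambda\beta_2 = 0.
\end{align*}
Subtracting one stationarity condition from the other leaves
$2\lambda(\beta_1 - \beta_2) = 0$, so for any $\lambda > 0$ the minimizer
must satisfy $\hat\beta_1 = \hat\beta_2$.
\end{document}
```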
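A similar sketch for part (d), under the same assumed setting. The key observation is that on the nonnegative quadrant the penalty sees only the sum of the coefficients:

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Write s = \beta_1 + \beta_2. On the region \beta_1, \beta_2 \ge 0 the
% lasso objective from part (c) depends on (\beta_1, \beta_2) only via s:
\[
2\bigl(y_1 - (\beta_1+\beta_2)x_{11}\bigr)^2 + \lambda(|\beta_1| + |\beta_2|)
  = 2\bigl(y_1 - s\,x_{11}\bigr)^2 + \lambda s .
\]
If $\hat s \ge 0$ minimizes the right-hand side (which holds when
$y_1 x_{11} \ge 0$), then every pair $(\beta_1, \beta_2)$ with
$\beta_1 + \beta_2 = \hat s$ and $\beta_1, \beta_2 \ge 0$ attains the same
minimum, so the lasso solution is not unique.
\end{document}
```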
Q8. In this exercise, we will generate simulated data and then use it to perform model selection. Use the rnorm() function to generate a predictor X of length n = 100, as well as a noise vector ε of length n = 100 such that ε = 0.1 * rnorm(n).
(a) (1 mark) Generate (use set.seed(19)) a response vector Y of length n = 100 according to the model
$Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \beta_3 X^3 + \varepsilon,$
where the constants are $\beta_0 = 1.0$, $\beta_1 = -0.1$, $\beta_2 = 0.05$, and $\beta_3 = 0.75$.
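A minimal R sketch of this data-generation step. The variable names are our own, and seeding once at the top is an assumption (the question attaches set.seed(19) to part (a)); only the seed, the sizes, and the constants are fixed by the question:

```r
set.seed(19)                       # seed required in part (a)
n   <- 100
x   <- rnorm(n)                    # predictor X of length 100
eps <- 0.1 * rnorm(n)              # noise vector, as specified above

# Response from the cubic model with the stated constants
b0 <- 1.0; b1 <- -0.1; b2 <- 0.05; b3 <- 0.75
y  <- b0 + b1 * x + b2 * x^2 + b3 * x^3 + eps
```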
(b) Use the regsubsets() function to perform best subset selection in order to choose the best model containing the predictors $X, X^2, X^3, \ldots, X^{10}$, using the measures $C_p$, BIC, and adjusted $R^2$ (a starter sketch follows the note below).
(i) (6 marks) Plot each measure against the number of predictors, all on the same page, using par(mfrow=c(2,2)).
(ii) (3 marks) Give the best model coefficients obtained from each of $C_p$, BIC, and adjusted $R^2$.
Note:
1. You will need to use the data.frame() function to create a single data set containing both X and Y.
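A hedged sketch of parts (b)(i)-(ii). It reuses x and y from the part (a) sketch; building the data frame with poly(..., raw = TRUE) is one way to satisfy the note, and nvmax = 10 is an assumption matching the ten predictors:

```r
library(leaps)                     # provides regsubsets()

# One data set containing both Y and X, X^2, ..., X^10 (see note above)
df  <- data.frame(y = y, poly(x, 10, raw = TRUE))

fit <- regsubsets(y ~ ., data = df, nvmax = 10)
s   <- summary(fit)

# (i) one panel per measure, on the same page
par(mfrow = c(2, 2))
plot(s$cp,    type = "b", xlab = "Number of predictors", ylab = "Cp")
plot(s$bic,   type = "b", xlab = "Number of predictors", ylab = "BIC")
plot(s$adjr2, type = "b", xlab = "Number of predictors", ylab = "Adjusted R2")

# (ii) best-model coefficients under each measure
coef(fit, which.min(s$cp))
coef(fit, which.min(s$bic))
coef(fit, which.max(s$adjr2))
```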
(c) Now fit a ridge regression model to the simulated data, again using $X, X^2, X^3, \ldots, X^{10}$ as predictors.
(i) (2 marks) Plot the extracted coefficients as a function of log(λ), with a legend at the top-right corner matching each curve's colour to its predictor name.
(ii) (4 marks) Plot the cross-validation error (use set.seed(20)) as a function of log(λ) to find the optimal λ.
(iii) (1 mark) Give coefficient estimates for the optimal value of Ξ».
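A sketch of (c)(i)-(iii), again reusing x and y from part (a). It assumes glmnet with alpha = 0 for the ridge penalty; matplot is used for the coefficient paths so the legend colours are guaranteed to match the curves:

```r
library(glmnet)

X     <- poly(x, 10, raw = TRUE)       # predictors X, X^2, ..., X^10
ridge <- glmnet(X, y, alpha = 0)       # alpha = 0 -> ridge penalty

# (i) coefficient paths vs log(lambda), legend at the top-right
cols <- rainbow(10)
matplot(log(ridge$lambda), t(as.matrix(ridge$beta)), type = "l", lty = 1,
        col = cols, xlab = "log(lambda)", ylab = "Coefficients")
legend("topright", legend = rownames(ridge$beta), col = cols, lty = 1,
       cex = 0.6)

# (ii) cross-validation error vs log(lambda)
set.seed(20)
cv_ridge <- cv.glmnet(X, y, alpha = 0)
plot(cv_ridge)

# (iii) coefficient estimates at the optimal lambda
predict(ridge, type = "coefficients", s = cv_ridge$lambda.min)
```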
(d) Now fit a lasso model to the simulated data, again using $X, X^2, X^3, \ldots, X^{10}$ as predictors (a sketch follows the note below).
(i) (2 marks) Plot the extracted coefficients as a function of log(λ), with a legend at the top-right corner matching each curve's colour to its predictor name.
(ii) (4 marks) Plot the cross-validation error (use set.seed(21)) as a function of log(λ) to find the optimal λ.
(iii) (1 mark) Give coefficient estimates for the optimal value of Ξ».
Note:
1. Use cv.glmnet() to do the cross-validation and use the default of 10-fold cross-validation.
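The lasso version of the same sketch, assuming alpha = 1 and relying on the cv.glmnet() default of 10-fold cross-validation noted above:

```r
library(glmnet)

X     <- poly(x, 10, raw = TRUE)       # same predictors as in (c)
lasso <- glmnet(X, y, alpha = 1)       # alpha = 1 -> lasso penalty

# (i) coefficient paths vs log(lambda), legend at the top-right
cols <- rainbow(10)
matplot(log(lasso$lambda), t(as.matrix(lasso$beta)), type = "l", lty = 1,
        col = cols, xlab = "log(lambda)", ylab = "Coefficients")
legend("topright", legend = rownames(lasso$beta), col = cols, lty = 1,
       cex = 0.6)

# (ii) cross-validation error vs log(lambda) (10-fold by default)
set.seed(21)
cv_lasso <- cv.glmnet(X, y, alpha = 1)
plot(cv_lasso)

# (iii) coefficient estimates at the optimal lambda
predict(lasso, type = "coefficients", s = cv_lasso$lambda.min)
```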