Description
1. What is gradient descent? Why is it important?
2. What is Logistic Regression? What is “logistic”? What are we “regressing”?
3. Bob tries to gather information about this year’s apple harvest and ran a search in his favorite online news outlet. He retrieved a number of articles but found that a large portion of the retrieved articles are about the Apple laptops and computers — and hence irrelevant to his search. He wants to build a logistic regression classifier, which uses the counts of selected words in the news articles to predict the class of the news article (fruit vs. computer). He built the following data set of 5 training instances and 1 test instance. Develop a logistic regression classifier to predict label 𝑦! =1 (fruit) and 𝑦! =0 (computer).
ID apple ibm lemon sun CLASS
TRAINING INSTANCES
A 1 0 1 5 1 FRUIT
B 1 0 1 2 1 FRUIT
C 2 0 0 1 1 FRUIT
D 2 2 0 0 0 COMPUTER
E 1 2 1 7 0 COMPUTER
TEST INSTANCES
For the moment, we assume that we already have an estimate of the model parameters, i.e., the weights of the 4 features (and the bias 𝜃0) is 𝜃” = [𝜃0, 𝜃!, 𝜃2, 𝜃3, 𝜃4] = [0.2, 0.3, −2.2, 3.3, −0.2].
(i). Explain the intuition behind the model parameters, and their meaning in relation to the features
(ii). Predict the test label.
(iii). Recall the conditional likelihood objective
”
log ℒ(𝜃) = −+ y! log-𝜎(𝑥!; 𝜃)1 + (1 − 𝑦!) log-1 − 𝜎(𝑥!; 𝜃)1
!#$
We want to make sure that the Loss (the negative log likelihood) our model, is lower when its prediction the correct label for test instance T, than when it’s predicting a wrong label.
Compute the negative log-likelihood of the test instance (1) assuming that the true label y = 1 (fruit), i.e., our classifier made a mistake; and (2) assuming the true label as y = 0 (computer), i.e., our classifier predicted correctly.
1
4. For the model created in question 4, compute a single gradient descent update for parameter 𝜃! given the training instances given above. Recall that for each feature j, we compute its weight update as
𝜃” ← 𝜃” − 𝜂 0(𝜎(𝑥#; 𝜃) − 𝑦#)𝑥#”
#
Summing over all training instances i. We will compute the update for 𝜃% assuming the current parameters as specified above, and a learning rate 𝜂 = 0.1.
5. [OPTIONAL] What is the relation between “odds” and “probability”?
6. [OPTIONAL] (a) What is Regression? How is it similar to Classification, and how is it different?
(b) Come up with one typical classification task, and one typical regression task. Specify the range of valid values of y (results) and possible valid values for x (attributes).
2
Reviews
There are no reviews yet.