CS178 – Homework 3 (Solution)

The submission for this homework should be a single PDF file containing all of the relevant code, figures, and any text explaining your results. Although you will primarily be filling in missing sections of a Python file, please include the relevant sections you have written as answers to the appropriate questions.
Please be sure to download and use the latest version of the mltools package provided with this homework. The code provided in logisticClassify2.py serves as a template for this assignment; you will update this code to make it functional. For your convenience, a version of the same code in a more convenient form for Jupyter notebooks is available as HW3_Template.ipynb. If you prefer to run Python as a script, you can edit the template file directly and then import it (as in the code snippets here), or copy it into your notebook and update it there. Either way, be sure to include the listings of the updated functions in your writeup.
Problem 1: Logistic Regression (75 points)
In this problem, we'll build a logistic regression classifier and train it on separable and non-separable data. Since it will be specialized to binary classification, we've named the class logisticClassify2. We start by creating two binary classification datasets, one separable and the other not:
import numpy as np               # imports added for completeness
import mltools as ml

iris = np.genfromtxt("data/iris.txt", delimiter=None)
X, Y = iris[:,0:2], iris[:,-1]     # get first two features & target
X, Y = ml.shuffleData(X, Y)        # order randomly rather than by class label
X, _ = ml.transforms.rescale(X)    # rescale to improve numerical stability, speed convergence
XA, YA = X[Y<2,:], Y[Y<2]          # Dataset A: class 0 vs class 1
XB, YB = X[Y>0,:], Y[Y>0]          # Dataset B: class 1 vs class 2
For this problem, we focus on the properties of the logistic regression learning algorithm, rather than classification performance. Thus we will not create a separate validation dataset, and simply use all available data for training.
1. For each of the two datasets, create a separate scatter plot in which the training data from the two classes is plotted in different colors. Which of the two datasets is linearly separable? (5 points)
2. Write (fill in) the function plotBoundary in logisticClassify2.py to compute the points on the decision boundary. In particular, you only need to make sure x2b is set correctly using self.theta. This will plot the data & boundary quickly, which is useful for visualizing the model during training. To demonstrate your function, plot the decision boundary corresponding to the classifier
$$\operatorname{sign}(\,2 + 6x_1 - 1x_2\,)$$
along with dataset A, and again with dataset B. These fixed parameters should lead to an OK classifier on one data set, but a poor classifier on the other. You can create a “blank” learner and set the weights as follows:
import numpy as np
import mltools as ml
from logisticClassify2 import *

learner = logisticClassify2()              # create "blank" learner
learner.classes = np.unique(YA)            # define class labels using YA or YB
wts = np.array([theta0, theta1, theta2])   # TODO: fill in values
learner.theta = wts                        # set the learner's parameters
Include the lines of code you added to the plotBoundary function, and the two generated plots. (10 points)
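For reference, here is a minimal sketch of the x2b computation (our assumption about one reasonable implementation, not the official solution): on the boundary the linear response is zero, $\theta_0 + \theta_1 x_1 + \theta_2 x_2 = 0$, so we can solve for $x_2$ at the two ends of the data's $x_1$ range.

# Sketch (inside plotBoundary; assumes exactly two features and
# self.theta = [theta0, theta1, theta2]):
x1b = np.array([X[:,0].min(), X[:,0].max()])                 # endpoints of the x1 range
x2b = -(self.theta[0] + self.theta[1]*x1b) / self.theta[2]   # x2 values on the boundary line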
3. Complete the logisticClassify2.predict function to make predictions for your classifier. Verify that your function works by computing & reporting the error rate for the classifier defined in the previous part on both datasets A and B. (Remember that we are using a fixed, hand-selected value of theta; it is chosen to be reasonable for one dataset but not the other, so the error rate should be about 0.06 for one dataset and higher for the other.) Note that in the code, the two classes are stored in the variable self.classes, where the first entry is the "negative" class (class 0) and the second entry is the "positive" class (class 1). You should create different learner objects for each dataset, and use the learner.err function. Your solution PDF should include the predict function implementation and the computed error rates. (10 points)
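A hedged sketch of one possible predict implementation (assuming a two-feature model with self.theta = [theta0, theta1, theta2]; the template's actual structure may differ):

def predict(self, X):
    """Return self.classes[0] where the linear response is negative,
    self.classes[1] otherwise (sketch)."""
    r = self.theta[0] + X.dot(self.theta[1:])           # linear response for each row of X
    return np.where(r < 0, self.classes[0], self.classes[1])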
4. Verify that your predict and plotBoundary implementations are consistent by using plotClassify2D with your manually constructed learner on each dataset. This will call predict on a dense grid of points, and you should find that the resulting decision boundary matches the one you plotted previously. (5 points)
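For example, assuming the learner object set up in Part 2, the check can be as simple as:

ml.plotClassify2D(learner, XA, YA)   # shades the plane by predict() on a dense grid, overlays data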
5. In the provided training code, we first transform the classes in the data Y into YY, with canonical labels for the two classes: "class 0" (negative) and "class 1" (positive). Let $r^{(j)} = x^{(j)} \cdot \theta = \sum_i x_i^{(j)} \theta_i$ denote the linear response of the classifier, and let $\sigma(r)$ equal the standard logistic function:
$$\sigma(r) = \frac{1}{1 + \exp(-r)}\,.$$
The logistic negative log-likelihood loss for a single data point $j$ is then
$$J_j(\theta) = -\,y^{(j)} \log \sigma\!\left(x^{(j)} \cdot \theta\right) - \left(1 - y^{(j)}\right) \log\!\left(1 - \sigma\!\left(x^{(j)} \cdot \theta\right)\right),$$
where $y^{(j)}$ is 0 or 1.
Show that the gradient of the negative log-likelihood $J_j(\theta)$ for logistic regression can be expressed as
$$\nabla J_j = \left(\sigma\!\left(x^{(j)} \cdot \theta\right) - y^{(j)}\right) x^{(j)}.$$
You will use this expression to implement stochastic gradient descent in the next part.
Hint: Remember that the logistic function has a simple derivative, $\sigma'(r) = \sigma(r)\,(1 - \sigma(r))$. (15 points)
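One way to start the derivation: write $s = \sigma(x^{(j)} \cdot \theta)$ and apply the chain rule together with the hint,
$$\frac{\partial J_j}{\partial \theta_i} = \left(-\frac{y^{(j)}}{s} + \frac{1-y^{(j)}}{1-s}\right) s\,(1-s)\, x_i^{(j)} = \left(s - y^{(j)}\right) x_i^{(j)},$$
which holds for every coordinate $i$ and hence gives the vector form above.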
6. Complete the train function to perform stochastic gradient descent on the logistic regression loss function. This will require that you fill in:
(a) computing the linear response $r^{(j)}$, logistic response $s^{(j)} = \sigma(r^{(j)})$, and gradient $\nabla J_j(\theta)$ associated with each data point $x^{(j)}, y^{(j)}$;
(b) computing the overall loss function, $J = \frac{1}{m} \sum_j J_j$, after each pass through the full dataset (or epoch);
(c) a stopping criterion that checks two conditions: stop when either you have reached stopEpochs epochs, or J has changed by less than stopTol since the last epoch. (A sketch of these computations appears below.)
Include the complete implementation of train in your solutions. (20 points)
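A minimal sketch of the per-epoch computations, under our assumptions about the template's structure (XX holds the data with a leading column of 1s, YY holds the canonical 0/1 labels, and the surrounding loop maintains epoch, Jold, stepsize, etc.):

M = XX.shape[0]                            # number of training points
for j in np.random.permutation(M):         # one SGD pass: each point once, in random order
    r_j  = XX[j].dot(self.theta)           # (a) linear response r(j)
    s_j  = 1.0 / (1.0 + np.exp(-r_j))      #     logistic response s(j) = sigma(r(j))
    grad = (s_j - YY[j]) * XX[j]           #     gradient of J_j(theta)
    self.theta = self.theta - stepsize * grad

s_all = 1.0 / (1.0 + np.exp(-XX.dot(self.theta)))               # responses on all m points
Jsur  = -np.mean(YY*np.log(s_all) + (1-YY)*np.log(1-s_all))     # (b) surrogate loss J
done  = (epoch >= stopEpochs) or (abs(Jsur - Jold) < stopTol)   # (c) stopping criterion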
7. Run the logistic regression train algorithm on both datasets. Describe the parameter choices (step sizes and stopping criteria) you use for each dataset. Include plots showing the convergence of the surrogate loss and error rate as a function of the number of training epochs, and the classification boundary after the final training iteration. (The included train function creates plots automatically.) (10 points)
8. Extra Credit (10 points): Add an L2 regularization term ($+\,\alpha \sum_i \theta_i^2$) to your surrogate loss function, and update the gradient and your code to reflect this addition. Try re-running your learner with some regularization (e.g., $\alpha = 2$) and see how different the resulting parameters are. Find a value of $\alpha$ that gives noticeably different results than your un-regularized learner & explain the resulting differences.
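Concretely, since $\nabla_\theta\, \alpha \sum_i \theta_i^2 = 2\alpha\theta$, the only change to the SGD sketch above is the gradient line (alpha = 2.0 is just the example value from the problem):

alpha = 2.0                                           # example regularization strength
grad  = (s_j - YY[j]) * XX[j] + 2*alpha*self.theta    # data gradient plus L2 penalty term

In practice the bias weight theta[0] is often excluded from the penalty, which is a design choice you may want to note in your writeup.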
Plotting hints: The code generates plots as the algorithm runs, so you can see its behavior over time; this is done by repeatedly clearing the plot axes via pyplot.cla(). In Jupyter, you also need to clear the Jupyter display using IPython.display.clear_output(). If you run as a script, you can pause between updates with input().
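A typical per-epoch redraw pattern in a notebook might look like the following sketch (the learner variable and figure handling are our assumptions; the template's train() already contains equivalent calls):

import matplotlib.pyplot as plt
from IPython import display

plt.cla()                          # clear the current axes before redrawing
learner.plotBoundary(X, Y)         # redraw the data and the current decision boundary
display.clear_output(wait=True)    # drop the previous Jupyter output
display.display(plt.gcf())         # show the updated figure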

[Figure 1 image: four panels of data points, (a)-(d).]
Figure 1: Four datasets to test whether they can be shattered by a given classifier, i.e., whether the classifier can exactly separate all possible binary colorings of their points. No three data points lie on a line.
Problem 2: Shattering and VC Dimension (15+5 points)
Consider the data points in Figure 1, each of which has two real-valued features $x_1, x_2$. For each of the learners below, $T[z]$ is the sign threshold function: $T[z] = +1$ for $z \ge 0$ and $T[z] = -1$ for $z < 0$. The learner parameters $a, b, c, \ldots$ are real-valued scalars.
Which of the four datasets can be shattered by each learner? Give a brief explanation/justification, and use your results to guess the VC dimension of each classifier (you do not have to give a formal proof, just your reasoning).
1. $T(a + b x_1)$ (5 points)
2. $T((a \cdot b)\,x_1 + (c/a)\,x_2)$ (5 points)
3. $T((x_1 - a)^2 + (x_2 - b)^2 + c)$ (5 points)
4. Extra Credit: $T(a + b x_1 + c x_2) \times T(d + b x_1 + c x_2)$ (5 points) Hint: The two equations define two parallel lines.
Problem 3: Statement of Collaboration (5 points)
It is mandatory to include a Statement of Collaboration in each submission, following the guidelines below. Include the names of everyone involved in the discussions (especially in-person ones), and what was discussed. Keep any discussion at a high level, and do not share solution specifics (e.g., code fragments, written notes, referring to Piazza, etc.). Especially after you have started working on the assignment, try to restrict the discussion to Piazza as much as possible, so that there is no doubt as to the extent of your collaboration.
Problem 4: Halloween (5 points)
What did you do for Halloween?
