CS 446: Machine Learning Homework (Solution)

1. [10 points] SVM Basics
Consider the following dataset D in two-dimensional space, with x^(i) ∈ ℝ² and y^(i) ∈ {1, −1}:

 i   x1^(i)   x2^(i)   y^(i)
 1    -1        3        1
 2    -2.5     -3       -1
 3     2       -3       -1
 4     4.7      5        1
 5     4        3        1
 6    -4.3     -4       -1
Recall that a hard SVM solves:

    min_{w,b}  (1/2)‖w‖²
    s.t.  y^(i)(w⊤x^(i) + b) ≥ 1 ,  ∀(x^(i), y^(i)) ∈ D    (1)
(a) What is the optimal w and b? Show all your work and reasoning. (Hint: Draw it out.)

(b) Which of the training examples in D are support vectors?

Your answer: Examples 1, 2, 3, and 5 are the support vectors.
(c) A standard quadratic program is as follows,
    minimize_z   (1/2) z⊤Pz + q⊤z
    subject to   Gz ≤ h
Rewrite Equation (1) in the above form (i.e., define z, P, q, G, and h using w, b, and the values in D). Write the constraints in the same order as provided in D and typeset them using bmatrix.
Your answer:
Define the variables as follows.

z has dimension (d+1, 1), where d denotes the feature dimension; it stacks the bias on top of the weights:

z = \begin{bmatrix} b \\ w_1 \\ \vdots \\ w_d \end{bmatrix}

q is the zero vector of dimension (d+1, 1):

q = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}
P has dimension (d+1, d+1), where d denotes the feature dimension. Its top-left entry is 0 (the bias is not regularized) and the remaining block is the identity:

P = \begin{bmatrix}
0 & 0 & 0 & \cdots & 0 \\
0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1
\end{bmatrix}
I use m to denote the number of observations (the number of labels); h has dimension (m, 1):

h = \begin{bmatrix} -1 \\ -1 \\ \vdots \\ -1 \end{bmatrix}
In the matrix G, I use m to denote the number of observations (the number of labels) and d to denote the number of features. Each constraint y^(i)(w⊤x^(i) + b) ≥ 1 is rewritten as −y^(i)(b + w⊤x^(i)) ≤ −1, so row i of G is −y_i [1, x_{i1}, …, x_{id}]:

G = -\begin{bmatrix}
y_1 & x_{11}y_1 & x_{12}y_1 & \cdots & x_{1d}y_1 \\
y_2 & x_{21}y_2 & x_{22}y_2 & \cdots & x_{2d}y_2 \\
y_3 & x_{31}y_3 & x_{32}y_3 & \cdots & x_{3d}y_3 \\
\vdots & \vdots & \vdots & & \vdots \\
y_m & x_{m1}y_m & x_{m2}y_m & \cdots & x_{md}y_m
\end{bmatrix}
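To sanity-check these definitions, here is a minimal sketch (assuming NumPy and the cvxopt package, neither of which is specified by the assignment) that builds P, q, G, and h for the dataset D above and solves the hard-SVM QP numerically:

```python
# Hypothetical sanity check: solve the hard-SVM QP for D with cvxopt.
import numpy as np
from cvxopt import matrix, solvers

# Dataset D from the problem statement.
X = np.array([[-1.0, 3], [-2.5, -3], [2, -3], [4.7, 5], [4, 3], [-4.3, -4]])
y = np.array([1.0, -1, -1, 1, 1, -1])
m, d = X.shape

# z = [b, w_1, ..., w_d]^T; P penalizes only the w block, q is zero.
P = np.zeros((d + 1, d + 1))
P[1:, 1:] = np.eye(d)
q = np.zeros(d + 1)

# y_i (w^T x_i + b) >= 1  rewritten as  -y_i [1, x_i] z <= -1.
G = -y[:, None] * np.hstack([np.ones((m, 1)), X])
h = -np.ones(m)

solvers.options["show_progress"] = False
sol = solvers.qp(matrix(P), matrix(q), matrix(G), matrix(h))
z = np.array(sol["x"]).ravel()
print("b =", z[0], " w =", z[1:])
```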
(d) Recall that for a soft SVM we solve the following optimization problem:

    min_{w,b,ξ}  (1/2)‖w‖² + C Σ_{i=1}^{m} ξ^(i)
    s.t.  y^(i)(w⊤x^(i) + b) ≥ 1 − ξ^(i) ,  ξ^(i) ≥ 0 ,  ∀(x^(i), y^(i)) ∈ D    (2)
Describe what happens to the margin when C = ∞ and C = 0.
Your answer:
I understand that C determines how much misclassification influences the objective function, which is the sum of a regularization (margin) term and C times the total slack.

If C = 0, the slack term carries no weight, so the regularization term dominates completely. The SVM then gives a very large margin, even if some points are misclassified, because maximizing the margin is all that is optimized.

If C = ∞, any nonzero slack is penalized infinitely, so the problem reduces to the hard SVM: the SVM gives a much smaller margin that classifies every training sample correctly (assuming the data are separable).
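This behavior can be checked empirically. The sketch below (assuming scikit-learn; the specific C values are stand-ins for the two limits, since C must be finite and positive) fits a linear soft SVM on D with a tiny and a huge C and reports the margin width 2/‖w‖:

```python
# Compare margin widths for near-zero and near-infinite C on dataset D.
import numpy as np
from sklearn.svm import SVC

X = np.array([[-1.0, 3], [-2.5, -3], [2, -3], [4.7, 5], [4, 3], [-4.3, -4]])
y = np.array([1, -1, -1, 1, 1, -1])

for C in (1e-3, 1e3):  # stand-ins for C -> 0 and C -> infinity
    clf = SVC(kernel="linear", C=C).fit(X, y)
    w = clf.coef_.ravel()
    # Soft-SVM margin width is 2 / ||w||; smaller C gives a wider margin.
    print(f"C = {C:g}: margin width = {2 / np.linalg.norm(w):.3f}")
```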
2. [4 points] Kernels
(a) If K1(x,z) and K2(x,z) are both valid kernel functions, and α and β are positive, prove that
αK1(x,z) + βK2(x,z)
is also a valid kernel function.
Your answer:
Since K1 and K2 are valid kernels, there exist feature maps Φ^(1) and Φ^(2) such that

K1(x,z) = Φ^(1)(x)⊤Φ^(1)(z)
K2(x,z) = Φ^(2)(x)⊤Φ^(2)(z)

Because α, β > 0, their square roots are real, so we can construct the concatenated feature map:

Φ(x) = [√α Φ^(1)(x), √β Φ^(2)(x)]

Clearly then:

K(x,z) = Φ(x)⊤Φ(z)
       = α Φ^(1)(x)⊤Φ^(1)(z) + β Φ^(2)(x)⊤Φ^(2)(z)
       = α K1(x,z) + β K2(x,z)

So K(x,z) = αK1(x,z) + βK2(x,z) is also a valid kernel function.
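A quick numerical check (an illustration, not a substitute for the proof) shows the claim in action: the Gram matrix of αK1 + βK2 on any finite point set should be positive semidefinite whenever K1 and K2 are valid kernels and α, β > 0. Here K1 is the linear kernel and K2 the squared kernel from part (b), both chosen for illustration:

```python
# Verify that alpha*K1 + beta*K2 yields a PSD Gram matrix on random points.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))

K1 = X @ X.T          # linear kernel:   K1(x, z) = x^T z
K2 = (X @ X.T) ** 2   # squared kernel:  K2(x, z) = (x^T z)^2
alpha, beta = 0.7, 2.0

eigenvalues = np.linalg.eigvalsh(alpha * K1 + beta * K2)
print(eigenvalues)  # all >= 0, up to numerical tolerance
```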
(b) Show that K(x,z) = (x⊤z)² is a valid kernel for x, z ∈ ℝ².
(i.e., write out Φ(·) such that K(x,z) = Φ(x)⊤Φ(z).)
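Expanding the square gives one valid feature map directly, mapping ℝ² into ℝ³:

(x⊤z)² = (x_1 z_1 + x_2 z_2)²
       = x_1² z_1² + 2 x_1 x_2 z_1 z_2 + x_2² z_2²
       = Φ(x)⊤Φ(z),  where  Φ(x) = \begin{bmatrix} x_1^2 \\ \sqrt{2}\, x_1 x_2 \\ x_2^2 \end{bmatrix}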
