Description
Instructions
• This assignment should be attempted individually. All questions are compulsory.
• Theory.pdf: For conceptual questions, either a typed or hand-written .pdf file of solutions is acceptable.
• Code Files: For programming questions, you may use any one programming language consistently throughout the assignment. For Python, either a .ipynb or a .py file is acceptable; for other languages, submit the corresponding source files. Make sure the submission is self-contained and replicable, i.e., your results can be reproduced from the submitted files alone.
• Regarding Coding Exercises: You may use modules from sklearn, statsmodels, or any similar library for writing the code. Set a random seed wherever applicable to ensure reproducibility.
• File Submission: Submit a single .zip file named A1_RollNo.zip (e.g., A1_PhD22100.zip) containing Theory.pdf, Report.pdf, and the code files.
• Resource Constraints: In any question, if there is a resource constraint in terms of computational capability at your end, you are allowed to sub-sample the data (the sub-sampling must be stratified). Make sure to explicitly mention this in the report, with proper details about the platform that did not work for you.
• Compliance: The questions in this assignment are structured to meet the Course Outcomes CO2, CO3, and CO4, as described in the course directory.
• There could be multiple ways to approach a question. Please explain your approach briefly in the report.
iii. Write your own function to calculate the class-wise F1 score. (5 points)
iv. Check the F1 scores using sklearn's inbuilt function and compare them with the F1 scores returned by your function written from scratch. Also, report the accuracy. (5 points)
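For reference, a minimal Python sketch of one way to approach (iii)-(iv); the variable names y_test and y_pred are placeholders for your own test labels and model predictions:

import numpy as np
from sklearn.metrics import f1_score, accuracy_score

def classwise_f1(y_true, y_pred):
    # Per-class F1 from scratch: treat each class c as positive, all others as negative.
    scores = {}
    for c in np.unique(y_true):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        scores[c] = (2 * precision * recall / (precision + recall)
                     if (precision + recall) else 0.0)
    return scores

# Cross-check against sklearn and report accuracy, e.g.:
# print(classwise_f1(y_test, y_pred))
# print(f1_score(y_test, y_pred, average=None))  # per-class F1 from sklearn
# print(accuracy_score(y_test, y_pred))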
(b) Non-linear SVM:
Build non-linear models with both the RBF kernel and the polynomial kernel. Report the accuracy of each. (5 points)
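A minimal sketch for part (b), assuming X_train, y_train, X_test, y_test are already prepared; all other SVC settings are left at sklearn defaults:

from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

for kernel in ("rbf", "poly"):
    clf = SVC(kernel=kernel, random_state=0)  # fixed seed for reproducibility
    clf.fit(X_train, y_train)
    print(kernel, accuracy_score(y_test, clf.predict(X_test)))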
(c) Perform grid-search cross-validation to find the optimal values of the cost C and gamma for an SVM classifier with the RBF kernel. (6 points)
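One possible grid-search sketch for part (c); the C and gamma grids below are illustrative, not prescribed:

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.001, 0.01, 0.1, 1]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)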
(d) Choose the best combination of C and gamma and build the final model with the chosen hyperparameters. Display the confusion matrix and report the accuracy of the model. (5 points)
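A sketch for part (d), reusing the fitted GridSearchCV object (search) from part (c):

from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix, accuracy_score

best = SVC(kernel="rbf", **search.best_params_)  # refit with the chosen C and gamma
best.fit(X_train, y_train)
y_pred = best.predict(X_test)
print(confusion_matrix(y_test, y_pred))
print(accuracy_score(y_test, y_pred))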
(e) i. Develop a new training set by extracting the support vectors from the SVM fitted above with the chosen hyperparameters. (5 points)
ii. Now fit another SVM on this new training set and report the train and test accuracies. (5 points)
iii. Compare the accuracies with the previous models. State your observations. (5 points)
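A sketch for parts (e)(i)-(ii), assuming best is the final model from part (d) and that X_train, y_train are NumPy arrays (use .iloc instead for DataFrames):

from sklearn.svm import SVC

sv_idx = best.support_                           # indices of support vectors in X_train
X_sv, y_sv = X_train[sv_idx], y_train[sv_idx]    # new training set of support vectors only

svm_sv = SVC(kernel="rbf", **search.best_params_)
svm_sv.fit(X_sv, y_sv)
print("train accuracy:", svm_sv.score(X_sv, y_sv))
print("test accuracy:", svm_sv.score(X_test, y_test))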
2. Support Vector Regressor (20 points)
Refer to the file 'SVR.ipynb' and add your code to this starter file.
Link to 'SVR.ipynb':
https://drive.google.com/file/d/1obJlV7QWFpBM2Jic_fuPSZGsl57ZseP-/view?usp=sharing
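The starter notebook defines the actual data and tasks, so the following is only a generic SVR sketch with illustrative hyperparameters, assuming X_train, y_train, X_test, y_test come from that notebook:

from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error

reg = SVR(kernel="rbf", C=1.0, epsilon=0.1)
reg.fit(X_train, y_train)
print("test MSE:", mean_squared_error(y_test, reg.predict(X_test)))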
3. Theory Question (10 points)
(a) Maximize f(x, y) = xy subject to x + y^2 ≤ 2 and x, y > 0 using the KKT conditions. (7 points)
(b) True or False (with justification): Given linearly separable data, the margin of the decision boundary produced by an SVM will always be greater than or equal to the margin of the decision boundary produced by any other hyperplane that perfectly classifies the given training dataset. (3 points)
4. Theory Question (16 points)
A key benefit of SVM training is the ability to use kernel functions K(x, x′) as opposed to explicit basis functions ϕ(x). Kernels make it possible to implicitly express large or even infinite-dimensional basis features. We do this by computing K(x, x′) directly, without ever computing ϕ(x).
When training SVMs, we begin by computing the kernel matrix K over our training data {x_1, …, x_n}. The kernel matrix, defined as K_{i,i′} = K(x_i, x_{i′}), expresses the kernel function applied between all pairs of training points.
Mercer's theorem tells us that any function K that yields a positive semi-definite kernel matrix forms a valid kernel, i.e., it corresponds to a matrix of dot products under some basis ϕ. Therefore, instead of using an explicit basis, we can build kernel functions directly that fulfill this property. A particularly nice benefit of this theorem is that it allows us to build more expressive kernels by composition.
In this problem, you are tasked with using Mercer's theorem and the definition of a kernel matrix to prove that the following compositions are valid kernels, assuming K^(1) and K^(2) are valid kernels. Recall that a positive semi-definite matrix K requires z^T K z ≥ 0 for all z ∈ R^n. (A short numeric illustration of this PSD condition is sketched after the hint below.)
(a) K(x, x′) = c·K^(1)(x, x′), where c > 0 (4 points)
(b) K(x, x′) = K^(1)(x, x′) + K^(2)(x, x′) (4 points)
(c) K(x, x′) = f(x)·K^(1)(x, x′)·f(x′), where f is any function from R^m to R (4 points)
(d) K(x, x′) = K^(1)(x, x′)·K^(2)(x, x′) (4 points)
Hint: Use the property that for any ϕ(x), K(x, x′) = ϕ(x)^T ϕ(x′) forms a positive semi-definite kernel matrix.
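As a numeric illustration (not a proof) of the PSD condition above, one can compute the kernel matrix of a concrete valid kernel, e.g. the RBF kernel on arbitrary points, and verify that its eigenvalues are nonnegative up to floating-point error:

import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))          # 20 arbitrary points in R^3
K = rbf_kernel(X, X)                  # kernel matrix K_{i,i'} = K(x_i, x_{i'})
print(np.linalg.eigvalsh(K).min())    # smallest eigenvalue, expected >= 0 (up to rounding)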
5. Theory Question (19 points)
Consider the following dataset of 3 points in 1-D.
(a) Are the classes {+, −} linearly separable? (1 point)
(b) Map each point to 3-D using the new feature vector ϕ(x) = [1, √2·x, x^2]^T. Are the classes now linearly separable? If yes, find a separating hyperplane. (3 points)
(c) Define a class variable y_i ∈ {−1, +1} which denotes the class of x_i, and let w = (w_1, w_2, w_3)^T. The max-margin SVM classifier solves the following problem:
minimize (1/2)||w||^2 over w, b
subject to y_i(w^T ϕ(x_i) + b) ≥ 1, i = 1, 2, 3.
Using the method of Lagrange multipliers, show that the solution is ŵ = (0, 0, −2)^T, b = 1, and that the margin is 1/||ŵ|| = 1/2. (8 points)
(d) Show that the solution remains the same if the constraints are changed to
y_i(w^T ϕ(x_i) + b) ≥ ρ, i = 1, 2, 3
for any ρ ≥ 1. (3 points)
(e) Is your answer to (d) also true for any dataset and ρ ≥ 1? Provide a counter-example or give a short proof. (4 points)