Assignment 4: CS 763, Computer Vision (Solution)


Description

1. In this exercise, you will implement the Adaboost method for creating a strong binary classifier from a series of weaker classifiers. You will work with some synthetic datasets and also with the MNIST dataset containing images of digits.
Consider a training set consisting of N input vectors $\{x_j\}_{j=1}^{N}$ in a d-dimensional space, and their respective labels $\{y_j\}_{j=1}^{N}$, where $\forall j,\ y_j \in \{-1,+1\}$. You will assign a scalar weight $w_j$ to each input vector. Before the first iteration of Adaboost, these weights will be set to be equal in value. In each round t, you will pick the best classifier from the following family of weak classifiers: $h_t(x; i, p, \theta) = \mathrm{sign}(p(x_i - \theta))$, where $x_i$ is the ith element of the d-dimensional input vector x, the parameter $p \in \{-1,+1\}$, and $\theta$ is a real-valued threshold parameter. Basically, this classifier assigns input vector x the label ‘+1’ if either (1) $x_i > \theta$ and p = +1, or (2) $x_i \le \theta$ and p = −1. Otherwise it assigns x the label ‘-1’. The best classifier is the one producing the least weighted error on the training set, i.e. the least value of $\epsilon_t = \sum_{j=1}^{N} w_j \, I(y_j \neq h_t(x_j; i, p, \theta))$, where $x_j$ is the jth training vector and I(.) is an indicator function that returns 1 if the predicate passed as a parameter is true, and returns 0 otherwise. Note that the search for the best classifier involves picking the tuple $(i, p, \theta)$. After picking the best classifier, you will update the weights following the method in the standard Adaboost algorithm. This entire process is repeated for T rounds. You need to specify the value of T, but T = 30 to 40 is sufficient. The final classifier after T rounds will have the form $H(x) = \mathrm{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$, where $\alpha_t$ is the weight that Adaboost assigns to the classifier chosen in round t.
You will work with the following datasets.
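As a rough illustration of the loop described above, here is a minimal sketch in Python (not the MATLAB the assignment expects). It assumes the usual discrete Adaboost update, $\alpha_t = \frac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t}$ followed by $w_j \leftarrow w_j \exp(-\alpha_t y_j h_t(x_j))$ and renormalisation. The function names (`stump_predict`, `best_stump`, `adaboost`, `strong_predict`), the brute-force threshold search over observed feature values, and the clipping of the error away from 0 and 1 are choices made for this sketch, not part of the assignment.

```python
import math

def stump_predict(x, i, p, theta):
    """Weak classifier: +1 if (x[i] > theta and p == +1) or (x[i] <= theta and p == -1), else -1."""
    if p == 1:
        return 1 if x[i] > theta else -1
    return 1 if x[i] <= theta else -1

def best_stump(X, y, w):
    """Brute-force search for the (i, p, theta) with the least weighted error.

    Candidate thresholds are the observed values of feature i plus one value
    below the minimum (an assumption of this sketch). Cost is O(d * N^2).
    """
    best = None  # (weighted error, i, p, theta)
    for i in range(len(X[0])):
        values = sorted({x[i] for x in X})
        for theta in [values[0] - 1.0] + values:
            for p in (1, -1):
                err = sum(wj for xj, yj, wj in zip(X, y, w)
                          if stump_predict(xj, i, p, theta) != yj)
                if best is None or err < best[0]:
                    best = (err, i, p, theta)
    return best

def adaboost(X, y, T):
    """Run T rounds and return a list of (alpha, i, p, theta) tuples."""
    n = len(X)
    w = [1.0 / n] * n  # equal weights before the first round
    ensemble = []
    for _ in range(T):
        err, i, p, theta = best_stump(X, y, w)
        err = min(max(err, 1e-12), 1.0 - 1e-12)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, i, p, theta))
        # Increase the weight of misclassified points, decrease the rest.
        w = [wj * math.exp(-alpha * yj * stump_predict(xj, i, p, theta))
             for xj, yj, wj in zip(X, y, w)]
        total = sum(w)
        w = [wj / total for wj in w]
    return ensemble

def strong_predict(ensemble, x):
    """Sign of the alpha-weighted vote; a zero score is mapped to -1 here."""
    score = sum(a * stump_predict(x, i, p, theta) for a, i, p, theta in ensemble)
    return 1 if score > 0 else -1
```

For example, on a one-dimensional toy set such as `X = [[0.1], [0.2], [0.4], [0.5], [0.6], [0.8], [0.9]]` with `y = [-1, -1, 1, 1, 1, -1, -1]`, no single stump separates the classes, but `adaboost(X, y, 3)` combines three stumps into a classifier that labels all seven points correctly.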
(a) A dataset containing 2000 points in 2D drawn from a [0,1] bounded uniform random distribution. Label all the points lying on or inside a rectangle bounded by the lines x = 0.3, x = 0.7, y = 0.3, y = 0.7 as +1 and the rest as −1. Randomly divide this dataset into disjoint sets of 1000 training points and 1000 test points (called dataset1).
(b) A dataset containing 2000 points in 2D drawn from a [0,1] bounded uniform random distribution. Label all the points satisfying any of the following conditions as ‘+1’ and the rest as ‘-1’: (1) lying on or inside a rectangle bounded by the lines x = 0.3, x = 0.7, y = 0.3, y = 0.7, (2) with x-coordinate between 0.15 and 0.25 or between 0.75 and 0.85, (3) with y-coordinate between 0.15 and 0.25 or between 0.75 and 0.85. Randomly divide this dataset into disjoint sets of 1000 training points and 1000 test points (called dataset2).
(c) A dataset containing 2000 points in 2D drawn from a zero-mean Gaussian distribution of standard deviation 2. Label all the points whose distance from the origin is less than 2 as +1 and the rest as -1. Randomly divide this dataset into disjoint sets of 1000 training points and 1000 test points (called dataset3).
(d) A dataset containing 2000 points in 2D drawn from a zero-mean Gaussian distribution of standard deviation 2. Label all the points whose distance from the origin is either less than 2, or between 2.5 and 3, as +1 and the rest as -1. Randomly divide this dataset into disjoint sets of 1000 training points and 1000 test points (called dataset4).
(e) The MNIST database is a popular dataset containing images of handwritten digits from 0 to 9. It can be downloaded from http://yann.lecun.com/exdb/mnist/. The dataset contains four files, each in ‘idx’ format: train-images-idx3-ubyte.gz, which contains the training data consisting of 60000 images of size 28 by 28 each; train-labels-idx1-ubyte.gz, which contains the respective labels of the training images; t10k-images-idx3-ubyte.gz, which contains 10000 images of size 28 by 28 each for testing; and t10k-labels-idx1-ubyte.gz, which contains the respective ground-truth labels of the test images. A MATLAB script for reading these files into memory is uploaded here: https://www.cse.iitb.ac.in/~ajitvr/CS763_Spring2016/HW4/readMNIST.m. You will need to gunzip each of the four files before calling this (parameter-free) function readMNIST. You should perform training on the first 5000 images from the training set. Label the images belonging to the digit ‘2’ as ‘+1’ and all the others as ‘-1’. Thus for this database, your job is to determine whether a given image contains the selected digit ‘2’ or not.
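The four synthetic datasets (a) to (d) above can be generated along the following lines. This is a sketch in Python rather than MATLAB, and it rests on a few assumptions that the text leaves open: "between" is read as inclusive at both ends, the Gaussian is taken to be isotropic with standard deviation 2 in each coordinate, and the helper names (`make_dataset`, `uniform_point`, `gaussian_point`, `label_dataset1` and so on) and the fixed seed are purely illustrative.

```python
import math
import random

def label_dataset1(x, y):
    """+1 on or inside the rectangle 0.3 <= x <= 0.7, 0.3 <= y <= 0.7, else -1."""
    return 1 if 0.3 <= x <= 0.7 and 0.3 <= y <= 0.7 else -1

def _in_band(v):
    """True if v lies in [0.15, 0.25] or [0.75, 0.85] (inclusive ends assumed)."""
    return 0.15 <= v <= 0.25 or 0.75 <= v <= 0.85

def label_dataset2(x, y):
    """+1 if in the dataset1 rectangle, or x in a band, or y in a band."""
    return 1 if label_dataset1(x, y) == 1 or _in_band(x) or _in_band(y) else -1

def label_dataset3(x, y):
    """+1 if the distance from the origin is less than 2, else -1."""
    return 1 if math.hypot(x, y) < 2.0 else -1

def label_dataset4(x, y):
    """+1 if the distance from the origin is < 2 or in [2.5, 3], else -1."""
    r = math.hypot(x, y)
    return 1 if r < 2.0 or 2.5 <= r <= 3.0 else -1

def uniform_point(rng):
    """One 2D point with both coordinates uniform on [0, 1)."""
    return rng.random(), rng.random()

def gaussian_point(rng):
    """One 2D point, each coordinate zero-mean Gaussian with std 2 (assumed isotropic)."""
    return rng.gauss(0.0, 2.0), rng.gauss(0.0, 2.0)

def make_dataset(sample_point, label, n=2000, n_train=1000, seed=0):
    """Draw n labelled points, shuffle, and split into disjoint train and test lists."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x, y = sample_point(rng)
        data.append(((x, y), label(x, y)))
    rng.shuffle(data)
    return data[:n_train], data[n_train:]
```

A call such as `train, test = make_dataset(gaussian_point, label_dataset4)` would then give the two halves of dataset4 as lists of `((x, y), label)` pairs.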
For each of the five datasets, do the following after each round of Adaboost: (1) estimate and print the training error of the strong classifier created thus far, (2) estimate and print the error of the strong classifier created thus far on the test set. For the first four datasets, also plot the test points with their associated labels using the MATLAB function called ‘scatter’ (in each round of Adaboost). Sample scatter plots for the first four datasets can be found at http://www.cse.iitb.ac.in/~ajitvr/CS763_Spring2015/HW4/ (named as dataset1.jpg and so on). You do not need to include the scatter plot images from each iteration in your report. Finally, for all five datasets, plot a graph of the test set error versus the number of rounds of Adaboost and include it in your report. Do you notice something peculiar with the fourth dataset? Explain what you would do to remedy that situation (there is no need to implement it). [4+4+4+4+10+4 = 30 points]
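One way to obtain the error of the strong classifier after every round, without re-evaluating all earlier weak classifiers each time, is to keep a running weighted vote per point. The sketch below is in Python rather than MATLAB and assumes the classifier is stored as a list of `(alpha, i, p, theta)` tuples, one per round; that representation and the names `stump_predict` and `error_per_round` are assumptions of this example, not something the assignment prescribes.

```python
def stump_predict(x, i, p, theta):
    """Weak classifier: +1 if (x[i] > theta and p == +1) or (x[i] <= theta and p == -1), else -1."""
    if p == 1:
        return 1 if x[i] > theta else -1
    return 1 if x[i] <= theta else -1

def error_per_round(ensemble, X, y):
    """Fraction of misclassified points after each round of boosting.

    ensemble is a list of (alpha, i, p, theta) tuples in the order the rounds
    were run. A zero vote is mapped to -1 here (a choice of this sketch).
    """
    scores = [0.0] * len(X)  # running alpha-weighted vote for every point
    errors = []
    for alpha, i, p, theta in ensemble:
        for j, xj in enumerate(X):
            scores[j] += alpha * stump_predict(xj, i, p, theta)
        wrong = sum(1 for s, yj in zip(scores, y)
                    if (1 if s > 0 else -1) != yj)
        errors.append(wrong / len(X))
    return errors

# Hypothetical three-round ensemble on a 1D toy set (values chosen by hand).
X = [[0.1], [0.2], [0.4], [0.5], [0.6], [0.8], [0.9]]
y = [-1, -1, 1, 1, 1, -1, -1]
ensemble = [(0.458, 0, 1, 0.2), (0.693, 0, -1, 0.6), (0.733, 0, -1, -0.9)]
errors = error_per_round(ensemble, X, y)
```

Calling the same function once with the training set and once with the test set gives the two numbers to print in each round, and the returned list can be plotted directly against the round index. In this toy example the error does not drop between the first and second rounds and only reaches zero in the third, so the curve need not decrease at every round.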
(a) Write the mathematical relation between n(x,y) and h(x,y).
(b) Let rx and ry be the x and y components of the vector r. Prove that (rx,ry) is parallel to (
(c) Express the magnitude of vector QQ0 in terms of α, β and h0. What is the relation between QQ0 and r?
(d) Now, let us assume that the water surface fluctuations are very small. Hence α, β and change in height are very small. Use this to prove that (r
Useful formulae: sin(A+B) = sinA cosB + cosA sinB; cos(A+B) = cosA cosB − sinA sinB; tan(A+B) = (tanA + tanB)/(1 − tanA tanB). For small A, sinA ≈ tanA.
