CSCI_5521 – HW2 (Solution)

Arnab Dey
Student ID: 5563169
Email: dey00011@umn.edu
Solution 1.a
In the multivariate case, when x is d-dimensional and normally distributed within each class, we have

$$
p(\mathbf{x} \mid C_i) = \frac{1}{(2\pi)^{d/2}\,\lvert\Sigma_i\rvert^{1/2}} \exp\!\left(-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_i)^T \Sigma_i^{-1} (\mathbf{x}-\boldsymbol{\mu}_i)\right)
$$

where $N_i$ is the total number of samples in class $C_i$, $\Sigma_i$ is the covariance matrix of the variables for samples of class $C_i$, and $\boldsymbol{\mu}_i$ is the mean vector of the samples in class $C_i$. The log-likelihood function used to estimate $\boldsymbol{\mu}_i$ and $\Sigma_i$ is

$$
\mathcal{L}(\boldsymbol{\mu}_i, \Sigma_i) = -\frac{N_i d}{2}\log(2\pi) - \frac{N_i}{2}\log\lvert\Sigma_i\rvert - \frac{1}{2}\sum_{t=1}^{N_i}(\mathbf{x}^t-\boldsymbol{\mu}_i)^T \Sigma_i^{-1} (\mathbf{x}^t-\boldsymbol{\mu}_i) \tag{1}
$$

From Eq. 1, to find the estimate of $\boldsymbol{\mu}_i$, which we denote $\mathbf{m}_i$, we set the derivative of the log-likelihood function w.r.t. $\boldsymbol{\mu}_i$ to 0:

$$
\frac{\partial \mathcal{L}}{\partial \boldsymbol{\mu}_i} = \sum_{t=1}^{N_i}\Sigma_i^{-1}(\mathbf{x}^t - \boldsymbol{\mu}_i) = 0 \;\Longrightarrow\; \mathbf{m}_i = \frac{1}{N_i}\sum_{t=1}^{N_i}\mathbf{x}^t \tag{2}
$$

[pre-multiplying by $\Sigma_i$]
Similarly, we can find the estimate of $\Sigma_i$. Before doing that, let us keep only the terms of the log-likelihood function that depend on $\Sigma_i$, since the other terms vanish when we take the derivative. We will also use the fact that $\mathbf{x}^T A \mathbf{x} = \operatorname{trace}[\mathbf{x}^T A \mathbf{x}] = \operatorname{trace}[\mathbf{x}\mathbf{x}^T A]$.
The log-likelihood terms that depend on $\Sigma_i$ can be written as:

$$
\mathcal{L}'(\Sigma_i) = \frac{N_i}{2}\log\lvert\Sigma_i^{-1}\rvert - \frac{1}{2}\sum_{t=1}^{N_i}\operatorname{trace}\!\left[(\mathbf{x}^t-\boldsymbol{\mu}_i)(\mathbf{x}^t-\boldsymbol{\mu}_i)^T \Sigma_i^{-1}\right] \tag{3}
$$

From Eq. 1, to find the estimate of $\Sigma_i$, which we denote $S_i$, we can equivalently set the derivative of $\mathcal{L}'$ w.r.t. $\Sigma_i^{-1}$ to 0, i.e.

$$
\frac{\partial \mathcal{L}'}{\partial \Sigma_i^{-1}} = \frac{N_i}{2}\Sigma_i - \frac{1}{2}\sum_{t=1}^{N_i}(\mathbf{x}^t-\boldsymbol{\mu}_i)(\mathbf{x}^t-\boldsymbol{\mu}_i)^T = 0 \tag{4}
$$

Using the estimate $\mathbf{m}_i$ of $\boldsymbol{\mu}_i$, we can write

$$
S_i = \frac{1}{N_i}\sum_{t=1}^{N_i}(\mathbf{x}^t-\mathbf{m}_i)(\mathbf{x}^t-\mathbf{m}_i)^T \tag{5}
$$
For model 1, where $S_1$ and $S_2$ are independent, we directly use the per-class estimates given in Eq. 2 and Eq. 5. For model 2, we assume that $S$ is shared between the two classes; therefore, we take the expectation of the per-class estimates in Eq. 5 over the class priors. Hence,

$$
S_1 = S_2 = P(C_1)\,S_1 + P(C_2)\,S_2
$$

where $S_i$ is given in Eq. 5. For model 3, we assume that the variables within each class are independent. Therefore, in this case we keep only the diagonal terms of the corresponding $S_i$ from Eq. 5 and set all the off-diagonal terms to 0.
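As a minimal sketch of how these estimates and the three models could be implemented (assuming NumPy arrays `X1` and `X2` hold the training samples of classes $C_1$ and $C_2$; the array and function names here are illustrative, not the assignment's required interface):

```python
import numpy as np

def estimate_parameters(X1, X2, model):
    """Estimate means, covariances, and priors for the three models.

    X1, X2 : (N_i, d) arrays of training samples for classes C1 and C2.
    model  : 1 (separate S1, S2), 2 (shared S), 3 (diagonal S1, S2).
    """
    N1, N2 = len(X1), len(X2)
    priors = np.array([N1, N2]) / (N1 + N2)            # P(C1), P(C2)
    means = [X1.mean(axis=0), X2.mean(axis=0)]         # m_i  (Eq. 2)

    # S_i = (1/N_i) sum_t (x^t - m_i)(x^t - m_i)^T     (Eq. 5)
    covs = [np.cov(X1, rowvar=False, bias=True),
            np.cov(X2, rowvar=False, bias=True)]

    if model == 2:                                     # shared covariance
        shared = priors[0] * covs[0] + priors[1] * covs[1]
        covs = [shared, shared]
    elif model == 3:                                   # independent variables
        covs = [np.diag(np.diag(S)) for S in covs]
    return means, covs, priors

def classify(X, means, covs, priors):
    """Assign each row of X to the class with the larger log posterior."""
    scores = []
    for m, S, p in zip(means, covs, priors):
        diff = X - m
        _, logdet = np.linalg.slogdet(S)
        g = (-0.5 * np.sum(diff @ np.linalg.inv(S) * diff, axis=1)
             - 0.5 * logdet + np.log(p))
        scores.append(g)
    return np.argmax(np.stack(scores, axis=1), axis=1)  # 0 -> C1, 1 -> C2
```

With model 2 the quadratic terms cancel between the two classes, which is why the resulting discriminant is linear, as discussed in Solution 1.c below.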
Solution 1.c
Table 1 shows the error rates for the different models on the different test sets.

Model        1        2        3
Test set 1   30.0%    24.5%    25.0%
Test set 2   4.5%     21.0%    14.5%
Test set 3   23.5%    25.5%    21.5%
Table 1: Q1.c: Error rates for different models on different test sets
From the table, matching each data pair to the model that gives the lowest error rate on its test set, we conclude the following:

Data pair     Chosen model
Data pair 1   2
Data pair 2   1
Data pair 3   3
Table 2: Q1.c: Chosen model for each data pair based on the lowest error rate on the test set
Explanation of the different error rates with different models
When we choose independent $S_1$ and $S_2$ (model 1), the discriminant is non-linear. When model 2 is chosen, the discriminant becomes linear, and finally, if model 3 is chosen, we additionally assume that the variables are independent. Therefore, as data pair 2 gives its lowest error rate with model 1, we can say that the data in data pair 2 are not linearly separable and do not have independent variables. Data pair 1 appears to be linearly separable, but its variables are not independent. For data pair 3, the data are linearly separable and the variables are independent as well.
Solution 2.a
Error rates for different k in the k-nearest-neighbor algorithm on the Optdigits dataset are tabulated below:

k               1       3       5       7
Error rate (%)  5.387   4.040   4.377   5.387
Table 3: Q2.a: Error rate vs. k
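As a minimal sketch of this experiment (assuming arrays `X_train`, `y_train`, `X_test`, `y_test` hold the Optdigits features and labels; these names are illustrative), scikit-learn's KNeighborsClassifier could be used as follows:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# X_train, y_train, X_test, y_test: Optdigits features and labels (illustrative names).
for k in (1, 3, 5, 7):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    error_rate = 100.0 * np.mean(knn.predict(X_test) != y_test)
    print(f"k = {k}: error rate = {error_rate:.3f}%")
```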
Solution 2.b
We performed PCA on the Optdigits training data and obtained the following proportion-of-variance plot:

Figure 1: Proportion of variance plot for Optdigits training data
We can see from Fig. 1 that the minimum number of eigenvectors that explain at least 90% of the variance is 20.
Therefore, we used 20 principal components for PCA and reduced the dimension of the original Optdigits data to 20. Then, we ran KNN on this reduced-dimension Optdigits test data. The following table shows the error rates for different k in the k-nearest-neighbor algorithm on the reduced Optdigits test data.
k               1       3       5       7
Error rate (%)  4.040   4.040   4.040   4.377
Table 4: Q2.b: Error rate vs. k
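A sketch of this procedure, picking the smallest number of components whose cumulative explained variance reaches 90% and then running KNN in the reduced space (same illustrative array names as above):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

# Fit PCA once to inspect the cumulative proportion of variance.
pca = PCA().fit(X_train)
cum_var = np.cumsum(pca.explained_variance_ratio_)
n_components = int(np.searchsorted(cum_var, 0.90) + 1)  # smallest count reaching 90%

# Project both splits onto the selected components and rerun KNN.
pca = PCA(n_components=n_components).fit(X_train)
Z_train, Z_test = pca.transform(X_train), pca.transform(X_test)

for k in (1, 3, 5, 7):
    knn = KNeighborsClassifier(n_neighbors=k).fit(Z_train, y_train)
    err = 100.0 * np.mean(knn.predict(Z_test) != y_test)
    print(f"k = {k}: error rate = {err:.3f}%")
```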
Solution 2.c
Fig. 2 shows both the Optdigits training and test data after PCA with 2 components.

Figure 2: Optdigits data after PCA
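A sketch of how such a 2-component projection plot could be produced with matplotlib (same illustrative array names as above; plotting details are assumptions, not the assignment's required output):

```python
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Project both splits onto the first two principal components (fit on training data).
pca2 = PCA(n_components=2).fit(X_train)
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, (X, y, title) in zip(axes, [(X_train, y_train, "Training data"),
                                    (X_test, y_test, "Test data")]):
    Z = pca2.transform(X)
    scatter = ax.scatter(Z[:, 0], Z[:, 1], c=y, s=8, cmap="tab10")
    ax.set(title=title, xlabel="PC 1", ylabel="PC 2")
fig.colorbar(scatter, ax=axes, label="digit class")
plt.show()
```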
Solution 2.d
Table 5 shows error rates for different projection dimensions L and different numbers of neighbors k for the KNN algorithm on the Optdigits test data.

        L = 2      L = 4      L = 9
k = 1   44.781%    19.191%    9.764%
k = 3   41.414%    18.518%    9.427%
k = 5   40.740%    15.824%    9.427%
Table 5: Q2.d: Error rates for different L dimensions and k neighbors
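Assuming the L-dimensional projections in this part are obtained with LDA (as in Solution 2.e), a sketch using scikit-learn's LinearDiscriminantAnalysis and the same illustrative array names as above could look like:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier

for L in (2, 4, 9):  # at most n_classes - 1 = 9 discriminant directions
    lda = LinearDiscriminantAnalysis(n_components=L).fit(X_train, y_train)
    Z_train, Z_test = lda.transform(X_train), lda.transform(X_test)
    for k in (1, 3, 5):
        knn = KNeighborsClassifier(n_neighbors=k).fit(Z_train, y_train)
        err = 100.0 * np.mean(knn.predict(Z_test) != y_test)
        print(f"L = {L}, k = {k}: error rate = {err:.3f}%")
```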
Solution 2.e
Fig. 3 shows both the Optdigits training and test data after LDA with 2 components.

Figure 3: Optdigits data after LDA
Solution 3.a
The mean face is shown in Fig. 4. The first 5 eigenfaces are shown in Fig. 5.

Figure 4: Mean face

Figure 5: First 5 eigen-faces
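A sketch of how the mean face and the first eigenfaces could be computed and displayed, assuming `faces` is an (N, h*w) array of flattened training images with image size (h, w) (illustrative names):

```python
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# faces: (N, h*w) flattened face images; (h, w): image size (illustrative names).
mean_face = faces.mean(axis=0)
pca = PCA(n_components=5).fit(faces)          # eigenfaces = principal components

fig, axes = plt.subplots(1, 6, figsize=(12, 3))
axes[0].imshow(mean_face.reshape(h, w), cmap="gray")
axes[0].set_title("Mean face")
for i, ax in enumerate(axes[1:]):
    ax.imshow(pca.components_[i].reshape(h, w), cmap="gray")
    ax.set_title(f"Eigenface {i + 1}")
for ax in axes:
    ax.axis("off")
plt.show()
```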
Solution 3.b
We performed PCA on the face training data and obtained the following proportion-of-variance plot:

Figure 6: Proportion of variance plot for face training data
We can see from Fig. 6 that the minimum number of eigenvectors that explain at least 90% of the variance is 40.
Therefore, we used 40 principal components for PCA and reduced the dimension of the original face data to 40. Then, we ran KNN on this reduced-dimension face test data. The following table shows the error rates for different k in the k-nearest-neighbor algorithm on the reduced face test data.
k               1        3        5        7
Error rate (%)  10.483   24.193   39.516   39.516
Table 6: Q3.b: Error rate vs. k
Solution 3.c

Figure 7: Original first 5 faces from training data

Figure 8: First 5 reconstructed faces using 10 principal components

Figure 9: First 5 reconstructed faces using 50 principal components

Figure 10: First 5 reconstructed faces using 100 principal components
Thus we can see that as we increase the number of principal components, the reconstructed images become closer to the original images, but at the cost of increased complexity and processing time.
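A sketch of the reconstruction step, reusing the illustrative `faces`, `h`, `w` names from above: each face is projected onto the first n principal components and then mapped back to pixel space.

```python
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Reconstruct the first 5 training faces from truncated PCA bases.
for n_components in (10, 50, 100):
    pca = PCA(n_components=n_components).fit(faces)
    reconstructed = pca.inverse_transform(pca.transform(faces[:5]))
    fig, axes = plt.subplots(1, 5, figsize=(12, 3))
    for ax, img in zip(axes, reconstructed):
        ax.imshow(img.reshape(h, w), cmap="gray")
        ax.axis("off")
    fig.suptitle(f"Reconstruction with {n_components} principal components")
plt.show()
```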
