CSCI4390-6390 Assign1 (Solution)

Assign1: Covariance and Eigenvectors
This assignment consists of two parts. Part I must be done by students in both sections, namely CSCI4390 and CSCI6390. For the second part, Part II-CSCI4390 must be done by those in CSCI4390, and Part II-CSCI6390 must be done by students registered for CSCI6390.
Part I (both CSCI-4390 and CSCI-6390): 50 Points

a. Mean vector and total variance
Compute the mean vector µ for the data matrix, and then compute the total variance var(D); see Eq. (1.8) for the latter.
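For reference, a minimal NumPy sketch of this step, assuming the data has already been loaded into an n × d array D (the variable names are illustrative, not prescribed by the assignment):

```python
import numpy as np

# D is assumed to be an n x d NumPy array of the numeric attributes.
mu = D.mean(axis=0)                       # mean vector
Z = D - mu                                # centered data matrix
# Total variance var(D): average squared distance of the points from the mean
# (1/n normalization assumed; adjust if Eq. (1.8) uses a different scaling).
total_var = np.sum(Z ** 2) / D.shape[0]
```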
b. Covariance matrix (inner and outer product form)
Compute the sample covariance matrix Σ as inner products between the attributes of the centered data matrix (see Eq. (2.38) in chapter 2), and also compute the sample covariance matrix as the sum of the outer products between the centered points (see Eq. (2.39)).
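A sketch of both forms, again assuming D is the n × d data array (the 1/n scaling is an assumption; use whatever Eqs. (2.38) and (2.39) prescribe):

```python
import numpy as np

n, d = D.shape
Z = D - D.mean(axis=0)                  # centered data matrix

# Inner-product form: entry (j, k) is the inner product of centered attributes j and k, scaled by 1/n.
sigma_inner = (Z.T @ Z) / n

# Outer-product form: accumulate the outer product of each centered point with itself.
sigma_outer = np.zeros((d, d))
for z in Z:
    sigma_outer += np.outer(z, z)
sigma_outer /= n

assert np.allclose(sigma_inner, sigma_outer)   # both forms should agree
```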
c. Correlation matrix as pair-wise cosines
Compute the correlation matrix for this dataset using the formula for the cosine between centered attribute vectors (see Eq. (2.30)).
Output which attribute pairs are i) the most correlated, ii) the most anti-correlated, and iii) the least correlated.
Create the scatter plots for the three interesting pairs using matplotlib and visually confirm the trends, i.e., describe how each of the three results in a particular type of plot.
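One possible way to compute the cosine-based correlation matrix, pick out the three pairs, and plot them (a sketch under the same assumptions about D):

```python
import numpy as np
import matplotlib.pyplot as plt

Z = D - D.mean(axis=0)                        # columns of Z are the centered attribute vectors
norms = np.linalg.norm(Z, axis=0)
corr = (Z.T @ Z) / np.outer(norms, norms)     # cosine of the angle between centered attribute columns

# Consider each pair (j, k) with j < k exactly once.
j, k = np.triu_indices(corr.shape[1], k=1)
pairs = list(zip(j, k))
vals = corr[j, k]

most_corr  = pairs[np.argmax(vals)]            # largest positive correlation
most_anti  = pairs[np.argmin(vals)]            # most negative correlation
least_corr = pairs[np.argmin(np.abs(vals))]    # correlation closest to zero

for title, (a, b) in [("most correlated", most_corr),
                      ("most anti-correlated", most_anti),
                      ("least correlated", least_corr)]:
    plt.figure()
    plt.scatter(D[:, a], D[:, b], s=2)
    plt.xlabel(f"attribute {a}")
    plt.ylabel(f"attribute {b}")
    plt.title(title)
plt.show()
```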
Part II: Eigenvectors (50 Points)
CSCI-4390 Only: Dominant Eigenvector
Compute the dominant eigenvalue and eigenvector of the covariance matrix Σ via the power-iteration method. One can compute the dominant eigenvector/-value of the covariance matrix iteratively as follows. Let

x_0 = (1, 1, …, 1)^T

be the starting vector in R^d, where d is the number of dimensions.
In each iteration i, we compute the new vector:

x_i = Σ x_{i−1}

We then find the element of x_i that has the maximum absolute value, say at index m. For the next round, to avoid numerical issues with large values, we re-scale x_i by dividing all elements by x_{i,m}, so that the largest value is always 1 before we begin the next iteration.
Iterate until convergence, that is, until

∥x_i − x_{i−1}∥_2 < ϵ
For the final eigen-vector, make sure to normalize it, so that it has unit length.
Also, the ratio x_{i,m} / x_{i−1,m} gives you the largest eigenvalue. If you did the scaling as described above, then the denominator will be 1, but the numerator will be the updated value of that element before scaling.
Once you have obtained the dominant eigenvector, u_1, project each of the original data points x_i onto this vector, and print the coordinates of the new points along this “direction”.
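A possible implementation of this scheme in NumPy (sigma, eps, and D are assumed to be the covariance matrix, the threshold, and the data array from the earlier steps):

```python
import numpy as np

def power_iteration(sigma, eps):
    """Dominant eigenvalue/eigenvector of sigma via the scaled power iteration described above."""
    d = sigma.shape[0]
    x_prev = np.ones(d)                     # x_0 = (1, 1, ..., 1)^T
    while True:
        x = sigma @ x_prev                  # x_i = Sigma x_{i-1}
        m = np.argmax(np.abs(x))            # index of the largest-magnitude element
        eigval = x[m] / x_prev[m]           # ratio x_{i,m} / x_{i-1,m}; the denominator is 1 after scaling
        x = x / x[m]                        # re-scale so the largest element is 1
        if np.linalg.norm(x - x_prev) < eps:
            break
        x_prev = x
    u1 = x / np.linalg.norm(x)              # normalize the final eigenvector to unit length
    return eigval, u1

# Projection of each original data point onto the dominant direction:
# eigval, u1 = power_iteration(sigma, eps)
# proj = D @ u1      # one coordinate per point along u1
```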
CSCI-6390 Only: First Two Eigenvectors and Eigenvalues
Compute the first two eigenvectors of the covariance matrix Σ using a generalization of the above iterative method.
Let X_0 be a d×2 (random) matrix with two non-zero d-dimensional column vectors with unit length. We will iteratively multiply X_0 with Σ on the left.
The first column will not be modified, but the second column will be orthogonalized with respect to the first one by subtracting its projection along the first column (see section 1.3.3 in chapter 1). That is, let a and b denote the first and second columns of X_1, where
X_1 = Σ X_0
Then we orthogonalize b as follows:
b = b − ((b^T a) / (a^T a)) a
After this, b is guaranteed to be orthogonal to a. This will yield the matrix X_1 with the two column vectors denoting the current estimates of the first and second eigenvectors.
Before the next iteration, normalize each column to be unit length, and repeat the whole process. That is, from X_1 obtain X_2, and so on, until convergence.
To test for convergence, you can look at the distance between X_i and X_{i−1}. If the difference is less than some threshold ϵ, then we stop.
Once you have obtained the two eigenvectors u_1 and u_2, project each of the original data points x_i onto those two vectors to obtain the projected points in 2D. Plot these projected points in the two new dimensions.
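A sketch of this two-column iteration (again, sigma, eps, and D refer to the covariance matrix, the threshold, and the data array; the random seed is an arbitrary choice):

```python
import numpy as np

def two_eigenvectors(sigma, eps):
    """Estimates of the first two eigenvectors of sigma via the orthogonalized iteration described above."""
    d = sigma.shape[0]
    rng = np.random.default_rng(0)
    X_prev = rng.random((d, 2))
    X_prev /= np.linalg.norm(X_prev, axis=0)      # two random unit-length columns
    while True:
        X = sigma @ X_prev                        # multiply by Sigma on the left
        a, b = X[:, 0], X[:, 1]
        b = b - ((b @ a) / (a @ a)) * a           # orthogonalize b against a
        X = np.column_stack([a, b])
        X /= np.linalg.norm(X, axis=0)            # normalize each column to unit length
        if np.linalg.norm(X - X_prev) < eps:
            break
        X_prev = X
    return X                                      # columns are the estimates of u1 and u2

# Projection onto the two eigenvectors and the 2D plot; the corresponding
# eigenvalues can be recovered as u_k^T Sigma u_k for each column u_k.
# U = two_eigenvectors(sigma, eps)
# P = D @ U
# import matplotlib.pyplot as plt
# plt.scatter(P[:, 0], P[:, 1], s=2); plt.xlabel("u1"); plt.ylabel("u2"); plt.show()
```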
Submission
Submit your code via submitty. Name your python script: assign1.py.
Your script will be run as follows:
assign1.py FILENAME EPS
Here FILENAME is the name of the input CSV file, and EPS is the convergence threshold ϵ for the eigen-vector/-value computation.
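A minimal way to read these arguments and load the data, assuming a plain comma-separated numeric file (header handling, if any, is an assumption you may need to adjust):

```python
import sys
import numpy as np

# Usage: assign1.py FILENAME EPS
filename = sys.argv[1]
eps = float(sys.argv[2])

# Load the CSV into an n x d array; add skiprows/usecols if the file has a header or non-numeric columns.
D = np.loadtxt(filename, delimiter=",")
```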
Save all your output to a pdf file named assign1.pdf. The output should comprise the mean vector, total variance, covariance matrix via the inner and outer product formulas, correlation matrix, the observations, and the dominant eigen-vectors and eigenvalues. The scatter plots should be part of the output file as well, with any required comments. You will lose points if you do not include the output PDF file.
Note that since there are >19k points, you should not print out the full eigenvectors. Just print out the first ten and last ten values per eigenvector.
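For example, if a long vector of values is held in an array (here called proj, an assumed name), a simple way to trim the printout:

```python
print(proj[:10])    # first ten values
print(proj[-10:])   # last ten values
```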
Tutorial on Python and NumPy
