CSCI4390-6390 Assign2 (Solution)

Assign2: High Dimensional Data and Dimensionality Reduction
Both Part I and Part II must be completed by all sections. Differences between the sections are marked with CSCI4390 and CSCI6390 labels.
Part I: Principal Components Analysis (50 points)
Next, determine and print how many dimensions are required to capture an α = 0.975 fraction of the total variance.
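Below is a minimal NumPy sketch of the variance-fraction step. It assumes the data have already been loaded into an n × d array `D` with one point per row; the function name `dims_for_alpha` is hypothetical. The function centers the data and sorts the covariance eigenvalues in decreasing order. It then returns the smallest r whose cumulative fraction of the total variance reaches α.

```python
import numpy as np


def dims_for_alpha(D, alpha=0.975):
    """Return (r, frac): the smallest r capturing >= alpha of the total variance."""
    # Center the data: rows are points, columns are attributes (assumption)
    Z = D - D.mean(axis=0)
    # Covariance matrix divided by n; the variance fractions are the same with n - 1
    cov = Z.T @ Z / Z.shape[0]
    # eigvalsh returns eigenvalues of a symmetric matrix in ascending order
    evals = np.linalg.eigvalsh(cov)[::-1]
    # frac[i] = fraction of total variance captured by the first i + 1 components
    frac = np.cumsum(evals) / evals.sum()
    # First position where frac >= alpha; clamp in case of round-off at alpha = 1
    r = min(int(np.searchsorted(frac, alpha)) + 1, evals.size)
    return r, frac
```

For example, `r, frac = dims_for_alpha(D, 0.975)` gives the count to print. `frac` can also be printed to show how quickly the variance accumulates.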
Also print the mean squared error in the approximation using the first three components.
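The sketch below assumes one particular definition of the mean squared error: the average squared Euclidean distance between each centered point and its reconstruction from the first r components. This is a common textbook definition, but it should be checked against the course notes. The hypothetical helper `pca_mse` also returns the sum of the discarded eigenvalues, which should equal that MSE when the covariance is divided by n.

```python
import numpy as np


def pca_mse(D, r=3):
    """Return (mse, tail): reconstruction MSE with r PCs and the sum of discarded eigenvalues."""
    Z = D - D.mean(axis=0)                 # center the points
    cov = Z.T @ Z / Z.shape[0]             # covariance divided by n
    evals, evecs = np.linalg.eigh(cov)     # ascending order
    order = np.argsort(evals)[::-1]        # re-sort in decreasing order
    evals, evecs = evals[order], evecs[:, order]
    U = evecs[:, :r]                       # d x r basis of the first r PCs
    A = Z @ U                              # n x r coordinates in the PC basis
    Z_hat = A @ U.T                        # reconstruction in the original space
    # Assumed definition: mean over points of the squared Euclidean error
    mse = np.mean(np.sum((Z - Z_hat) ** 2, axis=1))
    return mse, evals[r:].sum()
```

Computing both quantities gives a quick self-check: if they disagree, the eigenvectors are probably not sorted or the data were not centered.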
Plot the PCs
CSCI4390 Only: Project the points along the first two PCs, and create a scatter plot of the projected points.
CSCI6390 Only: Project the points along the first three PCs, and create a 3D scatter plot of the projected points.
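The projection and plotting steps might look like the following sketch. `project` and `scatter` are hypothetical helper names. `k = 2` corresponds to the CSCI4390 plot and `k = 3` to the CSCI6390 3D plot. Matplotlib is imported only inside `scatter`, so the projection can be used and tested without it.

```python
import numpy as np


def project(D, k):
    """Coordinates of the centered points along the first k principal components."""
    Z = D - D.mean(axis=0)
    cov = Z.T @ Z / Z.shape[0]
    evals, evecs = np.linalg.eigh(cov)
    # Columns of U are the eigenvectors with the k largest eigenvalues
    U = evecs[:, np.argsort(evals)[::-1][:k]]
    return Z @ U


def scatter(A, path="pcs.png"):
    """Save a 2D (k = 2) or 3D (k = 3) scatter plot of the projected points."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display needed
    import matplotlib.pyplot as plt
    fig = plt.figure()
    if A.shape[1] == 3:
        ax = fig.add_subplot(projection="3d")
        ax.scatter(A[:, 0], A[:, 1], A[:, 2], s=4)
        ax.set_zlabel("PC3")
    else:
        ax = fig.add_subplot()
        ax.scatter(A[:, 0], A[:, 1], s=4)
    ax.set_xlabel("PC1")
    ax.set_ylabel("PC2")
    fig.savefig(path)
    plt.close(fig)
```

Usage would be `scatter(project(D, 2))` or `scatter(project(D, 3), "pcs3d.png")`. The saved image can then be pasted into the PDF.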
Part II: Diagonals in High Dimensions (50 points)
Your goal is to compute the probability mass function for the random variable X that represents the angle (in degrees) between any two diagonals in high dimensions.
Assume that there are d primary dimensions (the standard axes in Cartesian coordinates), each ranging from -1 to 1. In addition, there are half-diagonals in this space, one for each corner of the d-dimensional hypercube.
Randomly generate n = 100,000 pairs of half-diagonals in the d-dimensional hypercube (random d-dimensional vectors whose elements are -1 or 1), and compute the angle between each pair (in degrees).
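Here is a sketch of the sampling step. It assumes a half-diagonal runs from the origin to a corner of [-1, 1]^d, so each of its coordinates is -1 or +1 with equal probability. Every such vector has length √d, so the cosine of the angle reduces to the dot product divided by d. The vectors are sampled in chunks to keep memory bounded when d = 1000 and n = 100,000. The `chunk` parameter and the function name `half_diagonal_angles` are assumptions, not part of the assignment.

```python
import numpy as np


def half_diagonal_angles(d, n=100_000, chunk=5_000, seed=None):
    """Angles (degrees) between n random pairs of half-diagonals of [-1, 1]^d."""
    rng = np.random.default_rng(seed)
    angles = np.empty(n)
    for start in range(0, n, chunk):
        m = min(chunk, n - start)
        # A corner of [-1, 1]^d has every coordinate equal to -1 or +1
        x = 2 * rng.integers(0, 2, size=(m, d)) - 1
        y = 2 * rng.integers(0, 2, size=(m, d)) - 1
        # ||x|| = ||y|| = sqrt(d), so cos(theta) = (x . y) / d
        cos = np.sum(x * y, axis=1) / d
        # clip is defensive; the cosines are already within [-1, 1]
        angles[start:start + m] = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
    return angles
```

For the three required cases, something like `{d: half_diagonal_angles(d, seed=0) for d in (10, 100, 1000)}` collects one sample per dimension.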
Plot the probability mass function (PMF) for three different values of d: d = 10, 100, 1000. Recall that the PMF is simply the plot of each angle versus the probability of observing that angle in the sample of n pairs for a given value of d. What are the min, max, value range, mean, and variance of X for each value of d?
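The dot product of two such vectors can only take values of the form d - 2k for k = 0, 1, 2, up to d. X therefore has finitely many possible values, and its PMF can be estimated by counting. The minimal sketch below uses the hypothetical name `pmf_and_stats`. It returns the distinct angles, their empirical probabilities, and the requested summary statistics.

```python
import numpy as np


def pmf_and_stats(angles):
    """Empirical PMF of the sampled angles plus min, max, range, mean, variance."""
    # Rounding is defensive: equal cosines already give identical angles
    values, counts = np.unique(np.round(angles, 6), return_counts=True)
    pmf = counts / angles.size  # relative frequency of each distinct angle
    stats = {
        "min": angles.min(),
        "max": angles.max(),
        "range": angles.max() - angles.min(),
        "mean": angles.mean(),
        "variance": angles.var(),  # population variance (ddof=0)
    }
    return values, pmf, stats
```

The returned `values` and `pmf` arrays can be plotted with Matplotlib, for example `plt.plot(values, pmf, marker="o")`, once per value of d.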
CSCI6390 Only: What would you expect analytically? In other words, derive formulas for what should happen to the angle between half-diagonals as d → ∞. Does the PMF conform to this trend? Explain why or why not.
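One possible starting point for the analytical question is sketched below. It is not offered as the expected answer. It assumes the two half-diagonals x and y are drawn independently and uniformly from the 2^d corners. The key facts are these:

- The cosine is an average of d independent ±1 terms, so its variance is 1/d.
- The exact distribution follows from the number of coordinates where x and y disagree.
- Chebyshev's inequality then pushes the angle toward 90° as d grows.

```latex
% x, y drawn independently and uniformly from the corners {-1, +1}^d
\cos\theta = \frac{\mathbf{x}^{T}\mathbf{y}}{\lVert\mathbf{x}\rVert\,\lVert\mathbf{y}\rVert}
           = \frac{1}{d}\sum_{i=1}^{d} x_i y_i,
\qquad \lVert\mathbf{x}\rVert = \lVert\mathbf{y}\rVert = \sqrt{d}

% Each z_i = x_i y_i is +1 or -1 with probability 1/2, independently of the others
E[\cos\theta] = 0, \qquad \operatorname{Var}[\cos\theta] = \frac{1}{d}

% With K = number of coordinates where x_i \neq y_i, K \sim \mathrm{Binomial}(d, 1/2)
P\!\left(\theta = \arccos\frac{d - 2k}{d}\right) = \binom{d}{k}\, 2^{-d},
\qquad k = 0, 1, \dots, d

% Chebyshev: for any \epsilon > 0
P\left(\lvert\cos\theta\rvert \ge \epsilon\right) \le \frac{1}{d\,\epsilon^{2}}
\xrightarrow{d \to \infty} 0
\quad\Longrightarrow\quad \theta \to 90^\circ \text{ in probability}
```

Under these assumptions, the PMF should become more sharply peaked around 90° as d increases. Angles of exactly 0° or 180° each have probability 2^{-d}, so they should essentially never appear in a sample of 100,000 pairs once d is large.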
What to submit
Write two Python scripts, named Assign2-part1.py and Assign2-part2.py, one for each part.
For part1, read the filename from the command line and assume the file is in the local directory. Part1 will be run as
Assign2-part1.py FILENAME ALPHA
where FILENAME is the data file name and ALPHA is the approximation threshold α. In other words, your script must compute and return the correct number of components needed to capture an α fraction of the total variance.
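Below is a minimal sketch of the command-line handling for Assign2-part1.py. The data file format is not specified here, so the block assumes a comma-separated numeric file with no header row. Adjust the `np.loadtxt` arguments (for example `delimiter` or `usecols`) to match the actual file. A bare filename resolves against the current working directory, which matches the "local directory" requirement.

```python
import sys

import numpy as np


def main(argv):
    """Print and return the number of PCs needed to capture ALPHA of the variance."""
    # Expected invocation: Assign2-part1.py FILENAME ALPHA
    if len(argv) != 3:
        print(f"usage: {argv[0]} FILENAME ALPHA", file=sys.stderr)
        return None
    filename, alpha = argv[1], float(argv[2])
    # Assumption: comma-separated numeric data, no header row, one point per row
    D = np.loadtxt(filename, delimiter=",", ndmin=2)
    Z = D - D.mean(axis=0)
    # Covariance eigenvalues in decreasing order
    evals = np.linalg.eigvalsh(Z.T @ Z / Z.shape[0])[::-1]
    frac = np.cumsum(evals) / evals.sum()
    r = min(int(np.searchsorted(frac, alpha)) + 1, evals.size)
    print(r)
    return r


if __name__ == "__main__":
    main(sys.argv)
```

Run from the shell, for example `python Assign2-part1.py data.csv 0.975`. Here `data.csv` is a placeholder name.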
For part2, the script will be run as Assign2-part2.py.
Submit a PDF file named Assign2.pdf that includes your solutions to each of the questions (just cut and paste the output from your scripts). The figures should also be part of this file. Failure to submit the PDF will result in lost points.
