Description
Discussion is encouraged on Piazza as part of the Q/A. However, all assignments should be done individually.
Structure
Homework 1 has two components: the theory questions in this file and a programming portion in a Jupyter notebook. The homework is worth a total of 110 points, 10 of which are bonus points. The grading breakdown is as follows:
1. Theory (85 + 10 bonus): problems 1-4 are worth 85 points and problem 5 is worth 10 bonus points.
2. Programming (15): there are four subproblems in this part of the assignment (in the .ipynb file).
Instructions
We will be using Gradescope this semester for submission and grading of assignments.
Make your submission as follows:
– Questions 1-4: submit to the A1 Written assignment on Gradescope in .pdf format.
– Question 5: submit to the A1 Written: Bonus Questions assignment on Gradescope in .pdf format.
– For the early-bird bonus, submit 2 questions from Questions 1-4 to the A1 Written: Early Bird assignment on Gradescope.
– Submit the programming solutions to the A1 Programming assignment on Gradescope, following the instructions in the .ipynb file.
1 Linear Algebra [20 points]
1.1 Determinant and Inverse of a Matrix [10pts]
Given the matrix

        [ 2 −4  1 ]
M =     [ 4  1  x ]        (1)
        [ 2  1  1 ]

(a) Calculate the determinant of M for x = −2. [2pts] (Calculation process required.)
Expanding along the first row with x = −2:

det(M) = 2(1·1 − (−2)·1) − (−4)(4·1 − (−2)·2) + 1(4·1 − 1·2)        (2)
       = 2(3) + 4(8) + 1(2) = 6 + 32 + 2 = 40                       (3)
(b) Calculate M−1 for x = −2. [4pts] (Calculation process required.)
(Hint: Please double check your answer and make sure MM−1 = I)
Row-reduce the augmented matrix [M | I]:

[ 2 −4  1 | 1 0 0 ]     [ 2 −4  1 |  1 0 0 ]
[ 4  1 −2 | 0 1 0 ]  ∼  [ 0  9 −4 | −2 1 0 ]   (R2 ← R2 − 2R1, R3 ← R3 − R1)
[ 2  1  1 | 0 0 1 ]     [ 0  5  0 | −1 0 1 ]

   [ 2 −4  1 |  1    0  0   ]
∼  [ 0  1  0 | −1/5  0  1/5 ]   (swap R2 and R3, then R2 ← R2/5)
   [ 0  9 −4 | −2    1  0   ]

   [ 2 −4  1 |  1    0  0   ]
∼  [ 0  1  0 | −1/5  0  1/5 ]   (R3 ← R3 − 9R2)
   [ 0  0 −4 | −1/5  1 −9/5 ]

   [ 2 −4  1 |  1     0    0    ]
∼  [ 0  1  0 | −1/5   0    1/5  ]   (R3 ← R3/(−4))
   [ 0  0  1 |  1/20 −1/4  9/20 ]

   [ 2  0  0 |  3/20  1/4  7/20 ]
∼  [ 0  1  0 | −1/5   0    1/5  ]   (R1 ← R1 − R3 + 4R2)
   [ 0  0  1 |  1/20 −1/4  9/20 ]

   [ 1  0  0 |  3/40  1/8  7/40 ]
∼  [ 0  1  0 | −1/5   0    1/5  ]   (R1 ← R1/2)
   [ 0  0  1 |  1/20 −1/4  9/20 ]

       [  3/40  1/8   7/40 ]
M−1 =  [ −1/5   0     1/5  ]        (4)
       [  1/20 −1/4   9/20 ]
(c) What is the relationship between the determinant of M and the determinant of M−1 from parts (a) and (b)? [2pts]
They are reciprocals of one another: det(M−1) = 1/det(M). Here det(M) = 40 and det(M−1) = 1/40.
(d) What happens to the inverse of the matrix if x = 2? Why? [2pts]
M becomes a singular matrix, as its determinant becomes 0, which means an inverse does not exist.
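These results can be checked numerically; below is a minimal NumPy sketch (the helper make_M is introduced here for convenience and is not part of the assignment):

```python
import numpy as np

def make_M(x):
    # M as in (1), with the (2,3) entry equal to x
    return np.array([[2.0, -4.0, 1.0],
                     [4.0,  1.0,   x],
                     [2.0,  1.0, 1.0]])

M = make_M(-2)
print(np.linalg.det(M))                     # ~40.0, matching (3)
M_inv = np.linalg.inv(M)
print(M_inv * 40)                           # [[3, 5, 7], [-8, 0, 8], [2, -10, 18]], i.e. (4)
print(np.allclose(M @ M_inv, np.eye(3)))    # True: M M^-1 = I, the hint in (b)
print(np.isclose(np.linalg.det(M_inv), 1 / np.linalg.det(M)))  # True: part (c)
print(np.isclose(np.linalg.det(make_M(2)), 0))  # True: M is singular at x = 2, part (d)
```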
1.2 Singular Value Decomposition [10pts]
Given a matrix A, compute its Singular Value Decomposition (SVD) by following the steps below. Your full calculation process is required.
(a) Calculate all eigenvalues of AAT and ATA. [3pts]
(b) Calculate all eigenvectors of AAT normalized to unit length. [3pts]
(AAT − λI)v = 0

For λ = 40:

AAT − 40I = [ 0  0 ]        v1 = [ 1 ]
            [ 0 50 ]             [ 0 ]

For λ = 90:

AAT − 90I = [ −50 0 ]       v2 = [ 0 ]
            [  0  0 ]            [ 1 ]
(c) Calculate all eigenvectors of ATA normalized to unit length. [3pts]

(ATA − λI)v = 0

For λ = 0:

ATA − 0I = [  85 −15 0 ]          [ 0 ]
           [ −15  45 0 ]     v1 = [ 0 ]
           [   0   0 0 ]          [ 1 ]

For λ = 40:

ATA − 40I = [  45 −15   0 ]                    [ 1 ]
            [ −15   5   0 ]     v2 = (1/√10)   [ 3 ]
            [   0   0 −40 ]                    [ 0 ]

For λ = 90:

ATA − 90I = [  −5 −15   0 ]                    [ −3 ]
            [ −15 −45   0 ]     v3 = (1/√10)   [  1 ]
            [   0   0 −90 ]                    [  0 ]
(d) Write out the SVD of matrix A in the following form [1pts]:
A = UΣV T
Hints:
– The square roots of the positive eigenvalues make up the singular values, the diagonal entries of Σ. They are arranged in descending order; all other entries of Σ are 0.
– The eigenvectors of ATA make up the rows of V T.
– The eigenvectors of AAT make up the columns of U.
– Reconstruct matrix A from the SVD to check your answer (see the sketch below).
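Since the factors come from hand computation, a quick numerical verification helps. Below is a minimal NumPy sketch; check_svd is a hypothetical helper, not part of the assignment, and for this problem the diagonal of Σ should be √90 ≈ 9.487 and √40 ≈ 6.325:

```python
import numpy as np

def check_svd(A, U, S, Vt, atol=1e-8):
    """Sanity-check a hand-computed SVD A = U @ S @ Vt."""
    assert np.allclose(U @ S @ Vt, A, atol=atol), "U S V^T does not reconstruct A"
    assert np.allclose(U.T @ U, np.eye(U.shape[1]), atol=atol), "U is not orthogonal"
    assert np.allclose(Vt @ Vt.T, np.eye(Vt.shape[0]), atol=atol), "V is not orthogonal"
    # NumPy returns singular values in descending order, like the diagonal of S.
    assert np.allclose(np.diag(S), np.linalg.svd(A, compute_uv=False), atol=atol), \
        "singular values disagree with NumPy's"
    print("SVD verified; singular values:", np.diag(S))

# Usage, with the given A and your hand-computed factors from (a)-(d):
# check_svd(A, U, S, Vt)
```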
2 Expectation, Covariance and Independence [25pts]
Suppose X,Y and Z are distinct random variables. Let X follow a distribution with probability mass function
(a) What is the expectation and variance of X? [6pts]
(b) Show that Z also follows a normal distribution and compute its expected value and variance. [10pts]
Since X and Y are independent of each other, their conditional and unconditional distributions are the same, and Z must have a normal distribution.
(c) Compute Cov(Y,Z). [5pts]
(d) Are Y and Z independent? Explain. [4pts]
No, because Cov(Y,Z) ≠ 0 and Z depends on Y in its defining equation.
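For reference, the identities behind (c) and (d), sketched in LaTeX under the assumption (standing in for the definition of Z above, which is not reproduced here) that Z is an affine combination of X and Y. Independence would force Cov(Y,Z) = 0, so a nonzero covariance rules it out:

```latex
% Product-rule form of covariance:
\operatorname{Cov}(Y,Z) = \mathbb{E}[YZ] - \mathbb{E}[Y]\,\mathbb{E}[Z]
% Bilinearity, assuming Z = aX + bY + c with X and Y independent:
\operatorname{Cov}(Y, aX + bY + c) = a\operatorname{Cov}(Y,X) + b\operatorname{Var}(Y)
                                   = b\operatorname{Var}(Y)
```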
3 Maximum Likelihood [20 pts]
3.1 Discrete Example [10 pts]
Suppose we have a 4-sided die and let X denote the random face that comes up on a throw. Its pmf is given by Table 1, where θ, p ∈ [0,1].

x     | 1  | 2       | 3       | 4
pX(x) | θp | (1−θ)p  | θ(1−p)  | (1−θ)(1−p)

Table 1: Pmf of X

Suppose we throw the die a certain number of times and observe face i a total of xi times, for i = 1,…,4.
(a) What is the likelihood of this experiment given θ? (You should treat p as a constant) [4pts]
L(D;θ) = (θp)^{x1} · ((1−θ)p)^{x2} · (θ(1−p))^{x3} · ((1−θ)(1−p))^{x4}
       = p^{x1+x2} (1−p)^{x3+x4} · θ^{x1+x3} (1−θ)^{x2+x4}
(b) What is the maximum likelihood estimate of θ? [6pts]
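One standard route to the estimate, sketched in LaTeX: take the log of the likelihood from (a), differentiate with respect to θ (treating p as a constant), and set the derivative to zero.

```latex
\log L = (x_1+x_3)\log\theta + (x_2+x_4)\log(1-\theta) + \text{const}
\qquad
\frac{\partial \log L}{\partial \theta}
  = \frac{x_1+x_3}{\theta} - \frac{x_2+x_4}{1-\theta} = 0
\;\Longrightarrow\;
\hat\theta = \frac{x_1+x_3}{x_1+x_2+x_3+x_4}
```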
3.2 Normal Distribution [10 pts]
Suppose we sample n i.i.d. points from a Gaussian distribution with mean µ and variance σ2. Recall that the Gaussian pdf is given by

f(x) = (1/√(2πσ^2)) exp(−(x − µ)^2 / (2σ^2))

Compute the maximum likelihood estimates of the parameters µ and σ2.
The MLE of µ is µ̂ = x̄ = (1/n) Σ_{i=1}^n x_i, and the MLE of σ2 is σ̂2 = (1/n) Σ_{i=1}^n (x_i − x̄)^2.
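A condensed sketch of the standard derivation, via the log-likelihood:

```latex
\ell(\mu,\sigma^2) = -\frac{n}{2}\log(2\pi\sigma^2)
                     - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2
\qquad
\frac{\partial\ell}{\partial\mu}=0 \;\Rightarrow\; \hat\mu=\bar{x},
\qquad
\frac{\partial\ell}{\partial\sigma^2}=0 \;\Rightarrow\;
\hat\sigma^2=\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2
```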
4 Information Theory [20 points]
4.1 Marginal Distribution [7pts]
Suppose the joint probability distribution of two binary random variables X and Y are given as follows.
X \ Y | 1   | 2
0     | 1/4 | 1/2
1     | 1/4 | 0
(a) Show the marginal distribution of X and Y , respectively. [4pts]
pX(x) = 3/4 if x = 0; 1/4 if x = 1; 0 otherwise.
pY(y) = 1/2 if y = 1, 2; 0 otherwise.
(b) Find the mutual information for the joint probability distribution in the previous question. [3pts]
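A quick numerical cross-check for (a) and (b), as a minimal Python sketch over the joint table above (logs are base 2, so the result is in bits):

```python
import numpy as np

# Joint distribution P(X, Y): rows index X in {0, 1}, columns index Y in {1, 2}.
P = np.array([[1/4, 1/2],
              [1/4, 0.0]])

px = P.sum(axis=1)  # marginal of X: [3/4, 1/4]
py = P.sum(axis=0)  # marginal of Y: [1/2, 1/2]

# I(X;Y) = sum over cells of p(x,y) * log2( p(x,y) / (p(x) p(y)) ),
# skipping zero-probability cells (their contribution is 0).
mi = sum(P[i, j] * np.log2(P[i, j] / (px[i] * py[j]))
         for i in range(2) for j in range(2) if P[i, j] > 0)
print(px, py)  # marginals for part (a)
print(mi)      # ~0.3113 bits for part (b)
```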
4.2 Mutual Information and Entropy [13pts]
Consider the dataset below.
Patient | Temperature (x1) | Cough (x2) | Fatigue (x3) | Nausea (x4) | COVID? (Y)
1       | < 37             | Yes        | Absent       | Absent      | Low
2       | 37 − 38          | Yes        | Present      | Present     | High
3       | < 37             | No         | Absent       | Present     | Low
4       | 37 − 38          | No         | Absent       | Present     | Low
5       | < 37             | Yes        | Present      | Absent      | High
6       | > 38             | No         | Absent       | Absent      | Low
7       | 37 − 38          | No         | Absent       | Present     | Low
8       | > 38             | Yes        | Present      | Absent      | High
9       | < 37             | No         | Present      | Present     | High
10      | 37 − 38          | Yes        | Present      | Absent      | High
11      | 37 − 38          | No         | Absent       | Absent      | Low
12      | < 37             | Yes        | Present      | Present     | High
13      | > 38             | Yes        | Absent       | Absent      | High
14      | 37 − 38          | Yes        | Present      | Absent      | High
You are analyzing the relationship between the signs and symptoms of COVID-19 for early detection and assessment, in order to reduce the transmission rate of SARS-CoV-2. We want to determine which symptoms might affect the contraction of COVID-19. Each input has four features (x1, x2, x3, x4): Temperature (in degrees Celsius), Cough, Fatigue, Nausea. The outcome is the probability of contracting COVID (High vs. Low), which is represented as Y.
(a) Find entropy H(Y ). [2pts]
(b) Find conditional entropy H(Y |x1), H(Y |x4), respectively. [5pts]
(c) Find the mutual information I(x1,Y) and I(x4,Y), and determine which one (x1 or x4) is more informative. [4pts]
I(x1,Y ) = H(Y ) − H(Y |x1)
= 0.9852 − 0.9721 = 0.0131
I(x4,Y ) = H(Y ) − H(Y |x4)
= 0.9852 − 0.9740 = 0.0112
Since I(x1,Y ) > I(x4,Y ), x1 is more informative.
(d) Find joint entropy H(Y,x3). [2pts]
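All of the quantities in 4.2 can be reproduced with a short script. Below is a minimal Python sketch over the dataset above (entropies in bits; the numeric comments are what the script prints):

```python
import math
from collections import Counter

def H(labels):
    """Shannon entropy (in bits) of a sequence of outcomes."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def H_cond(pairs):
    """Conditional entropy H(Y|X) from a sequence of (x, y) pairs."""
    n = len(pairs)
    groups = Counter(x for x, _ in pairs)
    return sum(c / n * H([y for x2, y in pairs if x2 == x])
               for x, c in groups.items())

# (x1 temperature, x3 fatigue, x4 nausea, Y) for each of the 14 patients.
rows = [("<37", "Absent", "Absent", "Low"),    ("37-38", "Present", "Present", "High"),
        ("<37", "Absent", "Present", "Low"),   ("37-38", "Absent", "Present", "Low"),
        ("<37", "Present", "Absent", "High"),  (">38", "Absent", "Absent", "Low"),
        ("37-38", "Absent", "Present", "Low"), (">38", "Present", "Absent", "High"),
        ("<37", "Present", "Present", "High"), ("37-38", "Present", "Absent", "High"),
        ("37-38", "Absent", "Absent", "Low"),  ("<37", "Present", "Present", "High"),
        (">38", "Absent", "Absent", "High"),   ("37-38", "Present", "Absent", "High")]

y = [r[3] for r in rows]
print(round(H(y), 4))                                   # H(Y)    = 0.9852
print(round(H_cond([(r[0], r[3]) for r in rows]), 4))   # H(Y|x1) = 0.9721
print(round(H_cond([(r[2], r[3]) for r in rows]), 4))   # H(Y|x4) = 0.9740
# I(x, Y) = H(Y) - H(Y|x); x1 comes out slightly more informative:
print(round(H(y) - H_cond([(r[0], r[3]) for r in rows]), 4))  # 0.0131
print(round(H(y) - H_cond([(r[2], r[3]) for r in rows]), 4))  # 0.0113 (0.0112 above, up to rounding)
# One way to get part (d): joint entropy H(Y, x3) = H(x3) + H(Y|x3).
print(round(H([r[1] for r in rows]) + H_cond([(r[1], r[3]) for r in rows]), 4))  # 1.2958
```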
5 Bonus for All [10 pts]
5.1 Mutual Information [3 pts]
Prove that mutual information is symmetric, i.e., I(X,Y) = I(Y,X), where xi ∈ X, yi ∈ Y.
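A sketch of the argument: writing out the definition, the summand is symmetric in xi and yi, so the two orders of summation give the same value.

```latex
I(X,Y) = \sum_{x_i \in X}\sum_{y_i \in Y} p(x_i,y_i)\,
         \log\frac{p(x_i,y_i)}{p(x_i)\,p(y_i)}
       = \sum_{y_i \in Y}\sum_{x_i \in X} p(y_i,x_i)\,
         \log\frac{p(y_i,x_i)}{p(y_i)\,p(x_i)}
       = I(Y,X)
```

since p(xi, yi) = p(yi, xi), the product p(xi)p(yi) is commutative, and finite sums can be reordered.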
5.2 Probabilities [7 pts]
(a) What is the probability that the delivery will arrive on time if the distance is between 5 and 10 miles? [2 pts]
(b) What is the probability that the delivery will arrive on time if the distance is within 5 miles? [3 pts]
(c) What is the probability that the delivery will arrive late if the distance is within 5 miles? [2 pts]