CS760 – HOMEWORK1 (Solution)

Description

Devansh Goenka
Student ID: 908 335 1354
Instructions: This is a background self-test on the type of math we will encounter in class. If you find many questions intimidating, we suggest you drop 760 and take it again in the future when you are more prepared. Use this latex file as a template to develop your homework. Submit your homework on time as a single pdf file to Canvas. There is no need to submit the latex source or any code. Please check Piazza for updates about the homework.
1 Vectors and Matrices [6 pts]
Consider the matrix X and the vectors y and z below:
y z
1. Compute yTXz
y y yTXZ = 0 [Answer]
2. Is X invertible? If so, give the inverse, and if no, explain why not.
The columns of X are linearly independent, hence we can say that X is invertible (or non-singular).
One more way of proving this is to check if detX 6= 0

Thus, the inverse of X exists.
X
2 Calculus [3 pts]
1. , what is the partial derivative of y with respect to x?

3 Probability and Statistics [10 pts]
Consider a sequence of data S = (1,1,1,0,1) created by flipping a coin x five times, where 0 denotes that the coin turned up heads and 1 denotes that it turned up tails.
1. (2.5 pts) What is the probability of observing this data, assuming it was generated by flipping a biased coin with p(x = 1) = 0.6?
With the biased coin, p(x = 1) = 0.6
The probability of observing this data becomes : 0.6 * 0.6 * 0.6 * 0.4 * 0.6 = 0.05184 [Answer]
2. (2.5 pts) Note that the probability of this data sample could be greater if the value of p(x = 1) was not 0.6, but instead some other value. What is the value that maximizes the probability of S? Please justify your answer.
Let us say that the value of p(x=1) which maximizes the above sequence is x. Clearly, 0 < x < 1.
The probability of the above sequence occurring then becomes:
x ∗ x ∗ x ∗ (1 − x) ∗ x = x4 − x5
As we know, to find the maxima of this polynomial function, we need to equate the first derivative to 0.

To verify if this is indeed the maxima, we need to check that the second order derivative should be negative.
f00(x) = 12×2 − 20×3
At x = 0.8, f00(x) = −2.56
Thus, the value for p(x = 1) which maximises the sequence is 0.8[Answer].
3. (5 pts) Consider the following joint probability table where both A and B are binary random variables:
A B P(A,B)
0 0 0.3
0 1 0.1
1 0 0.1
1 1 0.5
(a) What is P(A = 0|B = 1)?
From the chain rule of probability, we have :

Thus,
(b) What is P(A = 1 ∨ B = 1)?
We know:
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
Thus,
P(A = 1 ∨ B = 1) = P(A = 1 ∪ B = 1) = P(A = 1) + P(B = 1) − P(A = 1 ∩ B = 1)
→ 0.6 + 0.6 − 0.5 = 0.7 [Answer]
4 Big-O Notation [6 pts]
For each pair (f,g) of functions below, list which of the following are true: f(n) = O(g(n)), g(n) = O(f(n)), both, or neither. Briefly justify your answers.
1. f(n) = ln(n), g(n) = log2(n). Both are true.
g(n) = ln(2) ∗ f(n)
Thus, we can re-write the above functions as f(n) = k ∗ g(n) and g(n) = k0 ∗ f(n) Therefore, f(n) = O(g(n) and g(n) = O(f(n)) [Answer]
2. f(n) = log2 log2(n), g(n) = log2(n).
Only f(n) = O(g(n)) is true.
For sufficiently large n, we can easily see that f(n) < g(n) Therefore, we can say that f(n) = O(g(n)) [Answer]
3. f(n) = n!, g(n) = 2n.
Only g(n) = O(f(n)) is true.
For sufficiently large n, g(n) < f(n)
Therefore, we can say that g(n) = O(f(n)) [Answer]
5 Probability and Random Variables
5.1 Probability [12.5 pts]
State true or false. Here Ω denotes the sample space and Ac denotes the complement of the event A.
1. For any A,B ⊆ Ω, P(A|B)P(A) = P(B|A)P(B).
False
2. For any A,B ⊆ Ω, P(A ∪ B) = P(A) + P(B) − P(B ∩ A).
True
3. For any A,B,C ⊆ Ω such that .
True
4. For any A,B ⊆ Ω such that P(B) > 0,P(Ac) > 0, P(B|AC) + P(B|A) = 1.
False
5. If A and B are independent events, then Ac and Bc are independent. True
5.2 Discrete and Continuous Distributions [12.5 pts]
Match the distribution name to its probability density / mass function. Below, |x| = k.

otherwise
(a) Gamma (j)
(b) Multinomial (i)and
(c) Laplace (h) otherwise
(d) Poisson (l) oth-
(e) Dirichlet (k) erwise
and ; 0 otherwise for all x ∈ Z+; 0 otherwise
5.3 Mean and Variance [10 pts]
1. Consider a random variable which follows a Binomial distribution: X ∼ Binomial(n,p).
(a) What is the mean of the random variable? np
(b) What is the variance of the random variable? np(1 − p)
2. Let X be a random variable and E[X] = 1,Var(X) = 1. Compute the following values:
(a) E[5X]
We know, E[αX] = αE[X]
Thus, E[5X] = 5E[X] = 5 [Answer]
(b) Var(5X)
We know, Var(αX) = α2Var(X)
Thus, Var(5X) = 25Var(X) = 25 [Answer]
(c) Var(X + 5)
The variance is not affected by addition of a constant as the distribution still remains the same. Hence,
Var(X + 5) = 1 [Answer]
5.4 Mutual and Conditional Independence [12 pts]
1. (3 pts) If X and Y are independent random variables, show that E[XY ] = E[X]E[Y ].
Let us consider that both random variables are continuous.
We know that:
E[g(x,y)] = RR g(x,y)fxy(x,y)dydx
Here, g(x,y) = XY
E(XY ) = RR xyfxy(x,y)dydx
Since X & Y are independent,the joint probability function can factor,
E(XY ) = RR xyfx(x)fy(y)dydx
E(XY ) = R xfx(x)R yfy(y)dydx
E(XY ) = E(Y ) · R xfx(x)dx
E(XY ) = E(X) · E(Y ) [Answer]
2. (3 pts) If X and Y are independent random variables, show that Var(X + Y ) = Var(X) + Var(Y ). Hint: Var(X + Y ) = Var(X) + 2Cov(X,Y ) + Var(Y )
We begin with the expansion:
Var(X + Y ) = Var(X) + 2Cov(X,Y ) + Var(Y )
We know that Cov(X,Y ) = E(XY ) − E(X)E(Y )
Since X & Y are independent, E(XY ) = E(X)E(Y )
→Cov(X, Y) = 0
→Var(X+Y) = Var(X) + 2*0 + Var(Y)
→Var(X+Y) = Var(X) + Var(Y) [Answer]
3. (6 pts) If we roll two dice that behave independently of each other, will the result of the first die tell us something about the result of the second die?
No, the rolls are independent of each other.
If, however, the first die’s result is a 1, and someone tells you about a third event — that the sum of the two results is even — then given this information is the result of the second die independent of the first die?
Now, the outcome of the second die becomes dependent on the first die.
5.5 Central Limit Theorem [3 pts]
Prove the following result.
1. Let , then the distribution of X¯ satisfies
√ ¯ n−→∞→ N(0,1)
nX
Here, X1,X2,…Xn are i.i.d
According to the Law of Large Numbers:
Whenever n → ∞, and E(X1) = E(X2) = E(Xn) = µ, and we have

Then X¯ converges to µ
We have :√√
E( nX¯) = nE(X¯)
Since√X¯ converges to µ, which is 0 here,
E( nX¯) = 0
Similarly,

Moreover, from the Central Limit Theorem, we know that whenever i.i.d variables are added, their normalized sum converges to a Normal Distribution.
Thus, we can establish that √ ¯ n−→∞→ N(0,1) [Answer] nX
6 Linear algebra
6.1 Norms [5 pts]
Draw the regions corresponding to vectors x ∈ R2 with the following norms: 1. ||x||1 ≤ 1 (Recall that ||x||1 = Pi |xi|)

The shaded region indicates all vectors having L-1 norm ≤ 1

2. ||x||2 ≤ 1 (Recall that ||x||2 = pPi x2i)

The shaded region indicates all vectors having L-2 norm ≤ 1
3. ||x||∞ ≤ 1 (Recall that ||x||∞ = maxi |xi|)

The shaded region indicates all vectors having L-∞ norm ≤ 1
For , Calculate the following norms.
4. ||M||2 (L2 norm)
The L2 norm or spectal norm is equal to the largest singular value in the SVD of M
The largest singular value after performing the SVD of M is 7 Therefore, ||M||2 = 7 [Answer]
5. ||M||F (Frobenius norm)
q
6.2 Geometry [10 pts]
Prove the following. Provide all steps.
Let us consider a point x on the hyperplane. To find the smallest distance to the hyperplane from the origin, we need to calculate the projection of the vector from x to the origin (o) on w.
Thus: distance = ||projw(x)||

Now, As w · x = −b
distance
2. The Euclidean distance between two parallel hyperplane wTx+b1 = 0 and wTx (Hint: you can use the result from the last question to help you prove this one).
Using the previous solution, we can say that: d1 = Distance from origin to hyperplane 1 d2 = Distance from origin to hyperplane 2 Now, the distance between these two hyperplanes is:

7 Programming Skills [10 pts]
Sampling from a distribution. For each question, submit a scatter plot (you will have 3 plots in total). Make sure the axes for all plots have the same ranges.
1. Make a scatter plot by drawing 100 items from a two dimensional Gaussian N((1,−1)T,2I), where I is an identity matrix in R2×2.

100 samples drawn from the above described Gaussian distribution
2. Make a scatter plot by drawing 100 items from a mixture distribution 0.
0. .

100 samples drawn from the above described Mixture Distribution

Reviews

There are no reviews yet.

Be the first to review “CS760 – HOMEWORK1 (Solution)”

CS760 – HOMEWORK1 (Solution)

Description

Reviews

Related products

CS760 – HOMEWORK8 (Solution)

CS760 – HOMEWORK5 (Solution)

CS760 – HOMEWORK4 (Solution)

CS760 – HOMEWORK2 (Solution)