Le Song
• Please read the following submission rules carefully. We will strictly follow these rules starting from this assignment.
• We recommend writing your report with LaTeX. Other text editors such as MS Word are also accepted. Hand-written reports must be clear; unreadable handwriting is subject to zero credit. The report must be a single PDF file named in the format HWnumber_yourGTaccount.pdf, for example "HW4_yyang123.pdf". Any additional documents are ignored.
• Matlab submissions will be graded in the MATLAB Online R2019b environment. Python submissions will be graded in a Python 3.6 environment on a regular laptop. Please make sure your program runs in those environments. Any program that cannot run is subject to zero credit. Such situations include, but are not limited to, exceeding 5 minutes of running time, exceeding the memory limit, and invalid text characters.
• Put your report and code files into a single flat folder and submit it as a single .zip file.
• Recommended reading: PRML Section 13.2
1 Kernels [20 points]
(a) Identify which of the following are valid kernels. If it is a kernel, please write your answer explicitly as 'True' and give a mathematical proof. If it is not a kernel, please write your answer explicitly as 'False' and give an explanation. [8 pts]
Suppose K1 and K2 are valid kernels (symmetric and positive definite) defined on Rm × Rm.
1. K(u,v) = αK1(u,v) + βK2(u,v), where α, β ∈ R.
2. K(u,v) = K1(f(u), f(v)), where f : Rm → Rm.
3. K(u,v) = 1 if ‖u − v‖2 ≤ 1, and 0 otherwise. (1)
4. Suppose K0 is a valid kernel.
K(u,v) = … (2)
(b) Write down the kernelized version of Fisher's Linear Discriminant Analysis using the kernel trick. Please provide full steps and all details of the method. [Hint: Use the kernel to replace inner products.] [12 pts]
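As a pointer only (this is the standard kernel-FDA construction, e.g., Mika et al.; the full derivation is for you to write out): expand the discriminant direction in the span of the mapped training points, so that every inner product becomes a kernel evaluation. In LaTeX:

% Sketch: representer expansion for kernel Fisher LDA (not the full derivation)
w = \sum_{i=1}^{N} \alpha_i \phi(x_i)
\quad\Longrightarrow\quad
w^\top \phi(x) = \sum_{i=1}^{N} \alpha_i K(x_i, x),
\qquad
J(\alpha) = \frac{\alpha^\top M \alpha}{\alpha^\top N \alpha},

where M and N denote the between-class and within-class scatter matrices rewritten in terms of the kernel matrix.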
2 Markov Random Field, Conditional Random Field [20 pts]
[a-b] A probability distribution over 3 discrete variables a, b, c is defined by p(a,b,c) ∝ φ1(a,b) φ2(b,c), where the tables for the two factors are given below.
a b φ1(a,b)
0 0 4
0 1 3
1 0 3
1 1 1
b c φ2(b,c)
0 0 3
0 1 2
0 2 1
1 0 4
1 1 1
1 2 3
(a) Compute the slice of the joint factor ψ(a,b,c) corresponding to b = 1. This is the table ψ(a,b = 1,c). [5 pts]
(b) Compute P(a = 1,b = 1). [5 pts]
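For checking your hand computations in (a) and (b), a minimal numpy sketch (variable names are ours) that tabulates ψ(a,b,c) = φ1(a,b)·φ2(b,c) and marginalizes:

import numpy as np

# Factor tables copied from the problem statement:
# phi1[a, b] with a, b in {0, 1}; phi2[b, c] with c in {0, 1, 2}.
phi1 = np.array([[4, 3],
                 [3, 1]])
phi2 = np.array([[3, 2, 1],
                 [4, 1, 3]])

# Joint factor psi(a, b, c) = phi1(a, b) * phi2(b, c), shape (2, 2, 3).
psi = phi1[:, :, None] * phi2[None, :, :]

print("psi(a, b=1, c):\n", psi[:, 1, :])          # the b = 1 slice, part (a)

Z = psi.sum()                                     # normalization constant
print("P(a=1, b=1) =", psi[1, 1, :].sum() / Z)    # marginalize over c, part (b)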
(c) Explain the difference between Conditional Random Fields and Hidden Markov Models with respect to the following factors. Please give only a one-line explanation for each. [10 pts]
• Type of model – generative/discriminative
• Objective function optimized
• Require a normalization constant
3 Hidden Markov Model [50 pts]
This problem will let you get familiar with HMM algorithms by doing the calculations by hand.
[a-c] There are three coins (1, 2, 3); we throw them randomly and record the result. S = {1, 2, 3}; V = {H, T} (Head or Tail). A, B, and π are given as:

A (state transition probabilities):
     1     2     3
1    0.9   0.05  0.05
2    0.45  0.1   0.45
3    0.45  0.45  0.1

B (emission probabilities):
     1     2     3
H    0.5   0.75  0.25
T    0.5   0.25  0.75

π: (1/3, 1/3, 1/3)
(a) Given the model above, what is the probability of the observation sequence O = H, T, H? [10 pts]
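For checking the hand calculation, a minimal forward-algorithm sketch in numpy (the assignment expects the computation by hand):

import numpy as np

A  = np.array([[0.90, 0.05, 0.05],   # transition probabilities
               [0.45, 0.10, 0.45],
               [0.45, 0.45, 0.10]])
B  = np.array([[0.50, 0.75, 0.25],   # row 0: P(H | state)
               [0.50, 0.25, 0.75]])  # row 1: P(T | state)
pi = np.array([1/3, 1/3, 1/3])

obs = [0, 1, 0]                      # H, T, H

alpha = pi * B[obs[0]]               # forward recursion
for o in obs[1:]:
    alpha = (alpha @ A) * B[o]
print("P(O = H,T,H) =", alpha.sum())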
(b) Describe how to estimate A, B, and π when they are unknown. [10 pts]
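For orientation only (these are the standard Baum-Welch updates in the usual γ/ξ notation; the question asks for your own description):

% E-step quantities and M-step updates of Baum-Welch (a sketch)
\gamma_t(i) = P(S_t = i \mid O), \qquad
\xi_t(i,j) = P(S_t = i,\, S_{t+1} = j \mid O),
\pi_i \leftarrow \gamma_1(i), \qquad
a_{ij} \leftarrow \frac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)}, \qquad
b_j(k) \leftarrow \frac{\sum_{t:\,O_t = k} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}.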
(c) In class, we studied discrete HMMs with discrete hidden states and observations. The following problem considers a continuous density HMM, which has discrete hidden states but continuous observations. Let St ∈ {1, 2, …, n} denote the hidden state of the HMM at time t, and let Xt ∈ R denote the real-valued scalar observation of the HMM at time t. In a continuous density HMM, the emission probability must be parameterized, since the random variable Xt is no longer discrete; it is defined as P(Xt = x | St = i) = N(x; µi, σi²). Given m sequences of observations (each of length T), derive the EM algorithm for an HMM with a Gaussian observation model. [14 pts]
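As a sanity check for your derivation (the standard Gaussian-emission M-step, written for a single sequence; with m sequences the sums also run over sequences):

% M-step for Gaussian emissions, with gamma_t(i) = P(S_t = i | X) from the E-step
\mu_i \leftarrow \frac{\sum_{t=1}^{T} \gamma_t(i)\, x_t}{\sum_{t=1}^{T} \gamma_t(i)},
\qquad
\sigma_i^2 \leftarrow \frac{\sum_{t=1}^{T} \gamma_t(i)\, (x_t - \mu_i)^2}{\sum_{t=1}^{T} \gamma_t(i)}.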
(d) For each of the following sentences, say whether it is true or false and provide a short explanation (one sentence or so). [16 pts]
• The weights of all incoming edges to a state of an HMM must sum to 1.
• An edge from state s to state t in an HMM denotes the conditional probability of going to state s given that we are currently at state t.
• The "Markov" property of an HMM implies that we cannot use an HMM to model a process that depends on several time-steps in the past.
• The Baum-Welch algorithm is a type of an Expectation Maximization algorithm and as such it is guaranteed to converge to the (globally) optimal solution.
4 Programming [30 pts]
Consider a Hidden Markov Model in which xt denotes the economic state (good or bad) of week t and yt denotes the price movement (up or down) of the S&P 500 index. We assume that xt+1 = xt with probability 0.8, and that P(yt = +1 | xt = good) = P(yt = −1 | xt = bad) = q. In addition, assume that P(x1 = bad) = 0.8. Load sp500.mat, implement the algorithm, briefly describe your implementation, and report the following (a minimal implementation sketch appears after part (b)):
(a) Assuming q = 0.7, plot P(xt = good | y) for t = 1, 2, …, 39. What is the probability that the economy is in a good state in week 39? [15 pts]
(b) Repeat (a) for q = 0.9, and compare the result to that of (a). Explain your comparison in one or two sentences. [15 pts]
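A minimal forward-backward (smoothing) sketch of this setup; the variable name price_move for the contents of sp500.mat is an assumption, so check the file's actual fields:

import numpy as np
from scipy.io import loadmat

q  = 0.7                               # switch to 0.9 for part (b)
A  = np.array([[0.8, 0.2],             # states: 0 = good, 1 = bad
               [0.2, 0.8]])
pi = np.array([0.2, 0.8])              # P(x1 = bad) = 0.8

# Hypothetical field name; inspect sp500.mat for the real one.
y = loadmat('sp500.mat')['price_move'].ravel()   # entries in {+1, -1}

# Emission likelihoods P(y_t | x_t) for each state, shape (T, 2).
lik = np.where(y[:, None] == 1, [q, 1 - q], [1 - q, q])

T = len(y)
alpha = np.zeros((T, 2)); beta = np.ones((T, 2))
alpha[0] = pi * lik[0]; alpha[0] /= alpha[0].sum()
for t in range(1, T):                  # normalized forward pass
    alpha[t] = (alpha[t - 1] @ A) * lik[t]
    alpha[t] /= alpha[t].sum()
for t in range(T - 2, -1, -1):         # backward pass
    beta[t] = A @ (lik[t + 1] * beta[t + 1])
    beta[t] /= beta[t].sum()

post = alpha * beta                    # smoothed posteriors
post /= post.sum(axis=1, keepdims=True)
print(post[:, 0])                      # P(x_t = good | y), t = 1..T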