
Le Song

• Please read the following submission rules carefully. We will strictly follow these rules starting from this assignment.

• We recommend writing your report in LaTeX. Other text editors such as MS Word are also accepted. Hand-written reports must be clear; unreadable handwriting is subject to zero credit. The report must be a single PDF file named in the format `HW<number>_<your GT account>.pdf`, for example "HW4_yyang123.pdf". Any additional documents are ignored.

• MATLAB submissions will be graded in the MATLAB Online R2019b environment. Python submissions will be graded in a Python 3.6 environment on a regular laptop. Please make sure your program runs in those environments; any program that cannot run is subject to zero credit. Such situations include, but are not limited to, exceeding 5 minutes of running time, exceeding the memory limit, and invalid text characters.

• Put your report and code files into a single flat folder and submit it as a single .zip file.

• Recommended reading: PRML Section 13.2

1 Kernels [20 pts]

(a) Identify which of the following is a valid kernel. If it is a kernel, write your answer explicitly as 'True' and give a mathematical proof. If it is not a kernel, write your answer explicitly as 'False' and give an explanation. [8 pts]

Suppose K1 and K2 are valid kernels (symmetric and positive definite) defined on Rm × Rm.

1. K(u,v) = αK1(u,v) + βK2(u,v),α,β ∈ R.

2. K(u,v) = K1(f(u),f(v)), where f : Rm → Rm.

3. K(u,v) = 1 if ‖u − v‖₂ ≤ 1, and 0 otherwise. (1)

4. Suppose K0 is a valid kernel.

(2)
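A quick numerical sanity check can complement (not replace) the proofs asked for in part (a): a valid kernel must produce a positive semidefinite Gram matrix for every finite point set, so a single negative eigenvalue is a counterexample. A sketch for the candidate in Eq. (1):

```python
import numpy as np

def indicator_kernel(u, v):
    # Candidate kernel from Eq. (1): 1 if ||u - v||_2 <= 1, else 0.
    return 1.0 if np.linalg.norm(u - v) <= 1 else 0.0

def min_gram_eigenvalue(kernel, points):
    # Build the Gram matrix for a finite point set and return its
    # smallest eigenvalue. A negative value disproves PSD-ness;
    # a non-negative value on one point set proves nothing.
    n = len(points)
    K = np.array([[kernel(points[i], points[j]) for j in range(n)]
                  for i in range(n)])
    return float(np.min(np.linalg.eigvalsh(K)))

# Three collinear points 0.8 apart: neighbors are within distance 1,
# but the endpoints are not. The smallest eigenvalue here is 1 - sqrt(2) < 0.
pts = [np.array([0.0]), np.array([0.8]), np.array([1.6])]
min_eig = min_gram_eigenvalue(indicator_kernel, pts)
```

The same helper can be pointed at any of the candidates in (a) with randomly sampled points as a first screen before attempting a proof.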

(b) Write down the kernelized version of Fisher's Linear Discriminant Analysis using the kernel trick. Please provide full steps and all details of the method. [Hint: use the kernel to replace inner products.] [12 pts]
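For reference, the two-class kernel Fisher discriminant can be sketched in dual form as below. This is only an illustrative implementation, not the requested derivation; the function name and the ridge regularizer `reg` are my additions (some regularization is needed because the dual within-class scatter matrix is singular):

```python
import numpy as np

def kernel_fda(K, labels, reg=1e-6):
    """Two-class kernel Fisher discriminant (a sketch).
    K: n x n Gram matrix; labels: array of 0/1 class labels.
    Returns dual coefficients alpha; a new point x projects to
    sum_j alpha_j k(x_j, x)."""
    n = K.shape[0]
    idx0 = np.where(labels == 0)[0]
    idx1 = np.where(labels == 1)[0]
    # Class means in feature space, expressed via kernel columns.
    m0 = K[:, idx0].mean(axis=1)
    m1 = K[:, idx1].mean(axis=1)
    # Dual within-class scatter: N = sum_c K_c (I - 1/n_c) K_c^T.
    N = np.zeros((n, n))
    for idx in (idx0, idx1):
        Kc = K[:, idx]
        nc = len(idx)
        N += Kc @ (np.eye(nc) - np.full((nc, nc), 1.0 / nc)) @ Kc.T
    # alpha maximizes (alpha^T (m1 - m0))^2 / (alpha^T N alpha),
    # solved (up to scale) by a regularized linear system.
    alpha = np.linalg.solve(N + reg * np.eye(n), m1 - m0)
    return alpha
```

With the linear kernel K = XXᵀ this reduces to ordinary FDA expressed in the dual, which is a useful consistency check for your derivation.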

2 Markov Random Field, Conditional Random Field [20 pts]

[a-b] A probability distribution over three discrete variables a, b, c is defined by p(a, b, c) = φ1(a, b) φ2(b, c)/Z, where the tables for the two factors are given below.

| a | b | φ1(a,b) |
|---|---|---------|
| 0 | 0 | 4 |
| 0 | 1 | 3 |
| 1 | 0 | 3 |
| 1 | 1 | 1 |

| b | c | φ2(b,c) |
|---|---|---------|
| 0 | 0 | 3 |
| 0 | 1 | 2 |
| 0 | 2 | 1 |
| 1 | 0 | 4 |
| 1 | 1 | 1 |
| 1 | 2 | 3 |

(a) Compute the slice of the joint factor ψ(a, b, c) = φ1(a, b)φ2(b, c) corresponding to b = 1, i.e., the table ψ(a, b = 1, c). [5 pts]

(b) Compute P(a = 1,b = 1). [5 pts]
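Parts (a)–(b) can be verified numerically. A minimal sketch, assuming the factorization p(a, b, c) = φ1(a, b)φ2(b, c)/Z with the tables above:

```python
import numpy as np

# Factor tables from the problem: phi1[a, b] and phi2[b, c].
phi1 = np.array([[4.0, 3.0],
                 [3.0, 1.0]])
phi2 = np.array([[3.0, 2.0, 1.0],
                 [4.0, 1.0, 3.0]])

# Joint factor psi(a, b, c) = phi1(a, b) * phi2(b, c).
psi = phi1[:, :, None] * phi2[None, :, :]

# (a) Slice of the joint factor at b = 1: the table psi(a, b=1, c).
slice_b1 = psi[:, 1, :]

# (b) P(a=1, b=1) = sum_c psi(1, 1, c) / Z, where Z normalizes psi.
Z = psi.sum()
p_a1_b1 = psi[1, 1, :].sum() / Z
```

This is only a check on the hand calculation the problem asks for; the broadcasting trick builds the full 2×2×3 joint factor in one line.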

(c) Explain the difference between Conditional Random Fields and Hidden Markov Models with respect to the following factors. Please give only a one-line explanation for each. [10 pts]

• Type of model – generative/discriminative

• Objective function optimized

• Whether a normalization constant is required

3 Hidden Markov Model [50 pts]

This problem will let you get familiar with HMM algorithms by doing the calculations by hand.

[a-c] There are three coins (1, 2, 3); we throw them randomly and record the results. The state set is S = {1, 2, 3} and the observation set is V = {H, T} (Head or Tail). A, B, and π are given as:

A (transition probabilities, row = current state, column = next state):

|   | 1 | 2 | 3 |
|---|---|---|---|
| 1 | 0.9 | 0.05 | 0.05 |
| 2 | 0.45 | 0.1 | 0.45 |
| 3 | 0.45 | 0.45 | 0.1 |

B (emission probabilities, column = state):

|   | 1 | 2 | 3 |
|---|---|---|---|
| H | 0.5 | 0.75 | 0.25 |
| T | 0.5 | 0.25 | 0.75 |

π (initial state distribution):

|   | 1 | 2 | 3 |
|---|---|---|---|
| π | 1/3 | 1/3 | 1/3 |

(a) Given the model above, what is the probability of the observation sequence O = (H, T, H)? [10 pts]
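Part (a) is an application of the forward algorithm. A sketch with the A, B, π above (states ordered 1, 2, 3; observations encoded 0 = H, 1 = T), useful for checking the hand calculation:

```python
import numpy as np

A = np.array([[0.90, 0.05, 0.05],
              [0.45, 0.10, 0.45],
              [0.45, 0.45, 0.10]])   # A[i, j] = P(s_{t+1} = j | s_t = i)
B = np.array([[0.50, 0.75, 0.25],    # row 0: P(H | state)
              [0.50, 0.25, 0.75]])   # row 1: P(T | state)
pi = np.array([1 / 3, 1 / 3, 1 / 3])

def observation_prob(obs):
    """Forward algorithm: P(O) for obs, a list of 0 (= H) / 1 (= T)."""
    alpha = pi * B[obs[0]]            # alpha_1(i) = pi_i * b_i(o_1)
    for o in obs[1:]:
        alpha = (alpha @ A) * B[o]    # alpha_{t+1}(j) = sum_i alpha_t(i) A_ij * b_j(o)
    return float(alpha.sum())

p = observation_prob([0, 1, 0])       # O = (H, T, H)
```

Summing the final forward variables over states gives the sequence probability; for longer sequences one would rescale alpha at each step to avoid underflow.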

(b) Describe how to estimate A, B, and π when they are unknown. [10 pts]

(c) In class, we studied discrete HMMs with discrete hidden states and observations. The following problem considers a continuous density HMM, which has discrete hidden states but continuous observations. Let St ∈ {1, 2, …, n} denote the hidden state of the HMM at time t, and let Xt ∈ R denote the real-valued scalar observation of the HMM at time t. In a continuous density HMM, the emission probability must be parameterized, since the random variable Xt is no longer discrete. It is defined as P(Xt = x | St = i) = N(x; µi, σi²). Given m sequences of observations (each of length T), derive the EM algorithm for the HMM with Gaussian observation model. [14 pts]
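For part (c), the E-step (forward–backward) is unchanged from the discrete case apart from the Gaussian emission densities, and the M-step has closed-form updates. A sketch of just the M-step for a single sequence, given precomputed posteriors γ_t(i) = P(S_t = i | x) and ξ_t(i, j) = P(S_t = i, S_{t+1} = j | x) (the function name and argument layout are my assumptions, not part of the problem):

```python
import numpy as np

def gaussian_hmm_m_step(x, gamma, xi):
    """M-step updates for a Gaussian-emission HMM (single sequence).
    x: (T,) observations; gamma: (T, n) state posteriors;
    xi: (T-1, n, n) pairwise posteriors. Returns pi, A, mu, sigma2."""
    Ng = gamma.sum(axis=0)                      # expected state occupancy counts
    pi = gamma[0]                               # pi_i = P(S_1 = i | x)
    A = xi.sum(axis=0)
    A /= A.sum(axis=1, keepdims=True)           # row-normalized expected transitions
    mu = (gamma * x[:, None]).sum(axis=0) / Ng  # posterior-weighted means
    sigma2 = (gamma * (x[:, None] - mu) ** 2).sum(axis=0) / Ng
    return pi, A, mu, sigma2
```

For m sequences, the numerators and denominators of each update are summed over all sequences before dividing, which is the main bookkeeping point the derivation should make explicit.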

(d) For each of the following statements, say whether it is true or false and provide a short explanation (one sentence or so). [16 pts]

• The weights of all incoming edges to a state of an HMM must sum to 1.

• An edge from state s to state t in an HMM denotes the conditional probability of going to state s given that we are currently at state t.

• The "Markov" property of an HMM implies that we cannot use an HMM to model a process that depends on several time-steps in the past.

• The Baum-Welch algorithm is a type of an Expectation Maximization algorithm and as such it is guaranteed to converge to the (globally) optimal solution.

4 Programming [30 pts]

Consider a Hidden Markov Model in which xt denotes the economic state (good or bad) of week t and yt denotes the price movement (up, yt = +1, or down, yt = −1) of the S&P 500 index. We assume that x(t+1) = xt with probability 0.8, and that P(yt = +1 | xt = good) = P(yt = −1 | xt = bad) = q. In addition, assume that P(x1 = bad) = 0.8. Load sp500.mat, implement the algorithm, briefly describe your implementation, and report the following:

(a) Assuming q = 0.7, plot P(xt = good | y) for t = 1, 2, …, 39. What is the probability that the economy is in a good state in week 39? [15 pts]

(b) Repeat (a) for q = 0.9, and compare the result to that of (a). Explain your comparison in one or two sentences. [15 pts]
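The smoothed posteriors in (a)–(b) come from the forward–backward algorithm. A sketch under the stated model; the per-step rescaling guards against underflow, and the sp500.mat variable name in the usage comment is an assumption to be adapted to the actual file:

```python
import numpy as np

def smoothed_good_probs(y, q):
    """P(x_t = good | y_1..T) for the two-state chain in the problem.
    y: array of +1 / -1 price movements. States: 0 = good, 1 = bad."""
    T = len(y)
    A = np.array([[0.8, 0.2],
                  [0.2, 0.8]])                 # x_{t+1} = x_t with prob 0.8
    pi = np.array([0.2, 0.8])                  # P(x_1 = bad) = 0.8
    # Emission likelihoods: P(y = +1 | good) = P(y = -1 | bad) = q.
    B = np.where(np.array(y)[:, None] == 1, [q, 1 - q], [1 - q, q])
    alpha = np.zeros((T, 2))
    beta = np.ones((T, 2))
    alpha[0] = pi * B[0]
    alpha[0] /= alpha[0].sum()                 # scaled forward pass
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[t]
        alpha[t] /= alpha[t].sum()
    for t in range(T - 2, -1, -1):             # scaled backward pass
        beta[t] = A @ (B[t + 1] * beta[t + 1])
        beta[t] /= beta[t].sum()
    post = alpha * beta                        # scaling cancels after normalizing
    post /= post.sum(axis=1, keepdims=True)
    return post[:, 0]                          # P(x_t = good | y)

# Usage (assumed variable name inside the .mat file):
# y = scipy.io.loadmat("sp500.mat")["price_move"].ravel()
# probs = smoothed_good_probs(y, q=0.7)
```

Comparing the returned curves for q = 0.7 and q = 0.9 directly gives the plot and discussion asked for in (a) and (b).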
