10601 – HOMEWORK 7 Solved

Description

HIDDEN MARKOV MODELS AND BAYES NETS
TAs: Weyxin, Kevin, Joseph, Mukund, Brendon
Summary In this assignment you will implement a new named entity recognition system using Hidden Markov Models. You will begin by going through some multiple choice and short answer warm-up problems to build your intuition for these models and then use that intuition to build your own HMM models.
START HERE: Instructions
˜mgormley/courses/10601/syllabus.html
• Late Submission Policy: See the late submission policy here: http://www.cs.cmu.edu/ ˜mgormley/courses/10601/syllabus.html
• Submitting your work: You will use Gradescope to submit answers to all questions and code. Please follow instructions at the end of this PDF to correctly submit all your code to Gradescope.
– Programming: You will submit your code for programming questions on the homework to
Gradescope (https://gradescope.com). After uploading your code, our grading scripts will autograde your assignment by running your program on a virtual machine (VM). When you are developing, check that the version number of the programming language environment (e.g. Python 3.9.6, OpenJDK 11.0.11, g++ 7.5.0) and versions of permitted libraries (e.g. numpy 1.21.2 and scipy 1.7.1) match those used on Gradescope. You have 10 free Gradescope programming submissions. After 10 submissions, you will begin to lose points from your total programming score. We recommend debugging your implementation on your local machine (or
1
the Linux servers) and making sure your code is running correctly first before submitting your code to Gradescope.
• Materials: The data that you will need in order to complete this assignment is posted along with the writeup and template on the course website.
EJML for Java EJML is a pure Java linear algebra package with three interfaces. We strongly recommend using the SimpleMatrix interface. The autograder will use EJML version 0.41. When compiling and running your code, we will add the additional command line argument -cp “linalg_lib/ejml-v0.41-libs/*:linalg_lib/nd4j-v1.0.0-M1.1-libs/*:./” to ensure that all the EJML jars are on the classpath as well as your code.
ND4J for Java ND4J is a library for multidimensional tensors with an interface akin to Python’s NumPy. The autograder will use ND4J version 1.0.0-M1.1. When compiling and running your code, we will add the additional command line argument
-cp “linalg_lib/ejml-v0.41-libs/*:linalg_lib/nd4j-v1.0.0-M1.1-libs/*:./” to ensure that all the ND4J jars are on the classpath as well as your code.
Eigen for C++ Eigen is a header-only library, so there is no linking to worry about—just #include whatever components you need. The autograder will use Eigen version 3.4.0. The command line arguments above demonstrate how we will call you code. When compiling your code we will include, the argument -I./linalg_lib in order to include the linalg_lib/Eigen subdirectory, which contains all the headers.
We have included the correct versions of EJML/ND4J/Eigen in the linalg_lib.zip posted on the Coursework page of the course website for your convenience. It contains the same linalg_lib/ directory that we will include in the current working directory when running your tests. Do not include EJML, ND4J, or Eigen in your homework submission; the autograder will ensure that they are in place.
ahttps://ejml.org bhttps://javadoc.io/doc/org.nd4j/nd4j-api/latest/index.html chttp://eigen.tuxfamily.org/
Instructions for Specific Problem Types
For “Select One” questions, please fill in the appropriate bubble completely:
Select One: Who taught this course?
Matt Gormley / Henry Chai
# Marie Curie
# Noam Chomsky
Select One: Who taught this course?
Matt Gormley / Henry Chai
# Marie Curie
@ Noam Chomsky
For “Select all that apply” questions, please fill in all appropriate squares completely:
Select all that apply: Which are scientists?
2 Stephen Hawking ■ Albert Einstein
Isaac Newton 2 None of the above
Select all that apply: Which are scientists?
■ Stephen Hawking
■ Albert Einstein ■ Isaac Newton
@■ I don’t know
Fill in the blank: What is the course number?
10-S7601S
1 Written Questions (50 points)
1.1 Hidden Markov Models
1. (4 points) In this section, we will test your understanding of several aspects of HMMs. Shade in the box or circle in the template document corresponding to the correct answer(s) for each of the questions below.
(a) (2 points) (Select all that apply) Let Yt be the state at time t. Which of the following are true under the (first-order) Markov assumption in an HMM:
The states are independent
The observations are independent
Yt ⊥⊥ Yt−1 | Yt−2
Yt ⊥⊥ Yt−2 | Yt−1 2 None of the above
(b) (2 points) (Select all that apply) Which of the following independence assumptions hold in an
HMM:
The current observation Xt is conditionally independent of all other observations given the current state Yt
The current observation Xt is conditionally independent of all other states given the current state Yt
The current state Yt is conditionally independent of all states given the previous state
The current observation Xt is conditionally independent of Yt−2 given the previous observation Xt−1 2 None of the above
2. (8 points) In the remaining questions, you will always see two quantities and decide what is the strongest relation between them. There is only one correct answer. Use the following definitions: αt(sj) = P(Yt = sj,x1:t) and βt(sj) = P(xt+1:T | Yt = sj).
Note: ? means it’s not possible to assign any true relation.
(a) (3 points) (Select one) Let N denote the number of possible hidden states. In other words, each
. What is the relation between ? Select only the
strongest relation that necessarily holds.
#
=
#
>
#
< ≤
≥ # ?
(b) (3 points) (Select one) What is the relation between P(Y4 = s1,Y5 = s2,x1:T ) and α4(s1)β5(s2)? Select only the strongest relation that necessarily holds.
#
=
#
>
#
< ≤
≥ # ?
(c) (2 points) (Select one) What is the relation between α5(si) and β5(si)? Select only the strongest relation that necessarily holds.
#
=
#
>
#
< ≤
#
≥
# ?
1.2 Graphical Models
In the Kingdom of Westeros, Summer has come. Jon Snow, the King in the North, has taken the responsibility to defeat the Giants and protect the realm.
If Jon Snow can get Queen Cersei and Daenerys Queen of the Dragons to help him Jon is likely to beat the giants. Cersei and Daenerys are powerful women who are skeptical of the existence of Giants and will most likely only consider joining Jon if the are shown evidence of an imminent Giant attack. They can only be shown of an attack if Jon captures a live Giant.
The Bayes net that represents the relationship between the events described above is shown below. Use the following notation for your variables: Jon Snow captures a live Giant (X1), Jon shows Censei and Daenerys a live Giant (X2), Cersei agrees to help (X3), Daenerys agrees to help (X4) and Giants defeated (X5).

1. (1 point) Write down the factorization of the above directed graphical model.

2. (1 point) Each random variable represented in the above Bayesian network is binary valued (i.e. either the event happens or it does not). If we use the same conditional independence assumptions made in part (1), state the minimum number of parameters you need to fully specify this Bayesian network.

3. (1 point) If we didn’t use the conditional independence assumptions made in part (1), what would be the minimum number of parameters we would need to model any joint distribution over the same set of random variables?

4. (5 points) For the following questions fill in the blank with the smallest set S of random variables needed to be conditioned on in order for the independence assumption to hold. For example Xi ⊥ Xj | S. What is the smallest set S that makes this statement true? The empty set ∅ is a valid answer, additionally if the independence assumption cannot be satisfied no matter what we condition on then your answer should be ’Not possible’.
(a) (1 point) X1 ⊥ X3 |
(b) (1 point) X1 ⊥ X5 |
(c) (1 point) X2 ⊥ X4 |
(d) (1 point) X3 ⊥ X4 |
(e) (1 point) X2 ⊥ X5 |
5. (6 points) Jon gets his friend Sam to calculate some estimates of his chances. Sam returns to Jon with the following conditional probabilities tables:
X1 = 0 0.3
X1 = 1 0.7
X1 = 0 X1 = 1
X2 = 0 0.8 0.25
X2 = 1 0.2 0.75
X2 = 0 X2 = 1
X3 = 0 0.5 0.6
X3 = 1 0.5 0.4
X2 = 0 X2 = 1
X4 = 0 0.3 0.2
X4 = 1 0.7 0.8
X3 = 0,X4 = 0 X3 = 0,X4 = 1 X3 = 1,X4 = 0 X4 = 1,X3 = 1
X5 = 0 0.4 0.7 0.8 0.5
X5 = 1 0.6 0.3 0.2 0.5
Table 1: Sam’s Conditional Probability tables
Using the conditional probabilities in Table 1 for our graphical model, compute the following (Your answers should be given to 2 decimal places):
(a) (2 points) P(X1 = 0,X2 = 1,X3 = 0,X4 = 1,X5 = 0).

1.3 Viterbi Decoding
1. (4 points) Suppose we have a set of sequences consisting of T observed states, x1,…,xT , where each xt ∈ {1,2,3}. Each observed state is associated with a hidden state Yt ∈ {C,D}. Let s1 = C and s2 = D.
In the Viterbi algorithm, we seek to find the most probable hidden state sequence yˆ1,…,yˆT given the observations x1,…,xT .
We define:
• B is the transition matrix: Bjk = P(Yt = sk | Yt−1 = sj)
• A is the emission matrix: Ajk = P(Xt = k | Yt = sj)
• π describes Y1’s initialization probabilities: πj = P(Y1 = sj)
• ωt(sk) is the maximum product of all the probabilities taken through path Y1,…,Yt−1 that ends with Yt at state sk.
(1)
• bt(sk) are the backpointers that store the path through hidden states that give us the highest product.
bt(sk) = argmax P(x1:t,y1:t−1,Yt = sk) (2)
y1,…,yt−1
We outline the Viterbi Algorithm below:
1. Initialize ω1(sj) = πjAjx1 and b1(j) = j 2. For t > 1, we have

bt(sj) = argmax AjxtBkjωt−1(sk)
k∈{1,…,J}
We can obtain the most probable sequence by backtracing through the backpointers as follows:
1. yˆT = argmaxk∈{1,…,J} ωT (sk).
2. For t = T − 1,…,1:
yˆt = bt+1(yˆt+1)
3. Return yˆ1,…,yˆT
For the following question, consider the Hidden Markov Model specified below. Working with probabilities in the logarithm scale, we have that lnP(Y1 = C) = lnP(Y1 = D) = −0.69, and the state transition model and emission probability tables are given as follows.

k lnP(Xt = k | Yt = C) lnP(Xt = k | Yt = D)
1 -0.69 -1.21
2 -0.91 -0.69
3 -2.30 -1.61
We observed X1 = 1 and X2 = 2. (Note that taking the maximum of log probabilities will give you the same result as taking the maximum of probabilities as log is a monotonically increasing function.)
(a) (2 points) Compute lnω1(C) and lnω1(D). If your answers involves decimal numbers, please round your answer to TWO decimal places.
What is lnω1(C)?

(b) (2 points) (Select one) Which of the following is the most likely sequence of hidden states?
#
Y1 = C, Y2 = C
#
Y1 = D, Y2 = D
Y1 = D, Y2 = C ## Not enough information. Y1 = C, Y2 = D
1.4 Forward-Backward Algorithm
1. (14 points) To help you prepare to implement the HMM forward-backward algorithm (see Section 2.5 for a detailed explanation), we have provided a small example for you to work through by hand. This toy data set consists of a training set of three sequences with three unique words and two tags and a validation set with a single sequence composed of the same unique words used in the training set.
Training set:
you D
eat C
fish D
you D
fish D
eat C
eat C
fish D
Where the training word sequences are:
x(1) = [you eat fish]T x(2) = [you fish eat]T x(3) = [eat fish]T
And the corresponding tags are:
y(1) = [D C D]T y(2) = [D D C]T y(3) = [C D]T
Validation set:
fish eat you
Where the validation word sequences are:
xvalidation = fish eat youT
In this question, we define
• Each observed state xt ∈ {1,2,3}, where 1 corresponds to you, 2 corresponds to eat, and 3 corresponds to fish
• Each hidden state Yt ∈ {C,D}. Let s1 = C and s2 = D.
• B is the transition matrix, where Bjk = P(Yt = sk | Yt−1 = sj). Note, here B is a 2 × 2 matrix.
• A is the emission matrix, where Ajk = P(Xt = k | Yt = sj). Note, here A is a 2 × 3 matrix. As an example, B23 denotes P(Xt = 3 | Yt = s2), or the probability Xt corresponds to fish given the hidden state Yt = D
• π describes Y1’s initialization probabilities: πj = P(Y1 = sj)
• αt(j) = P(Yt = sj,x1:t), which can be computed recursively:
1. α1(j) = πjAjx1.
2. For
• βt(j) = P(xt+1:T | Yt = sj), which can be computed recursively:
1. βT (j) = 1
2. For
The following four questions are meant to encourage you to work through the forward backward algorithm by hand using this validation example. Feel free to use a calculator, being careful to carry enough significant figures through your computations to avoid rounding errors. For each question below, please report the requested value in the text box under the question (these boxes are only visible in the template document).
Note: pseudocounts used in section 2.4 should also be used here.
(a) (5 points) Compute α2(C), the α value associated with the tag “C” for the second word in the validation sequence. Please round your answer to TWO decimal places.

(b) (3 points) Compute β2(D), the β value associated with the tag “D” for the second word in the validation sequence. Please round your answer to TWO decimal places.

(d) (3 points) Compute the log-likelihood for the entire validation sequence, “fish eat you”. Please round your answer to TWO decimal places.

2. (6 points) Return to these questions after implementing your learnhmm.{py|java|cpp} and forwardbackward.{py|java|cpp} functions. Please ensure that you have used the log-sum-exp trick in your programming as described in section 3.3 before answering these empirical questions.
Using the full data set train.txt in the handout, use your implementation of learnhmm.{py|java|cpp} to learn parameters for an hmm model using the first 10, 100, 1000, and 10000 sequences in the file. Use these learned parameters to perform prediction on the English train.txt and the validation.txt files using your forwardbackward.{py|java|cpp}. Construct a plot with number of sequences used for training on the x-axis (log-scale with base e) and average log likelihood across all sequences from the English train.txt and the validation.txt on the y-axis (see Section 2.5 for details on computing the log data likelihood for a sequence). Each table entry is worth 0.5 points. Write the resulting log likelihood values rounded to two decimal places in the table in the template. Include your plot in the large box in the template (2 points). To receive credit for your plot, you must submit a computer generated plot. DO NOT hand draw your plot.
(a) (4 points) Fill in this table.
# Sequences Train Average Log-
Likelihood Validation Average
Log-Likelihood
10 ?? ??
100 ?? ??
1000 ?? ??
10000 ?? ??
(b) (2 points) Put your plot below:

2 Programming (80 points)
2.1 The Task
In the programming section you will implement a named entity recognition system using Hidden Markov Models (HMMs). Named entity recognition (NER) is the task of classifying named entities, typically proper nouns, into pre-defined categories, such as person, location, organization. Consider the example sequence below, where each word is appended with a tab and then its tag:
“ O
Rhinestone B-ORG
Cowboy I-ORG
” O
( O
Larry B-PER
Weiss I-PER
) O
– O
3:15 O
2.2 The Dataset
WikiANN is a “silver standard” dataset that was generated without human labelling. The English Abstract Meaning Representation (AMR) corpus and DBpedia features were used to train an automatic classifier to label Wikipedia articles. These labels were then propagated throughout other Wikipedia articles using the Wikipedia’s cross-language links and redirect links. Afterwards, another tagger that self-trains on the existing tagged entities was used to label all other mentions of the same entities, even those with different morphologies (prefixes and suffixes that modify a word in other languages). Finally, the amassed training examples were filtered by “commonness” and “topical relatedness” to pick more relevant training data.
The WikiANN dataset provides labelled entity data for Wikipedia articles in 282 languages. We will be primarily using the English subset, which contains 14,000 training examples and 3,300 test examples, and the French subset, which contains around 7,500 training examples and 300 test examples. On a technical level, the main task is to implement an algorithm to learn the HMM parameters given the training data and then implement the forward backward algorithm to perform a smoothing query which we can then use to predict the hidden tags for a sequence of words.
2.3 File Formats
The handout files for this assignments contains several files that you will use in the homework. The contents and formatting of each of these files is explained below.
1. train.txt This file contains labeled text data that you will use in training your model in the Learning problem. Specifically the text contains one word per line that has already been preprocessed, cleaned and tokenized. Every sequence has the following format:
<Word0> <Tag0> <Word1> <Tag1> … <WordN> <TagN>
where every <WordK> <TagK> unit token is separated by a newline. Between each sequence is an empty line. If we have two three-word sequences in our data set, the data will look like so:
<Word0> <Tag0>
<Word1> <Tag1>
<Word2> <Tag2>
<Word0> <Tag0>
<Word1> <Tag1>
<Word2> <Tag2>
Note: Word 2 of the second sequence does not end with a newline because it is the end of the data set.
2. validation.txt: This file contains labeled data that you will use to evaluate your model in the Experiments section. This file has the same format as train.txt.
3. index to word.txt and index to tag.txt: These files contain a list of all words or tags that appear in the data set. The format is simple:
index to word.txt index to tag.txt
<Word0> <Tag0>
<Word1> <Tag1>
<Word2> <Tag2>
… …
In your functions, you will convert the string representation of words or tags to indices corresponding to the location of the word or tag in these files. For example, if Austria is on line 729 of index to word.txt, then all appearances of Austria in the data sets should be converted to the index 729. This index will also correspond to locations in the parameter matrices. For example, the word Austria corresponds to the parameters in column 729 of the matrix stored in hmmemit.txt. This will be useful for your forward-backward algorithm implementation (see Section 2.5).
4. predicted.txt: This file contains labeled data that you will use to debug your implementation. The labels in this file are generated by running our decoder using the features from train.txt. This file has the same format as train.txt.
5. metrics.txt: This file contains the metrics you will compute for the validation data. The first line should contain the average log likelihood, and the second line should contain the prediction accuracy. There should be a single space after the colon preceding the metric value; see the reference output file for more detail.
6. hmmtrans.txt, hmmemit.txt and hmminit.txt: These files contain pre-trained model parameters of an HMM that you will use in testing your implementation of the Learning and Evaluation and Decoding problems. The format of the first two files are analogous and is as follows. Every line in these files consists of a conditional probability distribution. In the case of transition probabilities, this distribution corresponds to the probability of transitioning into another state, given a current state. Similarly, in the case of emission probabilities, this distribution corresponds to the probability of emitting a particular symbol, given a current state. The formats of these files are:
hmmtrans.txt:
<ProbS1S1> <ProbS1S2> … <ProbS1SN>
<ProbS2S1> <ProbS2S2> … <ProbS2SN> …
hmmemit.txt:
<ProbS1Word1> <ProbS1Word2> … <ProbS1WordN>
<ProbS2Word1> <ProbS2Word2> … <ProbS2WordN> …
In both cases above, elements in the same row are separated by white space. Each row corresponds to a line of text, using to create new lines.
hmminit.txt:
<ProbS1>
<ProbS2> …
The format of hmminit.txt is similarly defined except that it only contains a single probability distribution over starting states. Therefore each row only has a single element.
Note that the data provided to you is to help in developing your implementation of the HMM algorithms. Your code will be tested on Gradescope using different subsets of data with identical formatting.
2.4 Learning
Your first task is to implement an algorithm to learn the hidden Markov model parameters needed to apply the forward backward algorithm (See Section 2.5). There are three sets of parameters that you will need to estimate: the initialization probabilities π, the transition probabilities B, and the emission probabilities A. For this assignment, we model each of these probabilities using a multinomial distribution with parameters πj = P(Y1 = sj), Bjk = P(Yt = sk | Yt−1 = sj), and Ajk = P(Xt = k | Yt = sj). These can be estimated using maximum likelihood, which results in the following parameter estimators:
1. , where NY1=sj equals the number of times state sj is associated with the first word of a sentence in the training data set.
2. , where NYt=sk,Yt−1=sj is the number of times state sj is followed by state sk in the training data set.
3. , where NXt=k,Yt=sj is the number of times that the state sj is associated with the word k in the training data set.
Note that for each count, a “+1” is added to make a pseudocount. This is slightly different from pure maximum likelihood estimation, but it is useful in improving performance when evaluating unseen cases during evaluation of your validation set.
You should implement a function that reads in the training data set (train.txt), and then estimates π, A, and B using the above maximum likelihood solutions.
Your outputs should be in the same format as hmminit.txt, hmmtrans.txt, and hmmemit.txt (including the same number of decimal places to ensure there are no rounding errors during prediction). The autograder will use the following commands to call your function:
For Python: $ python3 learnhmm.py [args…]
For Java: $ javac -cp “./lib/ejml-v0.33-libs/*:./” learnhmm.java; java -cp “./lib/ejml-v0.33-libs/*:./” learnhmm [args…]
For C++: $ g++ -g -std=c++11 -I./lib learnhmm.cpp; ./a.out [args…]
Where [args…] is a placeholder for six command-line arguments:<traininput><indextoword> <indextotag><hmminit><hmmemit><hmmtrans>. These arguments are described below:
1. <traininput>: path to the training input .txt file (see Section 2.2)
2. <index toword>: path to the .txt that specifies the dictionary mapping from words to indices. The tags are ordered by index, with the first word having index of 0, the second word having index of 1, etc.
3. <index totag>: path to the .txt that specifies the dictionary mapping from tags to indices. The tags are ordered by index, with the first tag having index of 0, the second tag having index of 1, etc.
4. <hmminit>: path to output .txt file to which the estimated initialization probabilities (π) will be written. The file output to this path should be in the same format as the handout hmminit.txt (see Section 2.2).
5. <hmmemit>: path to output .txt file to which the emission probabilities (A) will be written. The file output to this path should be in the same format as the handout hmmemit.txt (see Section 2.2)
6. <hmmtrans>: path to output .txt file to which the transition probabilities (B) will be written. The file output to this path should be in the same format as the handout hmmtrans.txt (see Section 2.2).
2.5 Evaluation and Decoding
2.5.1 Forward Backward Algorithm and Minimal Bayes Risk Decoding
Your next task is to implement the forward-backward algorithm. Suppose we have a set of sequence consisting of T words, x1,…,xT . Each word is associated with a label Yt ∈ {1,…,J}. In the forward-backward algorithm we seek to approximate P(Yt | x1:T ) up to a multiplication constant. This is done by first breaking P(Yt | x1:T ) into a “forward” component and a “backward” component as follows:
P(Yt = sj | x1:T ) ∝ P(Yt = sj,xt+1:T | x1:t)
∝ P(Yt = sj | x1:t)P(xt+1:T | Yt = sj,x1:t)
∝ P(Yt = sj | x1:t)P(xt+1:T | Yt = sj) ∝ P(Yt = sj,x1:t)P(xt+1:T | Yt = sj)
where P(Yt = sj | x1,…,xt) is computed by passing forward recursively through the model and P(xt+1,…,xT | Yt = sj) is computed by passing recursively backwards through the model.
Forward Algorithm
Define αt(j) = P(Yt = sj,x1:t). We can rearrange our definition of αt(j) as follows:
αt(j) =P(Yt = sj,x1:t)
=XP(Yt = sj,Yt−1 = sk,x1:t)
k
=XP(Yt−1 = sk,x1:t | Yt = sj)P(Yt = sj)
k
=XP(xt | Yt = sj)P(Yt−1 = sk,x1:t−1 | Yt = sj)P(Yt = sj)
k
=P(xt | Yt = sj)XP(Yt = sj,Yt−1 = sk,x1:t−1)
k
=P(xt | Yt = sj)XP(Yt = sj,x1:t−1 | Yt−1 = sk)P(Yt−1 = sk)
k
=P(xt | Yt = sj)XP(Yt = sj | Yt−1 = sk)P(x1:t−1 | Yt−1 = sk)P(Yt−1 = sk)
k
=P(xt | Yt = sj)XP(Yt = sj | Yt−1 = sk)P(Yt−1 = sk,x1:t−1)
k
=Ajxt XBkjαt−1(k) (3)
k
Using this definition, the α’s can be computed using the following recursive procedure:
1. α1(j) = πjAjx1.
2. For
Backward Algorithm Define βt(j) = P(xt+1:T | Yt = sj). We can rearrange our definition of βt(j) as follows:
βt(j) = P(xt+1:T | Yt = sj)
J
= XP(Yt+1 = sk,xt+1:T | Yt = sj)
k=1
J
= XP(xt+1:T | Yt = sj,Yt+1 = sk)P(Yt+1 = sk | Yt = sj)
k=1
J
= XP(xt+1 | Yt+1 = sk)P(xt+2:T | Yt+1 = sk)P(Yt+1 = sk | Yt = sj)
k=1
J
= XAkxt+1βt+1(k)Bjk (4)
k=1
Just like the α’s, the β’s can also be computed using the following backward recursive procedure:
1. βT (j) = 1 (All states could be ending states)
2. For (Generate xt+1 from any state)
Forward-Backward Algorithm As stated above, the goal of the Forward-Backward algorithm is to compute P(Yt = sj | x1:T ). This can be done using the following equation:
P(Yt = sj | x1:T ) ∝ P(Yt = sj,x1:t)P(xt+1:T | Yt = sj)
After running your forward and backward passes through the sequence, you are now ready to estimate the conditional probabilities as:
P(Yt | x1:t) ∝ αt ◦ βt
where ◦ is the element-wise product.
Minimum Bayes Risk Prediction We will assign tags using the minimum Bayes risk predictor, defined for this problem as follows:
Yˆt = argmax
j 1,…,J
To resolve ties, select the tag that appears earlier in the <indextotag> input file.
Computing the Log Likelihood of a Sequence When we compute the log likelihood of a sequence, we are interested in the computing the quantity log(P(x1:T )). We can rewrite this in terms of values we have already computed in the forward-backward algorithm as follows:

2.5.2 Implementation Details
You should now implement your forward-backward algorithm as a program, forwardbackward.{py|java|cpp}. The program will read in validation data and the parameter files produced by learnhmm.{py|java|cpp}. The autograder will use the following commands to call your function:
For Python: $ python3 forwardbackward.py [args…]
For Java: $ javac -cp “./lib/ejml-v0.33-libs/*:./” forwardbackward.java;
$ java -cp “./lib/ejml-v0.33-libs/*:./” forwardbackward [args…]
For C++: $ g++ -g -std=c++11 -I./lib forwardbackward.cpp; ./a.out [args…]
Where above [args…] is a placeholder for seven command-line arguments:<validationinput>
<indextoword><indextotag><hmminit><hmmemit><hmmtrans><predictedfile>
<metricfile>. These arguments are described in detail below:
1. <validationinput>: path to the validation input .txt file that will be evaluated by your forward backward algorithm (see Section 2.2)
2. <index toword>: path to the .txt that specifies the dictionary mapping from words to indices.
The tags are ordered by index, with the first word having index of 0, the second word having index of 1, etc. This is the same file as was described for learnhmm.{py|java|cpp}.
3. <index totag>: path to the .txt that specifies the dictionary mapping from tags to indices. The tags are ordered by index, with the first tag having index of 0, the second tag having index of 1, etc. This is the same file as was described for learnhmm.{py|java|cpp}.
4. <hmminit>: path to input .txt file which contains the estimated initialization probabilities (π).
5. <hmmemit>: path to input .txt file which contains the emission probabilities (A).
6. <hmmtrans>: path to input .txt file which contains transition probabilities (B).
7. <predictedfile>: path to the output .txt file to which the predicted tags will be written. The file should be in the same format as the <validationinput> file.
8. <metric file>: path to the output .txt file to which the metrics will be written.
Example command for python users:
$ python3 forwardbackward.py toy_data/validation.txt toy_data/index_to_word.txt toy_data/index_to_tag.txt toy_data/hmminit.txt toy_data/toy_hmmemit.txt toy_data/hmmtrans.txt toy_data/predicted.txt toy_data/metrics.txt
After running the command above, the <predictedfile> output should be:
fish D
eat C
you D
And the <metricfile> output should be:
Average Log-Likelihood: -3.0438629330222424 Accuracy: 0.3333333333333333
where average log-likelihood and accuracy are evaluated over the validation set.
Take care that your output has the exact same format as shown above. There should be a single space after the colon preceding the metric value (e.g. a space after Average Log-Likelihood:). Each line should be terminated by a Unix line ending .
2.5.3 Log-Space Arithmetic for Avoiding Underflow
For this homework, using log-space starts with transforming Eq.(3) and Eq.(4) into logarithmic form (how to do that is straightforward and left as an exercise). Please use e as the base for logarithm calculation (natural log).
logXexp(vi)
i

Algorithm 1 Log-Sum-Exp Trick

1: procedure LOGSUMEXPTRICK((v1,v2,··· ,vn))
2: m = max(vi) for i = {1,2,··· ,n}
3: return m + log(Pi exp(vi − m))

Note: The autograder test cases account for numerical underflow using the Log-Sum-Exp Trick. If you do not implement forwardbackward.py with the trick, you might only receive partial credit.
2.6 Gradescope Submission
You should submit your learnhmm.{py|java|cpp} and forwardbackward.{py|java|cpp} to Gradescope. Note: please do not use other file names. This will cause problems for the autograder to correctly detect and run your code.
Some additional tips: Make sure to read the autograder output carefully. The autograder for Gradescope prints out some additional information about the tests that it ran. For this programming assignment we’ve specially designed some buggy implementations that you might do, and try our best to detect those and give you some more useful feedback in Gradescope’s autograder. Make wise use of autograder’s output for debugging your code.
3 Collaboration Questions
1. Did you receive any help whatsoever from anyone in solving this assignment? If so, include full details.
2. Did you give any help whatsoever to anyone in solving this assignment? If so, include full details.
3. Did you find or come across code that implements any part of this assignment ? If so, include full details.

Reviews

There are no reviews yet.

Be the first to review “10601 – HOMEWORK 7 Solved”

10601 – HOMEWORK 7 Solved

Description

Reviews

Related products

10601 – HOMEWORK 2 Solved

10601 – HOMEWORK 2: DECISION TREES Solved

10601 – HOMEWORK 2 Solved

10601 – Homework1 Solved

10601 – HOMEWORK1 Solved