Department of Computer Engineering CSE 654 / 484 (Solution)

Homework 02

In this homework we will develop a statistical language model of Turkish based on N-grams of Turkish letters.

Follow the steps below for the rest of the homework and for your homework report:
1. Download the Turkish Wikipedia dump: https://www.dropbox.com/sh/umigczctv1y50ss/AADUYl9YXbaqhCnEw4uUZi_5a?dl=0
3. Calculate perplexity of the 1-Gram to 5-Gram models using the chain rule with the Markov assumption for each sentence. You will use the remaining 5% of the set for these calculations. Make a table of your findings in your report and explain your results.
4. Produce random sentences for each N-Gram model. At each step, pick the next letter at random from the 5 most probable letters. Include these random sentences in your report and discuss them.
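
The counting, perplexity, and generation steps above can be sketched as follows. This is a minimal illustration, not a reference solution. It assumes the text is already lowercased and stripped of punctuation, pads each sentence with a hypothetical start symbol `^`, and uses add-one smoothing so that unseen N-grams do not get zero probability. The assignment does not name a smoothing method, so that choice is an assumption, as are all function names.

```python
import math
import random
from collections import Counter, defaultdict

def train(sentences, n):
    """Count how often each letter follows each (n-1)-letter context."""
    counts = defaultdict(Counter)
    for s in sentences:
        padded = "^" * (n - 1) + s  # "^" is a hypothetical start-of-sentence pad
        for i in range(n - 1, len(padded)):
            counts[padded[i - n + 1:i]][padded[i]] += 1
    return counts

def log_prob(counts, context, ch, vocab_size):
    """Add-one smoothed log P(ch | context); the smoothing is an assumption."""
    c = counts.get(context, {})
    total = sum(c.values())
    return math.log((c.get(ch, 0) + 1) / (total + vocab_size))

def perplexity(counts, n, sentence, vocab_size):
    """Perplexity of one non-empty sentence, from a sum of log-probabilities."""
    padded = "^" * (n - 1) + sentence
    total = 0.0
    for i in range(n - 1, len(padded)):
        total += log_prob(counts, padded[i - n + 1:i], padded[i], vocab_size)
    return math.exp(-total / len(sentence))

def generate(counts, n, length, rng, top_k=5):
    """Emit letters one at a time, choosing uniformly among the top_k most
    frequent continuations of the current context."""
    out = "^" * (n - 1)
    for _ in range(length):
        context = out[len(out) - (n - 1):] if n > 1 else ""
        candidates = counts.get(context)
        if not candidates:
            break  # context never seen in training
        best = [ch for ch, _ in candidates.most_common(top_k)]
        out += rng.choice(best)
    return out[n - 1:]
```

For example, `generate(train(train_sentences, 3), 3, 80, random.Random(0))` would produce an 80-letter string from a 3-Gram model, where `train_sentences` is a hypothetical list of normalized training sentences. Choosing uniformly among the top 5 letters is one reading of the step; sampling among them in proportion to their counts is another.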

Notes
1. Do not forget to take the logarithm of the chain-rule product, that is, sum log-probabilities instead of multiplying probabilities.
2. Convert all letters to lowercase first.
3. Do not include any punctuation marks in your N-grams. Lowercase letters and the space character are enough.
4. You will demo your homework results online.
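
Notes 2 and 3 can be sketched as a small normalization step. One detail worth care is that Python's default `str.lower()` maps `I` to `i` and `İ` to `i` plus a combining dot, whereas Turkish pairs `I` with `ı` and `İ` with `i`, so the sketch handles those two letters first. The function name is hypothetical. Replacing punctuation with a space rather than deleting it, and keeping the non-Turkish letters `q`, `w`, and `x`, are assumptions made here, not requirements of the assignment.

```python
import re

# Turkish-specific case pairs that str.lower() does not handle as needed.
_TR_UPPER = str.maketrans({"I": "ı", "İ": "i"})

def normalize(text):
    """Lowercase with Turkish casing rules, keep only letters and spaces."""
    text = text.translate(_TR_UPPER).lower()
    # Anything that is not a lowercase letter or a space becomes a space
    # (an assumption: this keeps word boundaries around punctuation).
    text = re.sub(r"[^a-zçğıöşü ]", " ", text)
    # Collapse runs of spaces and trim the ends.
    return re.sub(r" +", " ", text).strip()
```

With this choice an apostrophe splits a word, so `Ankara'da` becomes `ankara da`. Deleting punctuation outright would give `ankarada`, and either behaviour is worth stating in the report.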
