CS760 – Homework 2

>>NAME HERE<<
>>ID HERE<<
Instructions: Although this is a programming homework, you only need to hand in a PDF answer file. There is no need to submit the LaTeX source or any code. You can choose any programming language, as long as you implement the algorithm from scratch (e.g., do not use sklearn or any other library's decision tree implementation on questions 1 to 7).
Use this LaTeX file as a template to develop your homework. Submit your homework on time as a single PDF file to Canvas. Please check Piazza for updates about the homework.
1 A Simplified Decision Tree
You are to implement a decision-tree learner for classification. To simplify your work, this will not be a general purpose decision tree. Instead, your program can assume that
• each item has two continuous features x ∈ ℝ²
• the class label is binary and encoded as y ∈ {0, 1}
• data files are in plaintext with one labeled item per line, separated by whitespace:
x11 x12 y1
...
xn1 xn2 yn
Your program should implement a decision tree learner according to the following guidelines (a minimal sketch of the split search follows this list):
• Candidate splits (j, c) for numeric features should use a threshold c in feature dimension j, in the form x_j ≥ c.
• The left branch of such a split is the “then” branch, and the right branch is “else”.
• The stopping criteria (for making a node into a leaf) are that
– the node is empty, or
– all splits have zero gain ratio (if the entropy of the split is non-zero), or
– the entropy of any candidate split is zero
• To simplify, whenever there is no majority class in a leaf, let it predict y = 1.
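To make the split rule and the gain-ratio stopping criterion concrete, here is a minimal Python sketch of the candidate-split search. It is one possible implementation under our own naming (entropy, best_split), not a prescribed one; candidate thresholds are taken to be the observed feature values in each dimension.

import numpy as np

def entropy(labels):
    # Empirical entropy (base 2) of a binary 0/1 label vector.
    if len(labels) == 0:
        return 0.0
    p = np.mean(labels)
    if p == 0.0 or p == 1.0:
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def best_split(X, y):
    # Return the (j, c) with the largest gain ratio, or None when every
    # candidate split has zero gain ratio (the stopping criterion above).
    parent_h = entropy(y)
    n = len(y)
    best, best_ratio = None, 0.0
    for j in range(X.shape[1]):
        for c in np.unique(X[:, j]):          # candidate thresholds in dimension j
            then_branch = X[:, j] >= c        # left / "then" branch
            else_branch = ~then_branch        # right / "else" branch
            split_h = entropy(then_branch.astype(int))   # split (intrinsic) information
            if split_h == 0.0:                # one branch is empty: not a usable split
                continue
            gain = parent_h - (then_branch.sum() / n) * entropy(y[then_branch]) \
                            - (else_branch.sum() / n) * entropy(y[else_branch])
            ratio = gain / split_h
            if ratio > best_ratio:
                best_ratio, best = ratio, (j, c)
    return best

A full learner would call best_split recursively, turning a node into a leaf whenever the node is empty or best_split returns None, and predicting the majority class (y = 1 on ties) at each leaf.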

2 Questions
2. (Our algorithm is greedy) [5 pts] Handcraft a small training set where both classes are present but the algorithm refuses to split; instead it makes the root a leaf and stops. Importantly, if we were to manually force a split, the algorithm would happily continue splitting the data set further and produce a deeper tree with zero training error. You should (1) plot your training set, and (2) explain why. Hint: you don't need more than a handful of items.
5. (Or is it?) [20 pts] For this question only, make sure you DO NOT VISUALIZE the data sets or plot your tree’s decision boundary in the 2D x space. If your code does that, turn it off before proceeding. This is because you want to see your own reaction when trying to interpret a tree. You will get points no matter what your interpretation is. And we will ask you to visualize them in the next question anyway.
• Build a decision tree on D1.txt. Show it to us in any format (e.g. could be a standard binary tree with nodes and arrows, and denote the rule at each leaf node; or as simple as plaintext output where each line represents a node with appropriate line number pointers to child nodes; whatever is convenient for you). Again, do not visualize the data set or the tree in the x input space. In real tasks you will not be able to visualize the whole high dimensional input space anyway, so we don’t want you to “cheat” here.
• Look at your tree in the above format (remember, you should not visualize the 2D dataset or your tree’s decision boundary) and try to interpret the decision boundary in human understandable English.
• Build a decision tree on D2.txt. Show it to us.
• Try to interpret your D2 decision tree. Is it easy or possible to do so without visualization?
6. (Hypothesis space) [10 pts] For D1.txt and D2.txt, do the following separately:
• Produce a scatter plot of the data set.
• Visualize your decision tree's decision boundary (or decision region, or some other way to clearly show how your decision tree makes decisions in the feature space); a plotting sketch follows this question.
Then discuss why the size of your decision trees on D1 and D2 differ. Relate this to the hypothesis space of our decision tree algorithm.
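For the boundary plots, one convenient approach is to evaluate the tree on a dense grid and shade the resulting regions. The sketch below assumes your tree exposes some batch prediction function; predict_fn and plot_decision_regions are hypothetical names, not part of the assignment.

import numpy as np
import matplotlib.pyplot as plt

def plot_decision_regions(predict_fn, X, y, title):
    # predict_fn is assumed to map an (m, 2) array of points to 0/1 labels.
    pad = 0.1
    x1 = np.linspace(X[:, 0].min() - pad, X[:, 0].max() + pad, 300)
    x2 = np.linspace(X[:, 1].min() - pad, X[:, 1].max() + pad, 300)
    g1, g2 = np.meshgrid(x1, x2)
    grid = np.column_stack([g1.ravel(), g2.ravel()])
    z = predict_fn(grid).reshape(g1.shape)
    plt.contourf(g1, g2, z, levels=[-0.5, 0.5, 1.5], alpha=0.3)   # shaded decision regions
    plt.scatter(X[:, 0], X[:, 1], c=y, s=10, edgecolors="k")      # the data on top
    plt.xlabel("x1")
    plt.ylabel("x2")
    plt.title(title)
    plt.show()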
7. (Learning curve) [20 pts] We provide a data set Dbig.txt with 10000 labeled items. Caution: Dbig.txt is sorted.
• You will randomly split Dbig.txt into a candidate training set of 8192 items and a test set (the rest). Do this by generating a random permutation and splitting at 8192 (see the sketch after this list).
• Generate a sequence of five nested training sets D_32 ⊂ D_128 ⊂ D_512 ⊂ D_2048 ⊂ D_8192 from the candidate training set. The subscript n in D_n denotes training set size. The easiest way is to take the first n items from the (same) permutation above. This sequence simulates the real-world situation where you obtain more and more training data.
• For each D_n above, train a decision tree. Measure its test set error err_n. Show three things in your answer: (1) List n, the number of nodes in that tree, and err_n. (2) Plot n vs. err_n. This is known as a learning curve (a single plot). (3) Visualize your decision trees' decision boundaries (five plots).
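A minimal sketch of the permutation split, the nested subsets, and the learning-curve plot follows. The helpers fit_tree and predict_tree are hypothetical placeholders for whatever interface your Part 1 learner exposes, and the fixed seed is our own choice.

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)                 # fixed seed so the split is reproducible

data = np.loadtxt("Dbig.txt")                  # columns: x1 x2 y; the file is sorted, so shuffle
perm = rng.permutation(len(data))
train, test = data[perm[:8192]], data[perm[8192:]]
X_test, y_test = test[:, :2], test[:, 2]

sizes, errors = [32, 128, 512, 2048, 8192], []
for n in sizes:
    D_n = train[:n]                            # prefixes of one permutation give nested sets
    tree = fit_tree(D_n[:, :2], D_n[:, 2])     # hypothetical: your Part 1 learner
    y_hat = predict_tree(tree, X_test)         # hypothetical: batch prediction
    errors.append(np.mean(y_hat != y_test))    # err_n

plt.plot(sizes, errors, marker="o")
plt.xscale("log")
plt.xlabel("training set size n")
plt.ylabel("test error err_n")
plt.show()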

3 sklearn [10 pts]
Learn to use sklearn: https://scikit-learn.org/stable/. Use sklearn.tree.DecisionTreeClassifier to produce trees for the data sets D_32, D_128, ..., D_8192. Show two things in your answer: (1) List n, the number of nodes in that tree, and err_n. (2) Plot n vs. err_n.
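A short, self-contained sketch of how this might look, assuming the same kind of random split as in Question 7; the classifier is left at its default settings, and clf.tree_.node_count supplies the node count.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
data = np.loadtxt("Dbig.txt")
perm = rng.permutation(len(data))
train, test = data[perm[:8192]], data[perm[8192:]]
X_test, y_test = test[:, :2], test[:, 2]

for n in [32, 128, 512, 2048, 8192]:
    D_n = train[:n]
    clf = DecisionTreeClassifier().fit(D_n[:, :2], D_n[:, 2])
    err_n = np.mean(clf.predict(X_test) != y_test)
    print(n, clf.tree_.node_count, err_n)      # n, number of nodes, test error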
4 Lagrange Interpolation [10 pts]
Fix some interval [a, b] and sample n = 100 points x from this interval uniformly. Use these to build a training set consisting of n pairs (x, y) by setting y = sin(x).
Build a model f using Lagrange interpolation, as discussed in lecture. Generate a test set from the same distribution as your training set. Compute and report the resulting model's training and test error. What do you observe?
Repeat the experiment with zero-mean Gaussian noise ε added to x. Vary the standard deviation of ε and report your findings.
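One way to set up this experiment is sketched below, using scipy.interpolate.lagrange and interpreting the noise as being added to the training inputs only; the interval [0, 2π], the seed, and the noise levels are arbitrary choices. Lagrange interpolation through 100 points is numerically ill-conditioned, so very large errors (and overflow warnings) are to be expected, which is likely part of what you will observe.

import numpy as np
from scipy.interpolate import lagrange

rng = np.random.default_rng(0)
a, b, n = 0.0, 2 * np.pi, 100                 # the interval [a, b] is your choice

x_train = rng.uniform(a, b, n)
y_train = np.sin(x_train)
x_test = rng.uniform(a, b, n)                 # test set from the same distribution as the training set
y_test = np.sin(x_test)

def mse(y, y_hat):
    return np.mean((y - y_hat) ** 2)

for sigma in [0.0, 0.01, 0.1, 1.0]:           # sigma = 0 is the noise-free experiment
    x_noisy = x_train + rng.normal(0.0, sigma, n)   # zero-mean Gaussian noise added to x
    f = lagrange(x_noisy, y_train)            # degree-(n-1) interpolating polynomial
    print(sigma, mse(y_train, f(x_train)), mse(y_test, f(x_test)))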
