## Description

BACKGROUND

https://mlcourse.org

TAs: Ayushi, Scott, Filipp, Yiming

START HERE: Instructions

˜mgormley/courses/10601/about.html

• Late Submission Policy: See the late submission policy here: http://www.cs.cmu.edu/

˜mgormley/courses/10601/about.html

• Submitting your work:

– Programming: You will submit your code for programming questions on the homework to

Gradescope (https://gradescope.com). After uploading your code, our grading scripts will autograde your assignment by running your program on a virtual machine (VM). When you are developing, check that the version number of the programming language environment (e.g. Python 3.6.9, Octave 4.2.2, OpenJDK 11.0.5, g++ 7.4.0) and versions of permitted libraries (e.g. numpy 1.17.0 and scipy 1.4.1) match those used on Gradescope. (Octave users: Please make sure you do not use any Matlab-specific libraries in your code that might make it fail against our tests.) You have a total of 10 Gradescope programming submissions. Use them wisely. In order to not waste code submissions, we recommend debugging your implementation on your local machine (or the linux servers) and making sure your code is running correctly first before any Gradescope coding submission. The above is true for future assignments, but this one allows unlimited submissions.

• Materials: The data that you will need in order to complete this assignment is posted along with the writeup and template on Piazza.

For multiple choice or select all that apply questions, shade in the box or circle in the template document corresponding to the correct answer(s) for each of the questions. For LATEX users, replace choice with CorrectChoice to obtain a shaded box/circle, and don’t change anything else.

1 Programming: Decision Stump [30 Points]

1.1 Introduction

The goal of this assignment is to ensure that you:

1. Have a way to edit and test your code (i.e. a text editor and compiler/interpreter)

2. Are familiar with submitting to Gradescope

3. Are familiar with file I/O and standard output in the language of your choice

1.2 Decision Stump

1.2.1 Algorithm

This simple algorithm acts a precursor to the Decision Tree that you will implement in the next homework assignment. We hope that you will employ best practices when coding so that you can re-use your own code here in the next assignment.

At test time, each example should be passed down through the stump. Its label becomes the label (i.e. the stored majority vote) of the corresponding branch in which it lands.

1.2.2 The Datasets

Materials Download the zip file from Piazza, which contains all the data that you will need in order to complete this assignment.

Datasets The handout contains three datasets. Each one contains attributes and labels and is already split into training and testing data. The first line of each .tsv file contains the name of each attribute, and the class is always the last column.

1. politician: The first task is to predict whether a US politician is a member of the Democrat or Republican party, based on their past voting history. Attributes (aka. features) are short descriptions of bills that were voted on, such as Aid to nicaraguan contras or Duty free exports. Values are given as ‘y’ for yes votes and ‘n’ for no votes. The training data is in politicians_train.tsv, and the test data in politicians_test.tsv.

3. small: We also include small_train.tsv and small_test.tsv—a small, purely for demonstration version of the politicians dataset, with only attributes Anti satellite test ban and Export south africa.

The handout zip file also contains the predictions and metrics from a reference implementation of a Decision Stump for the politician (splitting on feature 3), education (splitting on feature 5) and small (splitting on feature 0) datasets (see subfolder example output). You can check your own output against these to see if your implementation is correct.

Note: For simplicity, all attributes are discretized into just two categories. This applies to all the datasets in the handout, as well as the additional datasets on which we will evaluate your Decision Stump.

1.2.3 Command Line Arguments

The autograder runs and evaluates the output from the files generated, using the following command:

For Python:

For Java:

For C++:

$ python decisionStump.py [args…]

$ javac decisionStump.java; java decisionStump [args…]

$ g++ -g decisionStump.cpp; ./a.out [args…]

$ octave -qH decisionStump.m [args…]

For Octave:

Where above [args…] is a placeholder for six command-line arguments: <train input> <test input> <split index> <train out> <test out> <metrics out>. These arguments are described in detail below:

1. <train input>: path to the training input .tsv file

2. <test input>: path to the test input .tsv file

3. <split index>: the index of feature at which we split the dataset. The first column has index 0, the second column index 1, and so on.

4. <train out>: path of output .labels file to which the predictions on the training data should be written

5. <test out>: path of output .labels file to which the predictions on the test data should be written

6. <metrics out>: path of the output .txt file to which metrics such as train and test error should be written

As an example, if you implemented your program in Python, the following command line would run your program on the politicians dataset and split the dateset by the first feature (Remember that the index of feature starts from zero). The train predictions would be written to pol_0_train.labels, the test predictions to pol_0_test.labels, and the metrics to pol_0_metrics.txt.

$ python decisionStump.py politicians_train.tsv politicians_test.tsv

0 pol_0_train.labels pol_0_test.labels pol_0_metrics.txt

1.2.4 Output: Labels Files

Your program should write two output .labels files containing the predictions of your model on training data (<train out>) and test data (<test out>). Each should contain the predicted labels for each example printed on a new line. Use ’ ’ to create a new line.

Your labels should exactly match those of a reference decision stump implementation—this will be checked by the autograder by running your program and evaluating your output file against the reference solution.

Note: You should output your predicted labels using the same string identifiers as the original training data:

e.g., for the politicians dataset you should output democrat/republican and for the education dataset you should output A/notA. The first few lines of an example output file is given below for the politician dataset:

republican republican democrat democrat democrat democrat democrat …

1.2.5 Output: Metrics File

Generate another file where you should report the training error and testing error. This file should be written to the path specified by the command line argument <metrics out>. Your reported numbers should be within 0.01 of the reference solution. You do not need to round your reported numbers! The Autograder will automatically incorporate the right tolerance for float comparisons. The file should be formatted as follows:

error(train): 0.241611 error(test): 0.228916

1.3 Command Line Arguments

In this and future programming assignments, we will use command line arguments to run your programs with different parameters. Below, we provide some simple examples for how to do this in each of the programming languages you can use in the course. In the examples below, suppose your program takes two arguments: an input file and an output file.

Python:

import sys

if __name__ == ’__main__’: infile = sys.argv[1] outfile = sys.argv[2] print(“Theinput file is: %s” % (infile)) print(“Theoutput file is: %s” % (output))

Java:

public class myclass { public static void main(String[] args) {

String infile = args[0];

String outfile = args[1];

System.out.println(“Theinput file is: ” + infile);

System.out.println(“Theoutput file is: ” + outfile);

}

}

C++:

#include <iostream> #include <string>

using namespace std;

int main(int argc, char **argv){ if (argc >= 3) {

string infile = string(argv[1]); string outfile = string(argv[2]); cout << “Theinput file is: ” << infile << endl; cout << “Theoutput file is: ” << outfile << endl;

} return 0;

}

Octave:

args=argv(); infile=args{1}; outfile=args{2}; disp(horzcat(“Theintput file is: “, infile)) disp(horzcat(“Theoutput file is: “, outfile))

1.4 Code Submission

You must submit a file named decisionStump.{py|m|java|cpp}. The autograder is case sensitive, so observe that all your files should be named in lowercase. You must submit this file to the corresponding homework link on Gradescope.

Instructions for Specific Problem Types

For “Select One” questions, please fill in the appropriate bubble completely:

Select One: Who taught this course?

Matt Gormley

Marie Curie

Noam Chomsky

Select One: Who taught this course?

Matt Gormley # Marie Curie

@ Noam Chomsky

For “Select all that apply” questions, please fill in all appropriate squares completely:

Select all that apply: Which are scientists?

2 Stephen Hawking Albert Einstein

Isaac Newton 2 None of the above

Select all that apply: Which are scientists?

Stephen Hawking

Albert Einstein Isaac Newton

@ I don’t know

Fill in the blank: What is the course number?

10-S7601S

2 Written Questions [70 pts]

In this section, you will work through a number of problems covering prerequisite material: probability, statistics, calculus, linear algebra, geometry, and computer science. The first subsection covers common course policy questions

2.1 Course Policies

This section covers important course policies that every student should know and understand. These questions MUST be finished in order for the whole homework to be considered for grading.

1. (1 point) Assignment turned in late without prior approval will incur a daily penalty. How much is the penalty? Up to 1 day: Up to 2 day: Up to 3 day: Up to 4 day:

Select one:

5%, 10%, 15%, 20%

10%, 20%, 30%, 40%

25%, 50%, 75%, 100%

20%, 40%, 60%, 80%

Select one:

As many as I want; Of course!

6; No

8; Yes

6; Yes

3. (1 point) Seeking help from other students in understanding course materials needed to solve homework problems is ALLOWED under which of the follow conditions?

Select all that apply:

No written notes are shared or taken at any time

Learning is facilitated not circumvented; i.e., the purpose of seeking help is to learn and understand the problem instead of merely getting an answer

Help both given and received is reported in collaboration questions in the homework

The student updates his/her collaborative questions even if it is after submitting their own assignment 2 None of the above

4. (1 point) Which of the following is (are) strictly forbidden in solving and submitting homework?

Select all that apply:

Searching on the internet for solutions or sample codes

Consulting people outside this class who have seen or solved the problem before

Turning in someone else’s homework

Using anyone else’s, or allowing other classmates to use your computer, Autolab or Gradescope in connection with this course

2 None of the above

5. (1 point) If you solved your assignment completely on your own, you can skip the collaboration questions at the end of each homework.

Select one:

True

False

6. (1 point) What is (are) the consequence(s) of being caught cheating in this course? Select all that apply.

First time:

A one and a half letter grade reduction (e.g., A becomes B-)

Second time:

Failure of the course

7. (1 point) Assume a difficult situation arises in the middle of the semester (e.g. medical, personal etc.) that might prevent you from submitting assignments on time or working as well as you would like. What should you do? Select all that apply

Talk to the course staff early so they can point you to the available resources on campus and make necessary arrangements

Hold on from talking to the course staff, try to finish the class, reach out to the course staff in the end of the semester explaining your special situation

Reach out to your advisor so that they are aware of the situation 2 None of the above

2.2 Probability and Statistics

Use the following data to answer questions 1-2. Consider data created by flipping a coin five times S =

[1, 1, 0, 1, 1] , where 1 denotes that the coin turned up heads and 0 denotes that it turned up tails.

1. (1 point) What is the probability of observing any combination of this data (4 heads and 1 tails), assuming it was generated by flipping a coin X with an unequal probability of heads (1) and tails (0), where the distribution is P(X = 1) = 0.75, P(X = 0) = 0.25?

Select one:

2. (1 point) Note that the probability of this data sample would be greater if the value of P(X = 1) was not 0.75, but instead some other value. What is the value of P(X = 1) that maximizes the probability of the sample S? Provide your answer as a fraction.

3. (1 point) State true or false. For events A and B,

P(A ∩ B) = P(A) + P(B) − P(A ∪ B)

Select one:

True

False

4. (1 point) State true or false. For events A and B,

P(A1 ∩ A2 ∩ A3) = P(A3|A2 ∩ A1)P(A2|A1)P(A1)

Select one:

True

False

P(R) = 0.4

P(W|R) = 0.8 P(W|¬R) = 0.2

What is the probability that your car is wet in the morning?

Select one:

ble where both X and Y are binary variables:

X Y Probability

0 0 0.1

0 1 0.2

1 0 0.4

1 1 0.3

6. (1 point) Using the same table, what is P(X = 1|Y = 1)?

Select one:

7. (1 point) What is P(Y = 0)?

Select one:

Use the following information to answer questions 8-10. Let X be a random variable and the ex-

pected value of X is E[X] = 1 and the variance of X is V ar[X] = 1.

8. (1 point) What is E[6X]?

Select one:

9. (1 point) What is V ar[3X]?

Select one:

10. (1 point) What is V ar[2X + 3]?

Select one:

Use the following information to answer questions 11-14: Let A, B, and C be random variables with

discrete probability distributions. Consider the following two joint probability tables: one relating A and B, and the other relating B and C.

AB b1 b2 b3

a1 0.1 0.05 0.15

a2 0.1 0.05 0.3

a3 0.05 0.15 0.05

BC c1 c2 c3 c4

b1 0.02 0.14 0.06 0.03

b2 0.03 0.05 0 0.17

b3 0.35 0.04 0 0.11

11. (1 point) Which of the following statements are necessarily false?

Select all that apply:

AB

BC AC

2 None of the above.

12. (2 points) What is P(B = b1|A = a2,C = c4)? If this value cannot be computed, write N/A.

13. (2 points) What is P(B = b2|A = a3,C = c3)? If this value cannot be computed, write N/A.

14. (2 points) True or False: P3i=1 P(B = bi|C = c1) = Pj4=1 P(C = cj|B = b1)

Select one:

True

False

15. (2 points) Consider two random variables X,Y . Assume that we have

(integers greater than or equal to 1) and . Assume n is a fixed positive integer constant. What is E[Y ]?

Select one:

16. (1 point) What is the mean, variance and entropy of a Bernoulli (p) random variable?

Select one:

The entropy of a Bernoulli variable is not defined.

17. (2 points) Please match the probability density function of the random variable X to its corresponding distribution name.

1. prob

2. prob(X = x) = λe−λx when x ≥ 0; 0 otherwise

3. prob

4. prob when a ≤ x ≤ b; 0 otherwise

5. prob(X = x) = px(1 − p)1−x

Multivariate Gaussian:

Exponential:

Uniform:

Bernoulli:

Binomial:

2.3 Calculus [8pts]

1. (2 points) Evaluate the derivative of y with respect to x, where at x = 1.

2. (2 points) Find the partial derivative of y with respect to x, where y = 3×2 sin(z)e−x Select one:

3. (2 points) For the function f(x) = 5×3 + 2×2 − 3x the value sets the derivative to be 0. Additionally, the second order derivative of f(x) at x = is positive. What can you say about f(x) at the point :

Select one:

a local minimum

a local maximum

a local minimum or a local maximum

None of the above

4. (2 points) Suppose that f(x|θ) = xTθ, where x,θ ∈ Rn. The function g(θ) is defined as g(θ) = (f(x0|θ) − y0)2 for x0 and y0. What is the function type of g(θ):

Select one:

2.4 Vectors and Matrices

1. (1 point) Consider the matrix X and the vectors y and z below: . What is the

inner product of the vectors y and z? (this is also sometimes called the dot product) Select one:

2. (1 point) Using the same values for X, y, and z as above, what is the product of Xy?

Select one:

3. (2 points) Consider u and v . Which of these are valid operations?

Select all that apply:

uTv

vTu uv vv

2 None of the above

4. (2 points) For the matrices A and B What is the product AB?

Select one:

5. (1 point) True or False: does the matrix A from the previous question has an inverse?

Select one:

True

False

6. (2 points) Consider two vectors x and y , let z = xy. What is ?

Select one:

x

y

7. (2 points) Given matrix X X associated with y? 4 2

6 2 and the column vector y

4 4

Select one:

y is not an eigenvector

, what is the eigenvalue of

8. (2 points) Preparing for his linear algebra final, Joe is finding eigenvectors and eigenvalues for different matrices. For one matrix (not given), he finds the following two eigenvectors corresponding to an

3 1

eigenvalue of 4: 117 and 39. Which statement regarding his solution is true?

9 3

Select all that apply:

The solution must be wrong because there cannot be multiple eigenvectors corresponding to a single eigenvalue.

The solution must be wrong because the eigenvectors are linearly dependent.

2 None of the above.

2.5 Geometry

1. (2 points) What relationship does the vector w share with the line wTx+b = 0? (assume x and w are both two dimensional column vectors, and wT indicates the transpose of the column vector w.)

Select one:

parallel

orthogonal

depends on the value of b

2. (1 point) With reference to the above question, select the statement which best explains why w and wTx + b = 0 share the above relationship.

Select one:

The inner product wT(x0 − x00), where x0 and x00 are two points on the line wTx+b = 0, is 0

The inner product wT(x0 − x00), where x0 and x00 are two points on the line wTx+b = 0, is 1

The inner product wT(x0 − x00), where x0 and x00 are two points on the line wTx+b = 0, is b0 − b00

3. (2 points) What is the distance from the origin to the line wTx + b = 0?

(In the following answers, λ is some constant) Select one:

|Tb| w w w

2.6 CS Foundations

1. (1 point) If f(n) = ln(n) and g(n) = log3(n) which of the following are true?

Select one:

Both

Neither

2. (1 point) If f(n) = n10 and g(n) = 10n which of the following are true?

Select one:

Both

Neither

Figure 2.1: Britain’s Royal Family

3. (2 points) Using the tree shown in Figure 2.1, how many nodes would depth-first-search visit in finding Mia Tindall (including her node)? Assuming we search left-to-right and top-down.

Select one:

Figure 2.2: A Binary Tree with indexed nodes

4. (2 points) Figure 2.2 is a Binary Tree with indexed nodes. Assume root node is node 1. What is the node-visit order of DFS and BFS of the above Binary Tree?

A depth-first search (DFS) traversal of a binary tree starts with visiting the root node, and recursively searches down the left subtree (i.e., the tree rooted at the left node) before going to search the right subtree (i.e., the tree rooted at the right node) until the traversal is done.

Note: Alternatively, we can also look right subtree before left subtree too, for the question please consider left to right order!

A breadth-first search (BFS) traversal of a binary tree visits every node (assuming a left-to-right order) on a level (with the same distance to the root) before going to a lower level until the traversal is done.

The node-visit order of DFS is:

The node-visit order of BFS is:

5. (2 points) Fill in the blanks in the pseudo code for key search using recursive depth-first search (DFS) traversal.

class TreeNode:

def __init__(self, val): self.val = val self.leftNode = None self.rightNode = None

# (a) the left/right node is denoted as

# node.leftNode/node.rightNode

# (b) left/right node are of type TreeNode

# (c) the value of the node is denoted as node.val

# (d) recursive DFS to search for the node

# with value key in a binary tree

# (e) the left node is assumed to be searched

# before the right node

def find_val(node, key):

if node is None:

return None

if (1)____________________________:

return node

else:

result = (2)___________________________

if result is None:

result = (3)___________________________

return (4)___________________________

Consider writing a recursive program to solve question 6:

Lucas numbers are defined as:

if n = 0 if n = 1

Ln−1 + Ln−2 if n > 1

6. (2 points) Which of the following is the numerical value for L32?

Select one:

Consider the following information to answer questions 7-8:

Given the functions of computing a fibonacci number:

def fib_1(n): if n == 0 or n == 1: return 1

return fib_1(n – 1) + fib_1(n – 2)

d = {} d[0] = 1 d[1] = 1 def fib_2(n):

if n in d.keys():

return d[n]

d[n] = fib_2(n – 1) + fib_2(n – 2)

return d[n]

7. (2 points) Which of the following is the tightest upper bound on the time complexity of computing fib_1(n)?

Select one:

8. (2 points) Which of the following is the tightest upper bound on the time complexity of computing fib_2(n)?

Select one:

Collaboration Questions Please answer the following:

1. Did you receive any help whatsoever from anyone in solving this assignment? Yes / No.

• If you answered ‘yes’, give full details:

• (e.g. “Jane Doe explained to me what is asked in Question 3.4”)

2. Did you give any help whatsoever to anyone in solving this assignment? Yes / No.

• If you answered ‘yes’, give full details:

• (e.g. “I pointed Joe Smith to section 2.3 since he didn’t know how to proceed with Question 2”)

3. Did you find or come across code that implements any part of this assignment ? Yes / No. (See below policy on “found code”)

• If you answered ‘yes’, give full details:

• (book & page, URL & location within the page, etc.).

## Reviews

There are no reviews yet.