Description
1. (35 points) In this problem, we consider Adaboost. Implement the Adaboost algorithm with 100 weak learners and apply it to the cancer dataset described above. For the weak learners, use decision stumps (1-level decision trees). You must implement the decision stumps from scratch. Use information gain as the splitting measure.
Please submit (a) summary of methods and results report and (b) code:
(a) Summary of methods and results: Briefly describe the approaches used above, along with relevant equations. Report a plot of the classification error on both the train and test sets as the number of weak learners increase. (One plot where the x-axis is the number of weak learners from 1 to 100 and the y-axis is the classification error.)
(b) Code: Submit the file adaboost.py which contains the function def adaboost(dataset: str) -> None:. The function takes in a string of the dataset filename and does not return anything but must print out to the terminal (stdout) the train and test classification error rates as the number of weak learners increase.
2. (35 points) In this problem, we consider Random Forests. Implement the Random Forest algorithm with 100 decision stumps. You must implement the decision stumps from scratch. Use information gain as the splitting measure. Apply your Random Forest implementation to the cancer dataset described above and do the following:
(i) Use m = 3 random attributes to determine the split of your decision stumps. Learn a model for an increasing number of decision stumps in the ensemble. Compute the train and test set classification error as the number of decision stumps increases.
(ii) Vary the number of random attributes m from {2,…,p = 10} and fit a model using 100 decision stumps. Compute the train and test set classification error as the number of random features m increases.
Please submit (a) summary of methods and results report and (b) code:
(a) Summary of methods and results: Briefly describe the approaches used above, along with relevant equations. Report a plot of the classification error on both the train and test sets for both (i) and (ii) above. (A total of 2 plots where for (i) the x-axis is the number of decision trees and y-axis is the classification error, and (ii) the x-axis is the number of random features m and y-axis is the classification error.)
(b) Code: Submit the file rf.py which contains the function def rf(dataset: str) -> None:. The function takes in a string of the dataset filename and does not return anything but must print out to the terminal (stdout) the train and test classification error rates for (i) and (ii) above.
Please submit (a) summary of methods and results report and (b) code:
(a) Summary of methods and results: Briefly describe the approaches used above, along with relevant equations. For each value of k = {3,5,7}, report the final (i.e., after kmeans has converged) segmented image and a plot of the cumulative loss during training. (This will be 3 segmented images and 3 plots of the loss where the x-axis is the training iteration number and y-xais is the loss value.)
(b) Code: Submit the file kmeans.py which contains the function def kmeans(image: str) -> None:. The function takes in a string of the image to segment and does not return anything but must print out to the terminal (stdout) the cumulative loss at each iteration during training.
Additional instructions: Code can only be written in Python 3.6+; no other programming languages will be accepted. One should be able to execute all programs from the Python command prompt or terminal. Please specify instructions on how to run your program in the README file.
Each function must take the inputs in the order specified in the problem and display the textual output via the terminal and plots/figures should be included in the report.
Your code must be runnable on a CSE lab machine (e.g., csel-kh1260-01.cselabs.umn.edu). One option is to SSH into a machine. Learn about SSH at these links: https://cseit.umn.edu/ knowledge-help/learn-about-ssh, https://cseit.umn.edu/knowledge-help/choose-ssh-tool, and https://cseit.umn.edu/knowledge-help/remote-linux-applications-over-ssh.
Instructions
Follow the rules strictly. If we cannot run your code, you will not get any credit.
• Things to submit
1. hw4.pdf: The report that contains the solutions to Problems 1, 2, and 3 including the summary of methods and results.
2. adaboost.py: Code for Problem 1.
3. rf.py: Code for Problem 2.
4. kmeans.py: Code for Problem 3.
5. README.txt: README file that contains your name, student ID, email, instructions on how to run your code, any assumptions you are making, and any other necessary details.
6. Any other files, except the data, which are necessary for your code.
Homework Policy. (1) You are encouraged to collaborate with your classmates on homework problems, but each person must write up the final solutions individually. You need to list in the README.txt which problems were a collaborative effort and with whom. (2) Regarding online resources, you should not:
• Google around for solutions to homework problems,
• Ask for help on online,
• Look up things/post on sites like Quora, StackExchange, etc.
Reviews
There are no reviews yet.