Description
1. For the following dataset:
ID   Outl  Temp  Humi  Wind  PLAY
TRAINING INSTANCES
A    s     h     h     F     N
B    s     h     h     T     N
C    o     h     h     F     Y
D    r     m     h     F     Y
E    r     c     n     F     Y
F    r     c     n     T     N
TEST INSTANCES
G    o     c     n     T     ?
H    s     m     h     F     ?
(i). Classify the test instances using the method of 0-R.
(ii). Classify the test instances using the method of 1-R.
(iii). Classify the test instances using the ID3 Decision Tree method:
a) Using the Information Gain as a splitting criterion
b) Using the Gain Ratio as a splitting criterion
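The quantities these methods rely on can be checked with a short script. The following is a minimal sketch rather than a model answer: the tuple encoding of the table and the helper names (class_counts, partition, one_r_errors, and so on) are assumptions made for illustration, and it deliberately does not pick a tie-breaking rule.

```python
from collections import Counter, defaultdict
from math import log2

ATTRS = ["Outl", "Temp", "Humi", "Wind"]

# Training instances A-F from the table; the last field is the PLAY label.
TRAIN = [
    ("s", "h", "h", "F", "N"),
    ("s", "h", "h", "T", "N"),
    ("o", "h", "h", "F", "Y"),
    ("r", "m", "h", "F", "Y"),
    ("r", "c", "n", "F", "Y"),
    ("r", "c", "n", "T", "N"),
]


def class_counts(rows):
    """Count how often each PLAY label occurs in rows."""
    return Counter(row[-1] for row in rows)


def partition(rows, col):
    """Group rows by the value they take in column col."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[col]].append(row)
    return groups


def entropy(counts):
    """Shannon entropy (in bits) of a collection of counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c)


def one_r_errors(rows, col):
    """Training errors made by predicting the majority label per value."""
    return sum(
        len(group) - max(class_counts(group).values())
        for group in partition(rows, col).values()
    )


def info_gain(rows, col):
    """Entropy of the labels minus the weighted entropy after the split."""
    remainder = sum(
        len(group) / len(rows) * entropy(class_counts(group).values())
        for group in partition(rows, col).values()
    )
    return entropy(class_counts(rows).values()) - remainder


def gain_ratio(rows, col):
    """Information gain divided by the entropy of the split sizes."""
    split_info = entropy([len(g) for g in partition(rows, col).values()])
    return info_gain(rows, col) / split_info if split_info else 0.0


zero_r_counts = class_counts(TRAIN)  # 0-R predicts the most frequent label
errors = {a: one_r_errors(TRAIN, i) for i, a in enumerate(ATTRS)}
gains = {a: info_gain(TRAIN, i) for i, a in enumerate(ATTRS)}
ratios = {a: gain_ratio(TRAIN, i) for i, a in enumerate(ATTRS)}

print("class counts:", dict(zero_r_counts))
print("1-R errors:  ", errors)
print("info gain:   ", gains)
print("gain ratio:  ", ratios)
```

On this training set the class counts tie at 3 Y and 3 N, and Outl and Wind tie for the lowest 1-R error (one each), so the 0-R and 1-R answers depend on whichever tie-breaking rule is stated. At the root, Information Gain is highest for Outl (about 0.541), whereas Gain Ratio is highest for Wind (0.5), so the two criteria start the tree with different attributes.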
2. The dataset includes the list of books available in the library (columns), the students who borrowed them (rows), and a ranking for each item (the ranking value is between 0 and 5: 0 if the book was not borrowed, and 1–5 indicating the student’s interest). The metadata for the books (e.g., titles) is not readily available to us; we have only the book IDs (e.g., Book #i). The dataset also includes each student’s field of study (there are 10 fields in total), which can be used for the classification task. Answer the following questions, considering that there are 500,000 students and 100,000 books in this dataset.
(i). Consider the following supervised machine learning methods, and for each one, explain why it would be appropriate or inappropriate to use for this problem:
i. Naïve Bayes
ii. k-NN
iii. Decision Tree
(ii). Would “feature selection” be useful here? Explain why, by referring to a single machine learning method.
(iii). Explain how you would evaluate the effectiveness of your system: you should briefly describe an evaluation strategy and an evaluation metric that are suitable for this data. What might be an example of a baseline?
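One possible evaluation set-up, offered only as an illustration, is k-fold cross-validation scored by accuracy, with a majority-class (0-R) predictor as the baseline. In the sketch below the toy students, their sparse rating dictionaries, the field labels, and all function names are hypothetical; a real run would use the full dataset and the classifier under test in place of the baseline.

```python
import random
from collections import Counter


def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def majority_baseline(train_features, train_labels):
    """0-R baseline: ignore the features, always predict the commonest label."""
    label = Counter(train_labels).most_common(1)[0][0]
    return lambda features: label


def cross_validate(features, labels, make_model, k=5):
    """Mean accuracy over k folds; make_model(train_X, train_y) -> predictor."""
    accuracies = []
    for held_out in k_fold_indices(len(labels), k):
        held = set(held_out)
        train_idx = [i for i in range(len(labels)) if i not in held]
        model = make_model(
            [features[i] for i in train_idx],
            [labels[i] for i in train_idx],
        )
        correct = sum(model(features[i]) == labels[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / len(accuracies)


# Hypothetical toy data: each student is a sparse {book_id: ranking} dict,
# so books that were never borrowed (ranking 0) are simply not stored.
toy_features = [{i: 1 + i % 5} for i in range(10)]
toy_labels = ["CS"] * 8 + ["Bio"] * 2

baseline_accuracy = cross_validate(toy_features, toy_labels, majority_baseline)
print("baseline accuracy:", baseline_accuracy)
```

Any proposed classifier can be passed to cross_validate through the same make_model interface, and its score is only meaningful relative to the baseline. With 10 fields of possibly uneven size, a per-class metric such as macro-averaged F1 may be more informative than accuracy alone.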