1. How is holdout evaluation different to cross-validation evaluation? What are some reasons we would prefer one strategy over the other?
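As a rough illustration of the structural difference between the two strategies, the sketch below partitions instance indices both ways. The function names, the 20% test fraction and the fixed seed are illustrative assumptions, not part of the question.

```python
import random

def holdout_split(n, test_fraction=0.2, seed=0):
    """Single random partition: each instance is either train or test, once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = max(1, round(n * test_fraction))
    return idx[n_test:], idx[:n_test]

def k_fold_splits(n, k=5, seed=0):
    """k partitions: every instance is used for testing exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal-sized folds
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

train, test = holdout_split(10)
splits = list(k_fold_splits(10, k=5))
```

Holdout trains one model and evaluates it on one held-out portion. Cross-validation trains k models, so it costs about k times as much, but every instance contributes to the estimate.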
2. A confusion matrix is a summary of the performance of a (supervised) classifier over a set of development (“test”) data, by counting the various instances:
(i). Calculate the classification accuracy of the system. Find the error rate for the system.
(ii). Calculate the precision, recall and F-score (where β = 1) for class d.
(iii). Why can’t we do this for the whole system? How can we consider the whole system?
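The calculations in question 2 can be sketched as follows. The counts are hypothetical placeholders, not the matrix this question refers to, and the layout (rows = actual class, columns = predicted class) is also an assumption. Macro-averaging is shown as one common way of summarising the whole system; micro-averaging is another.

```python
# Hypothetical counts for illustration only (NOT the worksheet's matrix).
# Assumed layout: rows = actual class, columns = predicted class.
labels = ["a", "b", "c", "d"]
matrix = [
    [10, 2, 3, 1],
    [2, 5, 3, 1],
    [1, 3, 7, 1],
    [3, 0, 3, 5],
]

def accuracy(m):
    """Fraction of instances on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in m)
    correct = sum(m[i][i] for i in range(len(m)))
    return correct / total

def precision_recall_f(m, k, beta=1.0):
    """Precision, recall and F-score for the class at index k."""
    tp = m[k][k]
    fp = sum(m[i][k] for i in range(len(m))) - tp  # predicted k, actually other
    fn = sum(m[k]) - tp                            # actually k, predicted other
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    if p + r == 0:
        return p, r, 0.0
    f = (1 + beta ** 2) * p * r / (beta ** 2 * p + r)
    return p, r, f

def macro_average(m, beta=1.0):
    """Unweighted mean of the per-class precision, recall and F-score."""
    per_class = [precision_recall_f(m, k, beta) for k in range(len(m))]
    return tuple(sum(s[j] for s in per_class) / len(m) for j in range(3))

acc = accuracy(matrix)
error_rate = 1 - acc
p_d, r_d, f_d = precision_recall_f(matrix, labels.index("d"))
macro_p, macro_r, macro_f = macro_average(matrix)
```

Precision and recall are defined relative to a single class of interest, which is why a multi-class system needs some form of averaging over classes.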
3. For the following dataset:
ID  Outl  Temp  Humi  Wind  PLAY
TRAINING INSTANCES
A   s     h     h     F     N
B   s     h     h     T     N
C   o     h     h     F     Y
D   r     m     h     F     Y
E   r     c     n     F     Y
F   r     c     n     T     N
TEST INSTANCES
G   o     c     n     T     ?
H   s     m     h     F     ?
(i). Classify the test instances using the method of 0-R.
(ii). Classify the test instances using the method of 1-R (for H, assume Outl = s).
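A minimal sketch of both baselines on the training instances above. Two tie-breaking choices here are assumptions rather than something the question specifies: 0-R reports every class that ties for the majority, and 1-R keeps the first attribute (in column order) among those with the fewest training errors.

```python
from collections import Counter

ATTRS = ["Outl", "Temp", "Humi", "Wind"]
TRAIN = [  # (Outl, Temp, Humi, Wind, PLAY) for instances A..F
    ("s", "h", "h", "F", "N"),
    ("s", "h", "h", "T", "N"),
    ("o", "h", "h", "F", "Y"),
    ("r", "m", "h", "F", "Y"),
    ("r", "c", "n", "F", "Y"),
    ("r", "c", "n", "T", "N"),
]
TEST = {"G": ("o", "c", "n", "T"), "H": ("s", "m", "h", "F")}

def zero_r(train):
    """Return the majority class(es); more than one entry means a tie."""
    counts = Counter(row[-1] for row in train)
    top = max(counts.values())
    return sorted(c for c, n in counts.items() if n == top)

def one_r(train):
    """Return (attribute name, training errors, value -> class rule)."""
    best = None
    for a, name in enumerate(ATTRS):
        by_value = {}
        for row in train:
            by_value.setdefault(row[a], Counter())[row[-1]] += 1
        # Majority class per value; ties within a value go to the class seen first.
        rule = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
        errors = sum(sum(c.values()) - max(c.values()) for c in by_value.values())
        if best is None or errors < best[1]:  # strict '<' keeps the earlier attribute
            best = (name, errors, rule)
    return best

majority = zero_r(TRAIN)
name, errors, rule = one_r(TRAIN)
a = ATTRS.index(name)
predictions = {i: rule[values[a]] for i, values in TEST.items()}
```

On this data the two classes are equally frequent in training, so 0-R has to break a tie. Outl and Wind both make one training error, so the 1-R result also depends on the tie-break; with the column-order assumption the chosen attribute is Outl.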
4. Given the above dataset, we wish to perform feature selection, where the class is PLAY:
(i). Which of Humi and Wind has the greatest Pointwise Mutual Information for the class Y? What about N?
(ii). Which of the attributes has the greatest Mutual Information for the class, as a whole?
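The two measures can be sketched as below, using logarithms base 2 and probabilities estimated by counting over the six training instances. Part (i) does not say which attribute value counts as the event; the usage lines assume Humi = h and Wind = T, which is an interpretation rather than something stated in the question. A joint probability of zero is reported as negative infinity for PMI, and such cells contribute nothing to MI.

```python
import math
from collections import Counter

ATTRS = ["Outl", "Temp", "Humi", "Wind"]
TRAIN = [  # (Outl, Temp, Humi, Wind, PLAY) for instances A..F
    ("s", "h", "h", "F", "N"),
    ("s", "h", "h", "T", "N"),
    ("o", "h", "h", "F", "Y"),
    ("r", "m", "h", "F", "Y"),
    ("r", "c", "n", "F", "Y"),
    ("r", "c", "n", "T", "N"),
]

def pmi(train, attr, value, cls):
    """PMI = log2( P(value, cls) / (P(value) * P(cls)) ), from counts."""
    a = ATTRS.index(attr)
    n = len(train)
    n_a = sum(r[a] == value for r in train)
    n_c = sum(r[-1] == cls for r in train)
    n_ac = sum(r[a] == value and r[-1] == cls for r in train)
    if n_ac == 0:
        return float("-inf")  # the value and the class never co-occur
    return math.log2(n_ac * n / (n_a * n_c))

def mutual_information(train, attr):
    """MI = sum over observed (value, class) pairs of P(v, c) * PMI(v, c)."""
    a = ATTRS.index(attr)
    n = len(train)
    joint = Counter((r[a], r[-1]) for r in train)
    n_a = Counter(r[a] for r in train)
    n_c = Counter(r[-1] for r in train)
    return sum(
        (n_ac / n) * math.log2(n_ac * n / (n_a[v] * n_c[c]))
        for (v, c), n_ac in joint.items()
    )

# Assumed events for part (i): Humi = h and Wind = T.
pmi_humi_y = pmi(TRAIN, "Humi", "h", "Y")
pmi_wind_y = pmi(TRAIN, "Wind", "T", "Y")
pmi_humi_n = pmi(TRAIN, "Humi", "h", "N")
pmi_wind_n = pmi(TRAIN, "Wind", "T", "N")

mi = {attr: mutual_information(TRAIN, attr) for attr in ATTRS}
best_attr = max(mi, key=mi.get)
```

Under these assumptions Humi = h has the greater PMI with Y, Wind = T has the greater PMI with N, and Outl has the greatest Mutual Information with the class.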