Homework 3
CS522, Winter 2012


Due: Friday, February 17 (Part I) and Friday, February 24 (Part II)

[Readings]

[Decision Tree Induction]

For this assignment, you are going to implement the decision tree induction algorithm described in Chapter 8.2.1 of the textbook.You may use any programming language of your choice, as long as your program can be compiled and run on CS3.

Part I. (15pt) Construct a decision tree using the data in Table 8.1 of the textbook. Note that the final decision tree is shown in Figure 8.2, and the first split based on information gain is discussed in Example 8.1. For this exercise you need to complete the intermediate steps. In particular, at each node, show all possible splits, and for each possible split, calculate gain ratio based on entropy.

The solution for Part I is due on Friday, February 17.

Part II. Algorithm Implementation (85pt)

Your implementation of the decision tree induction algorithm must take a command line parameter as follows (I'm going to use Java for examples, but as stated earlier, you may use other programming languages):

java DecisionTree <inputFile>

inputFile is the name of the input file in the Attribute-Relation File Format (ARFF). You code only needs to handle the nominal attribute type. A couple of sample input files can be found here and here.

Your program should output the constructed decision tree to console. For example, the decision tree shown in Figure 8.2 of the textbook may be printed out like the following:

age = youth
|  student = yes: yes
|  student = no: no
age = middle_aged: yes
age = senior
|  credit_rating = fair: yes
|  credit_rating = excellent: no

Part II is due on Friday, February 24. Please submit your source code, documentation (optional), and a text file hw3.txtwhich contains detailed instructions on how to compile and run your program on CS3. Note that use of existing decision tree induction code found online or from other sources will be considered cheating.