[Readings]
[Decision Tree Induction]
For this assignment, you are going to implement the decision tree induction algorithm described in Chapter 8.2.1 of the textbook.You may use any programming language of your choice, as long as your program can be compiled and run on CS3.
Part I. (15pt) Construct a decision tree using the data in Table 8.1 of the textbook. Note that the final decision tree is shown in Figure 8.2, and the first split based on information gain is discussed in Example 8.1. For this exercise you need to complete the intermediate steps. In particular, at each node, show all possible splits, and for each possible split, calculate gain ratio based on entropy.
The solution for Part I is due on Friday, February 17.
Part II. Algorithm Implementation (85pt)
Your implementation of the decision tree induction algorithm must take a command line parameter as follows (I'm going to use Java for examples, but as stated earlier, you may use other programming languages):
java DecisionTree <inputFile>
inputFile is the name of the input file in the Attribute-Relation File Format (ARFF). You code only needs to handle the nominal attribute type. A couple of sample input files can be found here and here.
Your program should output the constructed decision tree to console. For example, the decision tree shown in Figure 8.2 of the textbook may be printed out like the following:
age = youth
| student = yes: yes
| student = no: no
age = middle_aged: yes
age = senior
| credit_rating = fair: yes
| credit_rating = excellent: no