Homework 3
CS522, Fall 2008


Due: Wednesday, November 19

Please upload your files to CSNS. The files should include all source code, documentation (optional), and a text file hw3.txt with detailed instructions on how to compile and run your program on the CS3 server. Note that file uploading will be disabled automatically after 11:59PM on the due date, so please turn in your work on time.

[Readings]

[Decision Tree Induction] (90pt)

For this assignment, you are going to implement a decision tree classifier as described in Chapter 6.3 of the textbook. You may use any programming language of your choice, as long as your program can be compiled and run on CS3.

Use the Forest CoverType dataset to test your classifier as follows (I'm going to use Java for examples, but as stated earlier, you may use other programming languages):

java DecisionTreeClassifier <TrainingSet> <TestSet> <ResultSet>

Your classifier should also print to the console the percentage of correctly classified records, e.g. 50%. Please do not print any debugging information in the code you submit.
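
As a rough illustration, the accuracy figure could be computed along the following lines. This is only a sketch: the class AccuracyReport and its methods are hypothetical names, class labels are assumed to be stored as integers, and the two-decimal output format is just one reasonable choice.

```java
import java.util.Locale;

// Hypothetical helper, not part of any required interface.
class AccuracyReport {
    // Returns the percentage (0 to 100) of positions where the
    // predicted label equals the actual label.
    static double percentCorrect(int[] actual, int[] predicted) {
        if (actual.length != predicted.length) {
            throw new IllegalArgumentException("label arrays differ in length");
        }
        if (actual.length == 0) {
            return 0.0; // no records: report 0 rather than divide by zero
        }
        int correct = 0;
        for (int i = 0; i < actual.length; i++) {
            if (actual[i] == predicted[i]) {
                correct++;
            }
        }
        return 100.0 * correct / actual.length;
    }

    // Formats the percentage with two decimals and a trailing percent sign.
    static String format(double percent) {
        return String.format(Locale.ROOT, "%.2f%%", percent);
    }

    public static void main(String[] args) {
        int[] actual = {1, 2, 3, 7};
        int[] predicted = {1, 2, 5, 7};
        // Three of the four labels match, so this prints 75.00%
        System.out.println(format(percentCorrect(actual, predicted)));
    }
}
```

In a real submission, the two arrays would come from the labels in the test set and from the classifier's predictions.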

I will grade your implementation by randomly selecting 2100 records (300 records from each class) from the dataset as the test set and using the remaining records as the training set. Your implementation must achieve an accuracy of at least 30% and take less than 2 hours to complete both training and classification on CS3. The five most accurate classifiers will receive up to 20% extra credit.
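
To try your classifier under similar conditions before submitting, you could build a comparable split yourself. The sketch below is not the actual grading script. It assumes one comma-separated record per line with the class label in the last field (an assumption about the file layout), and the class StratifiedSplit and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

// Hypothetical helper for building a per-class random test set.
class StratifiedSplit {
    final List<String> training = new ArrayList<>();
    final List<String> test = new ArrayList<>();

    // Assumption: the class label is the last comma-separated field.
    static String labelOf(String record) {
        return record.substring(record.lastIndexOf(',') + 1).trim();
    }

    // Moves up to perClass randomly chosen records of each class into the
    // test set; every other record goes into the training set.
    static StratifiedSplit split(List<String> records, int perClass, long seed) {
        // TreeMap keeps the class order fixed, so a given seed is repeatable.
        Map<String, List<String>> byClass = new TreeMap<>();
        for (String r : records) {
            byClass.computeIfAbsent(labelOf(r), k -> new ArrayList<>()).add(r);
        }
        StratifiedSplit result = new StratifiedSplit();
        Random rng = new Random(seed);
        for (List<String> group : byClass.values()) {
            Collections.shuffle(group, rng);
            int n = Math.min(perClass, group.size());
            result.test.addAll(group.subList(0, n));
            result.training.addAll(group.subList(n, group.size()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Tiny made-up demo: 3 classes with 5 records each, 2 per class held out.
        List<String> records = new ArrayList<>();
        for (int c = 1; c <= 3; c++) {
            for (int i = 0; i < 5; i++) {
                records.add(i + "," + (c * 10) + "," + c);
            }
        }
        StratifiedSplit s = split(records, 2, 42L);
        System.out.println("test=" + s.test.size() + " training=" + s.training.size());
    }
}
```

With the full dataset, calling split(records, 300, seed) would give a 2100-record test set across the seven classes, matching the sizes described above. Running it with several different seeds gives a better sense of how stable your accuracy and running time are.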

Note that use of existing classification code found online or from other sources will be considered cheating.