Homework 4
CS522, Winter 2012


Due: Friday, March 9

Please upload your files to CSNS. The files should include all the source code, documentation (optional), and a text file hw4.txt, which contains detailed instructions on how to compile and run your program on the CS3 server. Note that file uploading will be disabled automatically after 11:59 PM on the due date, so please turn in your work on time.


1. Decision Tree Classification (30pt)

Write a program to evaluate the Decision Tree classifier you implemented in Homework 3. Your program should take the name of an input file as a command line parameter, e.g.

java DecisionTreeClassify <inputFile>

The input file will be in the ARFF format and will contain only categorical attributes, as in car.data.arff. The output of your program should be the overall accuracy of the classifier based on 10-fold cross validation.
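
As a rough illustration of the 10-fold cross validation described above, the following sketch shuffles the row indices, deals them into k folds, and reports overall accuracy as the number of correct predictions across all folds divided by the total number of rows. The Classifier interface, the class and method names, the seed parameter, and the convention that the class label is the last column of each row are all assumptions made for this example; they are not part of the assignment.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;
import java.util.function.Supplier;

class CrossValidationSketch {

    /** Hypothetical classifier interface (an assumption, not given in the assignment). */
    interface Classifier {
        void train(List<String[]> rows);
        String predict(String[] row);
    }

    /** Shuffles the indices 0..n-1 with a fixed seed and deals them into k folds. */
    static List<List<Integer>> makeFolds(int n, int k, long seed) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < n; i++) indices.add(i);
        Collections.shuffle(indices, new Random(seed));

        List<List<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < k; f++) folds.add(new ArrayList<>());
        // Round-robin dealing keeps fold sizes within one row of each other.
        for (int i = 0; i < n; i++) folds.get(i % k).add(indices.get(i));
        return folds;
    }

    /**
     * Trains a fresh classifier on k-1 folds, tests on the held-out fold,
     * and returns (total correct predictions) / (total rows).
     * Assumes the class label is the last element of each row.
     */
    static double crossValidate(List<String[]> data, int k, long seed,
                                Supplier<Classifier> factory) {
        int correct = 0;
        for (List<Integer> testFold : makeFolds(data.size(), k, seed)) {
            Set<Integer> heldOut = new HashSet<>(testFold);
            List<String[]> training = new ArrayList<>();
            for (int i = 0; i < data.size(); i++) {
                if (!heldOut.contains(i)) training.add(data.get(i));
            }
            Classifier classifier = factory.get();
            classifier.train(training);
            for (int i : testFold) {
                String[] row = data.get(i);
                if (classifier.predict(row).equals(row[row.length - 1])) correct++;
            }
        }
        return (double) correct / data.size();
    }
}
```

A fixed seed makes the fold assignment reproducible, which can be convenient when the same folds need to be reused for more than one classifier.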

2. Naive Bayesian Classification (60pt)

Implement a Naive Bayesian classifier and evaluate its accuracy using 10-fold cross validation:

java NaiveBayesianClassify <inputFile>

The input file will be in the ARFF format and will contain only categorical attributes, as in car.data.arff. The output of the program should be the overall accuracy of the classifier based on 10-fold cross validation.
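
For orientation, here is a minimal sketch of a Naive Bayesian classifier over categorical attributes. It counts class frequencies and per-class attribute-value frequencies during training, then predicts the class with the highest log-probability score. Several choices here are assumptions rather than requirements of the assignment: the class and method names, the label being the last column of each row, the use of add-one (Laplace) smoothing to avoid zero probabilities, and taking each attribute's domain from the training rows rather than from the ARFF header.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class NaiveBayesSketch {
    private final Map<String, Integer> classCounts = new HashMap<>();
    // Key format "attributeIndex|value|classLabel"; assumes values contain no '|'.
    private final Map<String, Integer> valueCounts = new HashMap<>();
    // Distinct values observed for each attribute in the training rows.
    private final List<Set<String>> domains = new ArrayList<>();
    private int total = 0;

    /** Counts class and (attribute value, class) frequencies. Label is the last column. */
    void train(List<String[]> rows) {
        for (String[] row : rows) {
            String label = row[row.length - 1];
            classCounts.merge(label, 1, Integer::sum);
            for (int a = 0; a < row.length - 1; a++) {
                while (domains.size() <= a) domains.add(new HashSet<>());
                domains.get(a).add(row[a]);
                valueCounts.merge(a + "|" + row[a] + "|" + label, 1, Integer::sum);
            }
            total++;
        }
    }

    /** Returns the class maximizing log P(class) + sum over attributes of log P(value | class). */
    String predict(String[] row) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Integer> entry : classCounts.entrySet()) {
            String label = entry.getKey();
            int classCount = entry.getValue();
            // Logs are summed instead of multiplying probabilities to avoid underflow.
            double score = Math.log((double) classCount / total);
            for (int a = 0; a < domains.size(); a++) {
                int count = valueCounts.getOrDefault(a + "|" + row[a] + "|" + label, 0);
                // Add-one smoothing (an assumption): (count + 1) / (classCount + domain size).
                score += Math.log((count + 1.0) / (classCount + domains.get(a).size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }
}
```

Without smoothing, a single attribute value that never co-occurs with a class in the training folds would drive that class's probability to zero; whether and how to smooth is a design decision left to the implementer.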

3. Compare Classifiers (20pt)

Write a program to compare the two classifiers you implemented using 10-fold cross validation and a t-test:

java CompareClassifiers <inputFile>

The input file contains the data. The output of the program should be the t statistic for the accuracy difference between the two classifiers, and whether that difference is statistically significant according to a t-test at the significance level α = 0.05.
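
The assignment does not say which form of the t-test to use. The sketch below assumes a paired, two-sided t-test over per-fold accuracies obtained from the same 10 folds for both classifiers: with d_i the accuracy difference on fold i, t = mean(d) / sqrt(var(d) / k), where var(d) is the sample variance with a k - 1 denominator. Some textbooks divide by k instead, so check which definition the course uses. The class and method names are hypothetical, and the constant 2.262 is the two-sided critical value for α = 0.05 with 9 degrees of freedom, so the significance check applies only when k = 10.

```java
class PairedTTestSketch {
    // Two-sided critical value of Student's t for alpha = 0.05 and 9 degrees of freedom.
    static final double T_CRITICAL_DF9 = 2.262;

    /**
     * Paired t statistic for per-fold accuracies of two classifiers
     * evaluated on the same folds. Uses the sample variance (k - 1 denominator),
     * which is an assumption. Yields NaN or infinity if all differences are equal.
     */
    static double tStatistic(double[] accA, double[] accB) {
        if (accA.length != accB.length || accA.length < 2) {
            throw new IllegalArgumentException("need two equal-length arrays with at least 2 folds");
        }
        int k = accA.length;
        double mean = 0.0;
        for (int i = 0; i < k; i++) mean += accA[i] - accB[i];
        mean /= k;

        double sumSquares = 0.0;
        for (int i = 0; i < k; i++) {
            double deviation = (accA[i] - accB[i]) - mean;
            sumSquares += deviation * deviation;
        }
        double variance = sumSquares / (k - 1);
        return mean / Math.sqrt(variance / k);
    }

    /** Significance check valid only for 10 folds (9 degrees of freedom). */
    static boolean isSignificant(double t) {
        return Math.abs(t) > T_CRITICAL_DF9;
    }
}
```

Pairing the accuracies fold by fold only makes sense if both classifiers are trained and tested on identical folds, so the fold assignment should be generated once and shared.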