CS522 MIDTERM

CS522, Winter 2012

1. Consider the following transactions:

{M, O, N, K, E, Y}
{D, O, N, K, E, Y}
{M, A, K, E}
{M, U, C, K, Y}
{C, O, M, K, I, E}

Let minimum support count min_sup = 3.

(a) (15pt) Draw the FP-tree for this dataset.

(b) (15pt) Mine the FP-tree to find all the frequent itemsets and their support counts.
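Both parts start from the same preprocessing pass: count each item's support, drop items below min_sup, and reorder every transaction by descending support. The sketch below covers only that first pass, not the tree itself. The variable names and the alphabetical tie-break between equally frequent items are assumptions made for illustration; any consistent tie-break gives a valid FP-tree.

```python
from collections import Counter

# Transactions copied from the question above.
transactions = [
    {"M", "O", "N", "K", "E", "Y"},
    {"D", "O", "N", "K", "E", "Y"},
    {"M", "A", "K", "E"},
    {"M", "U", "C", "K", "Y"},
    {"C", "O", "M", "K", "I", "E"},
]
MIN_SUP = 3

# Pass 1: support count of every single item.
support = Counter(item for t in transactions for item in t)

# Keep only items that meet the minimum support count.
frequent = {item: n for item, n in support.items() if n >= MIN_SUP}

# Global item order: descending support, ties broken alphabetically
# (the tie-break rule is an assumption, not part of the question).
order = sorted(frequent, key=lambda item: (-frequent[item], item))

# Each transaction restricted to frequent items, in the global order.
# These ordered lists are what gets inserted into the FP-tree.
ordered = [[item for item in order if item in t] for t in transactions]

for row in ordered:
    print(row)
```

Inserting the ordered lists one at a time into a prefix tree, incrementing the count on shared prefixes, yields the FP-tree asked for in part (a).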

2. Consider the following dataset:

Department	Age Group	Salary Level	Status	Count
Sales	Young	Medium	Senior	30
Sales	Young	Low	Junior	80
Systems	Young	Medium	Junior	23
Systems	Young	High	Senior	5
Systems	Middle-Aged	High	Senior	3
Marketing	Young	Medium	Senior	10
Marketing	Young	Medium	Junior	4
Secretary	Middle-Aged	Medium	Senior	4
Secretary	Young	Low	Junior	6

The attributes are Department, Age Group, and Salary Level, and the class label is Status. Count is the number of records in the dataset that have the given attribute values and class label.

(a) (20pt) Construct a decision tree from the dataset using information gain with entropy, and use the decision tree to classify the record (Systems, Middle-Aged, Medium, ?).
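Because each row carries a Count, entropy and information gain have to be computed over weighted records rather than over the nine rows. The following is a minimal sketch of that calculation for the root split only. The function names and the tuple layout are hypothetical, and it does not build the full tree.

```python
from collections import defaultdict
from math import log2

# (department, age_group, salary_level, status, count), copied from the table.
ROWS = [
    ("Sales", "Young", "Medium", "Senior", 30),
    ("Sales", "Young", "Low", "Junior", 80),
    ("Systems", "Young", "Medium", "Junior", 23),
    ("Systems", "Young", "High", "Senior", 5),
    ("Systems", "Middle-Aged", "High", "Senior", 3),
    ("Marketing", "Young", "Medium", "Senior", 10),
    ("Marketing", "Young", "Medium", "Junior", 4),
    ("Secretary", "Middle-Aged", "Medium", "Senior", 4),
    ("Secretary", "Young", "Low", "Junior", 6),
]
ATTRS = {"department": 0, "age_group": 1, "salary_level": 2}


def class_counts(rows):
    """Weighted number of records per class label."""
    counts = defaultdict(int)
    for *_, status, n in rows:
        counts[status] += n
    return counts


def entropy(counts):
    """Shannon entropy (in bits) of a class-count dictionary."""
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values() if c > 0)


def information_gain(rows, col):
    """Entropy of the whole set minus the weighted entropy after splitting on col."""
    total = sum(r[-1] for r in rows)
    partitions = defaultdict(list)
    for r in rows:
        partitions[r[col]].append(r)
    remainder = sum(
        sum(r[-1] for r in part) / total * entropy(class_counts(part))
        for part in partitions.values()
    )
    return entropy(class_counts(rows)) - remainder


gains = {name: information_gain(ROWS, col) for name, col in ATTRS.items()}
for name, gain in gains.items():
    print(f"{name}: {gain:.4f}")
```

The attribute with the largest gain becomes the root. Applying the same function to each resulting partition, with the used attribute removed, gives the lower levels of the tree.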

(b) (20pt) Use Naive Bayesian Classification to classify the record (Systems, Middle-Aged, Medium, ?).
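For the Naive Bayesian part, each class is scored as its prior multiplied by the per-attribute conditional probabilities, all estimated from the weighted counts. The sketch below assumes no smoothing, since the question does not mention any; with a Laplace correction the zero factors would become small positive values. The helper names are hypothetical.

```python
from fractions import Fraction

# (department, age_group, salary_level, status, count), copied from the table.
ROWS = [
    ("Sales", "Young", "Medium", "Senior", 30),
    ("Sales", "Young", "Low", "Junior", 80),
    ("Systems", "Young", "Medium", "Junior", 23),
    ("Systems", "Young", "High", "Senior", 5),
    ("Systems", "Middle-Aged", "High", "Senior", 3),
    ("Marketing", "Young", "Medium", "Senior", 10),
    ("Marketing", "Young", "Medium", "Junior", 4),
    ("Secretary", "Middle-Aged", "Medium", "Senior", 4),
    ("Secretary", "Young", "Low", "Junior", 6),
]
RECORD = ("Systems", "Middle-Aged", "Medium")  # the record to classify


def weight(rows, status, col=None, value=None):
    """Weighted count of rows with the given class (and attribute value, if any)."""
    return sum(
        r[-1]
        for r in rows
        if r[3] == status and (col is None or r[col] == value)
    )


def naive_bayes_scores(rows, record):
    """Unnormalised P(class) * prod_i P(attribute_i | class), without smoothing."""
    total = sum(r[-1] for r in rows)
    scores = {}
    for status in sorted({r[3] for r in rows}):
        n_class = weight(rows, status)
        score = Fraction(n_class, total)  # class prior
        for col, value in enumerate(record):
            score *= Fraction(weight(rows, status, col, value), n_class)
        scores[status] = score
    return scores


scores = naive_bayes_scores(ROWS, RECORD)
prediction = max(scores, key=scores.get)
for status, score in scores.items():
    print(status, score, float(score))
print("predicted:", prediction)
```

Exact fractions are used so that each factor can be checked by hand against the table.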

3. Consider the following dataset:

Mileage	Engine	Air Conditioner	Number of Records with Car Value = High	Number of Records with Car Value = Low
High	Good	Working	3	4
High	Good	Broken	1	2
High	Bad	Working	1	5
High	Bad	Broken	0	4
Low	Good	Working	9	0
Low	Good	Broken	5	1
Low	Bad	Working	1	2
Low	Bad	Broken	0	2

(a) (20pt) Use the given dataset to complete the following BBN. For this exercise you do not need to apply the "+1 count" (Laplace) correction: if a probability is 0, just leave it as 0.

[BBN]

(b) (10pt) Use the BBN to compute P(CarValue=High | Mileage=Low, AirConditioner=Broken).
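Every entry in the BBN's probability tables is a relative frequency read off the counts, with no "+1" correction. The sketch below is a generic helper for such frequencies. It does not assume any network structure, since the diagram is not reproduced here, and the names `count` and `cond_prob` are hypothetical. Which parent sets to condition on must come from the BBN figure itself.

```python
from fractions import Fraction

# (mileage, engine, air_conditioner, n_value_high, n_value_low), from the table.
TABLE = [
    ("High", "Good", "Working", 3, 4),
    ("High", "Good", "Broken", 1, 2),
    ("High", "Bad", "Working", 1, 5),
    ("High", "Bad", "Broken", 0, 4),
    ("Low", "Good", "Working", 9, 0),
    ("Low", "Good", "Broken", 5, 1),
    ("Low", "Bad", "Working", 1, 2),
    ("Low", "Bad", "Broken", 0, 2),
]

# Expand into (assignment, count) pairs, one per car-value outcome.
RECORDS = []
for mileage, engine, ac, n_high, n_low in TABLE:
    base = {"mileage": mileage, "engine": engine, "air_conditioner": ac}
    RECORDS.append(({**base, "car_value": "High"}, n_high))
    RECORDS.append(({**base, "car_value": "Low"}, n_low))


def count(**conditions):
    """Number of records matching every given variable=value condition."""
    return sum(
        n
        for rec, n in RECORDS
        if all(rec[var] == val for var, val in conditions.items())
    )


def cond_prob(target, given=None):
    """Relative-frequency estimate of P(target | given), without smoothing."""
    given = given or {}
    denom = count(**given)
    if denom == 0:
        raise ValueError("no records match the conditioning event")
    return Fraction(count(**target, **given), denom)


# Example table entries for a root node and for a node with one parent.
print(cond_prob({"mileage": "High"}))
print(cond_prob({"engine": "Good"}, {"car_value": "High"}))
```

Calling `cond_prob({"car_value": "High"}, {"mileage": "Low", "air_conditioner": "Broken"})` gives the raw relative frequency for the query in part (b). The value obtained by inference in the BBN can differ from it, because the network's independence assumptions replace the full joint counts.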