MIDTERM
CS522, Winter 2012


1. Consider the following transactions:

{M, O, N, K, E, Y}
{D, O, N, K, E, Y}
{M, A, K, E}
{M, U, C, K, Y}
{C, O, M, K, I, E}

Let the minimum support count be min_sup = 3.

(a) (15pt) Draw the FP-tree for this dataset.

(b) (15pt) Mine the FP-tree to find all the frequent itemsets and their support counts.
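
As a starting point for part (a), the first pass of FP-growth counts the support of each single item and discards the infrequent ones. The sketch below shows only that first pass, under the assumption that each transaction is treated as a set; the names `transactions`, `MIN_SUP`, and `frequent_items` are illustrative and not part of the exam.

```python
from collections import Counter

# The five transactions from question 1, one set of items per transaction.
transactions = [
    set("MONKEY"),
    set("DONKEY"),
    set("MAKE"),
    set("MUCKY"),
    set("COMKIE"),
]
MIN_SUP = 3  # minimum support count given in the question


def frequent_items(transactions, min_sup):
    """Return {item: support count} for items meeting the minimum support."""
    counts = Counter(item for t in transactions for item in t)
    return {item: c for item, c in counts.items() if c >= min_sup}


freq = frequent_items(transactions, MIN_SUP)
# Items sorted by descending support; ties broken alphabetically here,
# which is one possible convention for ordering items in the FP-tree.
ordered = sorted(freq, key=lambda item: (-freq[item], item))
```

Each transaction would then be filtered to its frequent items and reordered according to `ordered` before being inserted into the tree.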

2. Consider the following dataset:

Department   Age Group     Salary Level   Status   Count
Sales        Young         Medium         Senior   30
Sales        Young         Low            Junior   80
Systems      Young         Medium         Junior   23
Systems      Young         High           Senior   5
Systems      Middle-Aged   High           Senior   3
Marketing    Young         Medium         Senior   10
Marketing    Young         Medium         Junior   4
Secretary    Middle-Aged   Medium         Senior   4
Secretary    Young         Low            Junior   6

The attributes are Department, Age Group, and Salary Level, and the class label is Status. Count is the number of records in the dataset that have the given attribute values and class label.

(a) (20pt) Construct a Decision Tree from the dataset using Information Gain with Entropy, and use the decision tree to classify the record (Systems,Middle-Aged,Medium,?).
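
The quantities needed for part (a) can be sketched as follows: the entropy of the class distribution and the information gain of a split, with every row weighted by its Count. This is a minimal sketch rather than a full tree builder; the names `ROWS`, `class_counts`, `entropy`, and `information_gain` are illustrative.

```python
from collections import defaultdict
from math import log2

# (Department, Age Group, Salary Level, Status, Count), copied from the table.
ROWS = [
    ("Sales", "Young", "Medium", "Senior", 30),
    ("Sales", "Young", "Low", "Junior", 80),
    ("Systems", "Young", "Medium", "Junior", 23),
    ("Systems", "Young", "High", "Senior", 5),
    ("Systems", "Middle-Aged", "High", "Senior", 3),
    ("Marketing", "Young", "Medium", "Senior", 10),
    ("Marketing", "Young", "Medium", "Junior", 4),
    ("Secretary", "Middle-Aged", "Medium", "Senior", 4),
    ("Secretary", "Young", "Low", "Junior", 6),
]
ATTRS = {"Department": 0, "Age Group": 1, "Salary Level": 2}


def class_counts(rows):
    """Sum the Count column per Status value."""
    counts = defaultdict(int)
    for *_, status, n in rows:
        counts[status] += n
    return dict(counts)


def entropy(counts):
    """Entropy in bits of a {class: count} distribution."""
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values() if c > 0)


def information_gain(rows, attr_index):
    """Entropy before the split minus the weighted entropy after it."""
    total = sum(r[-1] for r in rows)
    partitions = defaultdict(list)
    for r in rows:
        partitions[r[attr_index]].append(r)
    remainder = sum(
        sum(r[-1] for r in part) / total * entropy(class_counts(part))
        for part in partitions.values()
    )
    return entropy(class_counts(rows)) - remainder


gains = {name: information_gain(ROWS, i) for name, i in ATTRS.items()}
```

The attribute with the largest value in `gains` would be chosen as the root, and the same function can be applied again to the rows in each branch.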

(b) (20pt) Use Naive Bayesian Classification to classify the record (Systems,Middle-Aged,Medium,?).
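
The Naive Bayes computation for part (b) can be sketched in the same count-weighted way. The sketch assumes no smoothing is applied, which the question does not specify; `ROWS` and `naive_bayes_scores` are illustrative names.

```python
from collections import defaultdict

# (Department, Age Group, Salary Level, Status, Count), copied from the table.
ROWS = [
    ("Sales", "Young", "Medium", "Senior", 30),
    ("Sales", "Young", "Low", "Junior", 80),
    ("Systems", "Young", "Medium", "Junior", 23),
    ("Systems", "Young", "High", "Senior", 5),
    ("Systems", "Middle-Aged", "High", "Senior", 3),
    ("Marketing", "Young", "Medium", "Senior", 10),
    ("Marketing", "Young", "Medium", "Junior", 4),
    ("Secretary", "Middle-Aged", "Medium", "Senior", 4),
    ("Secretary", "Young", "Low", "Junior", 6),
]


def naive_bayes_scores(rows, query):
    """Return {status: P(status) * product of P(attribute value | status)}.

    Assumption: plain relative frequencies, with no smoothing.
    """
    class_total = defaultdict(int)
    for *_, status, n in rows:
        class_total[status] += n
    grand_total = sum(class_total.values())

    scores = {}
    for status, total in class_total.items():
        score = total / grand_total  # prior P(status)
        for i, value in enumerate(query):
            match = sum(r[-1] for r in rows if r[3] == status and r[i] == value)
            score *= match / total  # conditional P(value | status)
        scores[status] = score
    return scores


scores = naive_bayes_scores(ROWS, ("Systems", "Middle-Aged", "Medium"))
prediction = max(scores, key=scores.get)
```

Without smoothing, any class that has no records matching one of the query's attribute values receives a score of exactly zero, so it is worth checking each conditional count individually.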

3. Consider the following dataset:

Mileage   Engine   Air Conditioner   Number of Records with Car Value = High   Number of Records with Car Value = Low
High      Good     Working           3                                         4
High      Good     Broken            1                                         2
High      Bad      Working           1                                         5
High      Bad      Broken            0                                         4
Low       Good     Working           9                                         0
Low       Good     Broken            5                                         1
Low       Bad      Working           1                                         2
Low       Bad      Broken            0                                         2

(a) (20pt) Use the given dataset to complete the following BBN. For this exercise you do not need to apply the "+1 count" correction; if a probability is 0, just leave it as 0.

[BBN]

(b) (10pt) Use the BBN to compute P(CarValue=High | Mileage=Low, AirConditioner=Broken).