Homework 1
CS522, Winter 2012


Due: Friday, January 27

Please upload your solutions to CSNS. Your file(s) must be in text, MS Word, OpenDocument Text (ODT), or PDF format. Note that file uploading will be disabled automatically after 11:59PM of the due date, so please turn in your work on time.

[Readings]

[Exercises]

1. (25pt) Design a star schema for the company FIT-WORLD GYM INC.

(a) Draw the star schema diagram.

(b) Show the tables populated with the given data.

2. (15pt) Consider a data cube with n dimensions and the cardinality of each dimension is m.

(a) How many cuboids are there in this data cube?

(b) How many cells are there in a k-dimensional cuboid?

(c) How many aggregate cells are there in the whole data cube?

(d) Suppose n=10 and m=20, and each cell requires 4 bytes to store. How many 1TB hard disks are needed to store all the aggregate cells in the data cube?

3. (15pt)  Consider a 10-dimensional data cube with the measure COUNT. The base cuboid of this cube contains only three non-zero cells:

List all the closed cells in this data cube.

4. (15pt) Consider a 4-dimensional data cube with dimensions A, B, C, D, and the cardinality of each dimension is 10000, 1000, 100, 10, respectively. Suppose we want to use Multiway Array Aggregation to compute the 3-D cuboids from the base cuboid. The base cuboid is divided into chunks by partitioning each dimension into 10 equal portions. What is the order to read the chunks into memory that minimizes the memory requirement? In particular, how many 3-D cells need to be kept in memory when the chunks are read in this order?

5. (15pt) Consider the following dataset with three dimensions A={a1,a2}, B={b1,b2,b3}, C={c1,c2,c3,c4} and the measure SUM:

(a1,b1,c1,2)
(a1,b1,c3,3)
(a2,b2,c2,2)
(a2,b3,c2,4)
(a2,b1,c3,3)
(a2,b2,c3,2)

List all the cells in the iceberg cube with SUM > 5 in the order produced by the BUC algorithm optimized with Apriori pruning and dimension ordering.

6. (15pt) Consider the the following dataset with five dimensions A, B, C, D, E and the measure SUM:

T1: (a1,b1,c1,d1,e1,50)
T2: (a2,b1,c2,d1,e2,10)
T3: (a1,b2,c1,d2,e2,40)
T4: (a2,b1,c1,d2,e2,20)
T5: (a2,b2,c2,d1,e3,70)

(a) Construct shell fragments (A,B) and (C,D,E).

(b) Use the shell fragments you constructed in (a) to answer the query (*,b1,*,d1,*,?).