You will also be given all the B-tree page level classes: including HFPage, SortedPage, BTreeLeafPage and BTreeIndexPage. SortedPage is derived from HFPage, and it augments the insertRecord method of HFPage by storing records on the HFPage in sorted order by a specified key value. The key value must be included as the initial part of each inserted record, to enable easy comparison of the key value of a new record with the key values of existing records on a page. The documentation available in the header files is sufficient to understand what operation each function performs. Both of BTIndexPage and BTLeafPage are derived from SortedPage. These page classes are used to build the B+ tree index. You will be given the code to create, destroy, open and close a B+ tree index.
You will also write code that will open a scan on the B+ tree, allowing its caller to iterate through all of the data entries (from the leaf pages) that satisfy some search criterion.
[Files]You should note that key values are passed to functions using void * pointers (pointing to the key values). The contents of a key should be interpreted using the AttrType variable. The key can be either a string(attrString) or an integer(attrInteger), as per the definition of AttrType in minirel.h. We just implement these two kinds of keys in this assignment. If the key is a string, it has a fixed maximum length, MAX_KEY_SIZE1, defined in bt.h.
Although the specifications for some methods (e.g., the constructor of BTreeFile) suggest that keys can be of (the more general enumerated) type AttrType, you can return an error message if the keys are not of type attrString or attrInteger.
The SortedPage class, which augments
the insertRecord method of HFPage by storing records on a page in sorted
order according to a specified key value, assumes that the key value is
included as the initial part of each record, to enable easy comparison
of the key value of a new record with the key values of existing
records on a page.
[B+ Tree Page-Level Classes]
There are four separate pages classes, of which you will be given
the source code. HFPage is the base class
(given), and from it is derived SortedPage.
BTIndexPage and BTLeafPage
are derived from SortedPage.
For further details about the individual methods in these classes, look at the header pages for the class.
Lastly, you will be given a structure to represent the header page
of the B+ tree. Despite its name, the data structure used to represent
the header page need not be derived from a Page object. It can be
implemented simply as a C++ struct, with a
field for each piece of information that must be stored in the header
page. Just remember to cast pointers to this struct
as (Page *) pointers when making calls to functions such as pinPage().
[Other B+ Tree Classes]
We will assume here that everyone understands the concept of B+ trees, and the basic algorithms, and concentrate on explaining the design of the C++ classes that you will implement.
A BTreeFile will contain a header page and a number of BTIndexPages and BTLeafPages. The header page is used to hold information about the tree as a whole, such as the page id of the root page, the type of the search key, the length of the key field(s) (which has a fixed maximum size in this assignment), etc. When a B+ tree index is opened, you should read the header page first, and keep it pinned until the file is closed. Given the name of the B+ tree index file, how can you locate the header page? The DB class has a method
Status add_file_entry(const char* fname, PageId header_page_num);
that lets you register this information
when a file fname is created. There
are similar methods for deleting and reading these `file entries'
([file name, header page] pairs) as well, which can be used when the
file is destroyed or opened. The header page contains the page id of
the root of the tree, and every other page in the tree is accessed
through the root page.
Figure 1 shows what a BTreeFile with
only one BTLeafPage looks like; the single
leaf page is also the root. Note that there is no BTIndexPage
in this case.
Figure 2 shows a tree with a few BTLeafPages,
and this can easily be extended to contain multiple levels of BTIndexPages as well.
IndexFile and IndexFileScan
A BTree is one particular type of index. There are other types, for example a Hash index. However, all index types have some basic functionality in common. We've taken this basic index functionality and created a virtual base class called IndexFile. You won't write any code for IndexFile. However, any class derived from an IndexFile should support IndexFile(), Delete(), and insert(). (IndexFile and IndexFileScan are defined in /include/index.h).
Likewise, an IndexFileScan is a virtual
base class that contains the basic functionality all index file scans
should support.
BTreeFile
The main class to be implemented for this assignment is BTreeFile. BTreeFile is a derived class of the IndexFile class, which means a BTreeFile is a kind of IndexFile. However, since IndexFile is a virtual base class all of the methods associated with IndexFile must be implemented for BTreeFile. You should have copied btfile.h into your directory, as per the instructions in Section 2.
The method you are going to implement is:
If a page overflows (i.e., no space for the new entry), you should split the page. You may have to insert additional entries of the form [key, id of child page] into the higher level index pages as part of a split. Note that this could recursively go all the way up to the root, possibly resulting in a split of the root node of the B+ tree.
You will be given the code for:
BTreeFileScan
Finally, you will implement scans that return data entries from the leaf pages of the tree. You will create the scan through a member function of BTreeFile (BtreeFile::new_scan(...) as defined in btfile.h). The parameters passed to new_scan() specify the range of keys that will be scanned in the B+ tree. They are explained in detail in btfile.h.
Errors
In the Buffer Manager assignment, you learned how to use the Minibase error protocol. Reviewing it now would
be a good idea. In that assignment, all the errors you returned
belonged to one of the categories in new error.h,
namely BUFMGR. In this assignment, you will need to use BTREE,
BTLEAFPAGE, and BTINDEXPAGE.