CSC 148H - Assignment #3 - Trees

The goal of this assignment is to introduce you to several types of trees, comparing and contrasting them with the binary search trees covered in class.

Binary trees can be used to implement maps. Recall that a map (like the built-in Python map) is a collection of (key, value) pairs. Maps are often referred to as dictionaries since we can imagine a map that associates words (its keys) to definitions (its values). Using a binary search tree, we could search for a specified word and retrieve the associated definition. In the particular case where keys are indeed strings, though, searching through a binary search tree is very inefficient compared to the use of other data structures. This is because expensive string comparisons will occur each time we compare the string we are searching for with the current node's key.

A ternary search tree is similar in structure to a binary search tree, except each node has a maximum of three children (instead of a maximum of two children). They are specifically used when keys are strings, and store the string data differently than a binary search tree would. Search tries are yet another alternative tree structure. Please read this Article on Ternary Search Trees and Search Tries to learn all about them.

Part 1 - Ternary Search Trees in Python (30%)

Tree data structures often have rather subtle implementations of their methods. Download tern-template.py; an implementation of ternary search trees in Python. The ternarySearchTree class contains two different implementations of has_key; they are both meant to determine whether a ternary search tree contains a given key. The has_key2 method, however, does not work as expected. Please explain what has_key2 does instead. In other words, based on the properties of the searched-for key or the ternary search tree, when will it return True and when will it return False?

Next, complete the collect_sorted method of tern-template.py. It should create and return an alphabetically sorted list consisting of exactly those strings represented in the ternary search tree. You should collect the strings from the tree in sorted order; you should not try to sort them afterwards as a separate step. You may add helper functions as required. Do not modify any existing methods inside tern-template.py.

For this part of the assignment, you should submit a text file called part1.txt containing your discussion of the operation of has_key2, and a copy of your modified tern-template.py file.

Part 2 - Burst Tries in Python (70%)

A burst trie is a special type of search trie meant to minimize the space consumption of a normal search trie. A burst trie is made up of a normal trie (called an access trie), and a collection of containers. To search through the burst trie, we proceed character-by-character as for a normal trie until we get to a leaf of the access trie, which points to a container (instead of a single key). We then finish the search by searching for the remainder of the characters in the container. The data structure used for the containers should be efficient when used for small sets of strings. To insert into the burst trie, we proceed down the access trie until we reach a container, and add the new key in that container. When a container reaches a certain size, however, we assume that the data structure used for containers will no longer be efficient on its set of strings. In that case, we burst the container, replacing it with a new trie node and a set of children which partition the keys that were contained in the (now burst) container.

For this part of the assignment, you are to implement a burst trie class in Python. Your class should support two methods:

Download burst-template.py, which contains the method signatures for the constructor, put and get (you are not to change these signatures). You may add other supporting functions as required.

This Article on Burst Tries (Alternate Link) describes the data structure in more detail. For this assignment:

For this part of the assignment, submit only your modified burst-template.py file.

Further Guidelines

Marking

Part 1: (30%)
    Quality/completeness of has_key2 discussion (20%)
    Correctness of Modifications to tern-template.py (80%)
Part 2: (70%)
    Correctness (100%)