Unit 3: Trees
1. Introduction to Tree as a Data Structure
A Tree is a non-linear, hierarchical data structure that consists of nodes connected by edges. It is one of the most important and widely used data structures.
Unlike linked lists, where each node points to one successor, a tree node can point to multiple successors (children).
Key Terminology
- Node: The basic unit containing data and pointers to other nodes.
- Root: The topmost node in the tree. It is the only node with no parent.
- Edge: The link connecting a parent node to a child node.
- Parent: A node that has a child (or children).
- Child: A node that is pointed to by a parent.
- Leaf (or External Node): A node with no children.
- Internal Node: A node that has at least one child.
- Siblings: Nodes that share the same parent.
- Subtree: A tree consisting of a node and all of its descendants.
- Height of a Node: The length of the longest path from the node to a leaf. The height of a leaf is 0.
- Height of a Tree: The height of the root node.
- Depth of a Node: The length of the path from the root to the node. The depth of the root is 0.
- Degree of a Node: The number of children a node has.
- Degree of a Tree: The maximum degree of any node in the tree.
A Binary Tree is a specific type of tree in which every node can have at most two children: a left child and a right child.
Types of Binary Trees
- Full Binary Tree: A binary tree where every node has either 0 or 2 children.
- Complete Binary Tree: A binary tree where all levels are completely filled *except* possibly the last level, and all nodes in the last level are as far left as possible. (This is important for Heaps).
- Perfect Binary Tree: A binary tree where all internal nodes have 2 children and all leaves are at the same level.
- Skewed Binary Tree: A tree where each node has only one child (either all left or all right). It behaves just like a linked list.
2. Binary Search Trees (BST)
A Binary Search Tree is a binary tree with a special property that makes searching very efficient.
Properties
For any node N in a BST:
- All values in
N's left subtree are less than N's value.
- All values in
N's right subtree are greater than N's value.
- Both the left and right subtrees must also be binary search trees.
This property allows for fast searching, insertion, and deletion, with an average-case time complexity of O(log n). The worst case (for a skewed tree) is O(n).
Insertion in BST
To insert a new value:
- Start at the root.
- Compare the new value with the root's value.
- If the new value is less, move to the left child.
- If the new value is greater, move to the right child.
- Repeat this process until you reach a
NULL pointer (an empty spot).
- Insert the new node at that empty spot.
Deletion in BST
This is the most complex operation. There are three cases:
- Case 1: Node to be deleted is a Leaf (0 children)
- Simply remove the node (set its parent's pointer to
NULL).
- Case 2: Node to be deleted has 1 child
- "Bypass" the node: Link the node's parent directly to the node's child.
- Case 3: Node to be deleted has 2 children
- Find the node's "successor." This can be either:
- The Inorder Successor: The smallest node in its right subtree (go right once, then all the way left).
- The Inorder Predecessor: The largest node in its left subtree (go left once, then all the way right).
- Copy the value of the successor (or predecessor) into the node to be deleted.
- Delete the successor (or predecessor) node (which is an easier Case 1 or Case 2 deletion).
Recursive and Iterative Traversals
Traversal means visiting every node in the tree exactly once. The three main types are (L=Left, R=Right, P=Process/Visit Node):
1. Inorder Traversal (L-P-R)
- Recursive:
- Iterative (using a Stack):
- Create an empty stack.
- Start with the
root node.
- Loop:
- While the current node is not
NULL, push it onto the stack and move to its left child.
- If the current node is
NULL (and stack is not empty): Pop a node, visit it, and move to its right child.
- Repeat until both the stack is empty and the current node is
NULL.
- Utility: For a BST, an inorder traversal visits the nodes in sorted, ascending order.
2. Preorder Traversal (P-L-R)
- Recursive:
- Iterative (using a Stack):
- Push the
root onto the stack.
- Loop while stack is not empty:
- Pop a node, visit it.
- Push its right child (if it exists) onto the stack.
- Push its left child (if it exists) onto the stack.
- (Note: Right is pushed *before* left to ensure left is processed first, due to LIFO).
- Utility: Used to create a copy of the tree.
3. Postorder Traversal (L-R-P)
- Recursive:
- Iterative (using two Stacks): This is the most complex iterative traversal. A simpler (but modified) approach uses one stack.
- Utility: Used to delete a tree. You must delete children before you can delete the parent.
Exam Tip: Given a preorder and inorder traversal, you can uniquely reconstruct a binary tree. This is a very common exam question.
3. Threaded Binary Trees
In a standard binary tree, there are many NULL pointers (especially at leaf nodes). A threaded binary tree re-uses these NULL pointers to "thread" the tree, making traversal more efficient.
A NULL right child pointer is replaced with a pointer (a "thread") to the node's Inorder Successor.
A NULL left child pointer is replaced with a pointer to the node's Inorder Predecessor.
Two flag bits are needed in each node to indicate if the left/right pointers are normal child links or threads.
Insertion, Deletion, and Traversals
- Advantage: Inorder traversal can be performed without a stack or recursion. You just follow the
right pointers (which are either children or threads).
- Disadvantage: Insertion and deletion are much more complex because you must maintain the threads, which may involve updating nodes far away from the inserted/deleted node.
4. Height-Balanced Trees (AVL Trees)
A standard BST has a worst-case time complexity of O(n) if it becomes "skewed" (like a linked list). An AVL Tree (named after its inventors, Adelson-Velsky and Landis) is a self-balancing binary search tree.
Balance Factor
An AVL tree ensures that the heights of the two child subtrees of any node differ by at most one.
Balance Factor (BF) = Height(Left Subtree) - Height(Right Subtree)
For a tree to be an AVL tree, every single node must have a Balance Factor of -1, 0, or 1.
If an insertion or deletion causes any node's BF to become -2 or +2, the tree is "unbalanced" and must be "rebalanced."
Rotations (Operations)
Rebalancing is done using rotations. A rotation is a local operation that changes the structure of the tree to restore balance, while *preserving* the BST property.
There are four imbalance cases, which require two types of rotations:
1. Left-Left (LL) Case (BF = +2)
- Problem: The new node was inserted in the left subtree of the left child of the unbalanced node (Z).
- Solution: Perform a Right Rotation on Z.
2. Right-Right (RR) Case (BF = -2)
- Problem: The new node was inserted in the right subtree of the right child of Z.
- Solution: Perform a Left Rotation on Z.
3. Left-Right (LR) Case (BF = +2)
- Problem: The new node was inserted in the right subtree of the left child of Z.
- Solution: Perform a Left Rotation on the *child* (Y), followed by a Right Rotation on the *parent* (Z). This is a double rotation.
4. Right-Left (RL) Case (BF = -2)
- Problem: The new node was inserted in the left subtree of the right child of Z.
- Solution: Perform a Right Rotation on the *child* (Y), followed by a Left Rotation on the *parent* (Z).
By performing these rotations, the AVL tree guarantees that its height remains O(log n), so all operations (search, insert, delete) are O(log n) in the worst case.
5. Heap Tree
A Heap is a specialized tree-based data structure. It is a complete binary tree that satisfies the heap property.
A heap is NOT a Binary Search Tree. There is no left/right ordering property.
Heap Property:
- Max-Heap: The value of any parent node is greater than or equal to the values of its children. The root node is the maximum element.
- Min-Heap: The value of any parent node is less than or equal to the values of its children. The root node is the minimum element.
Representation: Because heaps are complete binary trees, they are almost always stored in an array, not with pointers.
For a node at index i:
- Its left child is at index
2*i + 1.
- Its right child is at index
2*i + 2.
- Its parent is at index
floor((i - 1) / 2).
Operations
Operations involve adding/removing elements and then "heapifying" (fixing the heap property).
1. Insertion (O(log n))
- Add the new element to the *end* of the array (the next open spot in the complete tree).
- "Bubble Up" (or Heapify-Up): Compare the new element with its parent.
- If it violates the heap property (e.g., in a Max-Heap, it's larger than its parent), swap them.
- Repeat this process (comparing and swapping with the parent) until the heap property is restored or the node reaches the root.
2. Deletion (Extract-Max / Extract-Min) (O(log n))
- The element to be removed is always the root (index 0).
- Store the root's value (to return it).
- Move the last element in the array to the root (index 0).
- "Bubble Down" (or Heapify-Down):
- Compare the new root with its children.
- If it violates the heap property, swap it with its larger child (for Max-Heap) or smaller child (for Min-Heap).
- Repeat this process (comparing and swapping down the tree) until the heap property is restored or the node becomes a leaf.
Applications: Heaps are the default implementation for Priority Queues (Unit 2). They are also used in the HeapSort algorithm (Unit 4).