Binary Trees
Arrays give fast random access but slow insertion. Linked lists give fast insertion but slow access. The goal all along has been $O(\log n)$ for everything. Binary trees are the structure that makes this possible — if we keep them balanced.
The Gap We Are Trying to Close
Looking back at what we have built: arrays give $O(1)$ random access but $O(n)$ insertion. Linked lists give $O(1)$ front insertion but $O(n)$ access. Hash tables give $O(1)$ expected find but $O(n)$ for any order operation. Sorted arrays give $O(\log n)$ find but $O(n)$ insertion.
The unmet goal for both the Sequence and Set interfaces is $O(\log n)$ for every operation. Binary trees are the data structure built for exactly this. The key insight: if we can keep the tree's height $h = O(\log n)$, and every operation costs $O(h)$, then every operation costs $O(\log n)$.
The Node — Four Pointers
node.item — the data stored at this node
node.parent — pointer to the parent node (None for the root)
node.left — pointer to the left child (None if absent)
node.right — pointer to the right child (None if absent)
A binary tree is a set of nodes connected by these pointers, with one designated root (the node with no parent). A node with no children is a leaf.
```python
class Binary_Node:
    def __init__(self, x):                        # O(1)
        self.item = x
        self.left = None                          # left child
        self.right = None                         # right child
        self.parent = None                        # parent node

    # ── TRAVERSAL ─────────────────────────────────────────
    def subtree_iter(self):                       # O(n) — in-order traversal
        """Yield all nodes in this subtree in traversal order."""
        if self.left:
            yield from self.left.subtree_iter()
        yield self
        if self.right:
            yield from self.right.subtree_iter()

    # ── NAVIGATION ────────────────────────────────────────
    def subtree_first(self):                      # O(h) — walk left until leaf
        """Return the first node in this subtree's traversal order."""
        if self.left:
            return self.left.subtree_first()
        return self

    def subtree_last(self):                       # O(h) — walk right until leaf
        if self.right:
            return self.right.subtree_last()
        return self

    def successor(self):                          # O(h) — next in traversal order
        """Return the node immediately after self in traversal order."""
        if self.right:
            return self.right.subtree_first()     # case 1: has right child
        # case 2: walk up until we come from a left subtree
        node = self
        while node.parent and (node is node.parent.right):
            node = node.parent
        return node.parent                        # None if self is last

    def predecessor(self):                        # O(h) — previous in traversal order
        if self.left:
            return self.left.subtree_last()
        node = self
        while node.parent and (node is node.parent.left):
            node = node.parent
        return node.parent
```
```cpp
#include <functional>
using namespace std;

template<typename T>
struct Binary_Node {
    T item;
    Binary_Node* left = nullptr;
    Binary_Node* right = nullptr;
    Binary_Node* parent = nullptr;
    Binary_Node(T x) : item(x) {}

    // In-order traversal — O(n)
    void subtree_iter(function<void(Binary_Node*)> visit) {
        if (left) left->subtree_iter(visit);
        visit(this);
        if (right) right->subtree_iter(visit);
    }

    // Walk left until leaf — O(h)
    Binary_Node* subtree_first() {
        return left ? left->subtree_first() : this;
    }

    Binary_Node* subtree_last() {
        return right ? right->subtree_last() : this;
    }

    // Next in traversal order — O(h)
    Binary_Node* successor() {
        if (right) return right->subtree_first();
        auto* node = this;
        while (node->parent && node == node->parent->right)
            node = node->parent;
        return node->parent;  // nullptr if this is last
    }

    Binary_Node* predecessor() {
        if (left) return left->subtree_last();
        auto* node = this;
        while (node->parent && node == node->parent->left)
            node = node->parent;
        return node->parent;
    }
};
```
Terminology — Depth, Height, and Why They Matter
Depth of $X$ — length of the path from $X$ to the root $R$. The root has depth 0.
Height of $X$ — maximum depth of any node in the subtree rooted at $X$. A leaf has height 0.
Height of the tree — height of the root = maximum depth of any node.
Every operation we will define on binary trees runs in $O(h)$ time, where $h$ is the height. A tree with $n$ nodes can have height anywhere from $\lfloor \log_2 n \rfloor$ (perfectly balanced) to $n-1$ (a chain — effectively a linked list). The entire enterprise of Chapters 6 and 7 is ensuring $h = O(\log n)$.
```
      A            ← depth 0 (root)
     / \
    B   C          ← depth 1; C is a leaf (height 0)
   / \
  D   E            ← depth 2
 /
F                  ← leaf, depth 3, height 0
```
Traversal Order — The Central Concept
Every binary tree has a natural traversal order defined recursively: every node in a node's left subtree comes before it, and every node in the right subtree comes after it. The subtree_iter function above implements this — it is precisely in-order traversal.
Right now, this order has no semantic meaning. We will assign two different meanings to it:
Sequence interpretation: traversal order = sequence order. The $i$-th node in traversal order is the $i$-th item in the sequence. This gives us $O(h)$ get_at(i) and insert_at(i, x).
Set interpretation: traversal order = sorted order by key. This is the Binary Search Tree (BST) property. It gives us $O(h)$ find(k), find_next(k), find_prev(k).
Both interpretations share the same underlying node structure. Only the semantic meaning changes.
Tree Navigation — Four Operations, All O(h)
Four navigation operations form the backbone of everything we build on top of binary trees. Each runs in $O(h)$ by walking up or down the tree:
subtree_first(X) — walk left until you hit a node with no left child. That is the first node in $X$'s subtree. To find the minimum key in a BST: call subtree_first(root).
subtree_last(X) — symmetric. Walk right to find the last node.
successor(X) — two cases. If $X$ has a right child, the successor is the first node in the right subtree. If not, walk up until you come from a left child — that ancestor is the successor.
predecessor(X) — symmetric to successor.
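These navigation operations compose into a full in-order iteration: start at subtree_first(root) and follow successor until it returns None. A standalone sketch (re-declaring a minimal version of the Binary_Node fields and methods above; the attach helper is illustrative, not part of the class):

```python
class Node:
    """Minimal binary tree node: item plus three pointers."""
    def __init__(self, x):
        self.item, self.left, self.right, self.parent = x, None, None, None

    def subtree_first(self):            # O(h): walk left until no left child
        return self.left.subtree_first() if self.left else self

    def successor(self):                # O(h): next node in traversal order
        if self.right:
            return self.right.subtree_first()
        node = self
        while node.parent and node is node.parent.right:
            node = node.parent
        return node.parent              # None if self is last

def attach(parent, left=None, right=None):
    """Hypothetical helper: wire children to parent, return parent."""
    parent.left, parent.right = left, right
    for child in (left, right):
        if child:
            child.parent = parent
    return parent

# Build the example tree:   A
#                          / \
#                         B   C
#                        / \
#                       D   E
A, B, C, D, E = (Node(x) for x in "ABCDE")
attach(A, attach(B, D, E), C)

# Iterate: first node, then repeated successor calls until None
order, node = [], A.subtree_first()
while node:
    order.append(node.item)
    node = node.successor()
print(order)  # → ['D', 'B', 'E', 'A', 'C']
```

Each individual successor call is $O(h)$, but the whole walk is $\Theta(n)$ in total, since every edge is crossed a constant number of times.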
Dynamic Operations — Inserting and Deleting Nodes
The tree only changes by adding or removing leaves — never internal nodes directly. This keeps the logic manageable. All dynamic operations run in $O(h)$.
Insert node $Y$ after node $X$ in traversal order: If $X$ has no right child, attach $Y$ as the right child. Otherwise, $X$'s successor has no left child (by definition of successor) — attach $Y$ as its left child.
Delete node $X$: If $X$ is a leaf, detach from parent. If not, swap $X$'s item with its predecessor (or successor if no predecessor), then recursively delete the node now holding $X$'s original item — which is one step closer to being a leaf. This recursion terminates because we always move down.
```python
    # ── DYNAMIC OPERATIONS (Binary_Node methods, continued) ──
    def subtree_insert_before(self, B):       # O(h) — insert B before self
        if self.left:
            # self has a left child → insert B as the right child of the
            # rightmost node of the left subtree (which has no right child)
            node = self.left.subtree_last()
            node.right = B
            B.parent = node
        else:
            # self has no left child → B becomes left child
            self.left = B
            B.parent = self

    def subtree_insert_after(self, B):        # O(h) — insert B after self
        if self.right:
            node = self.right.subtree_first()
            node.left = B
            B.parent = node
        else:
            self.right = B
            B.parent = self

    def subtree_delete(self):                 # O(h)
        """Remove self from tree. Returns the node physically deleted (a leaf)."""
        if self.left or self.right:           # not a leaf — swap down first
            # choose predecessor if left child exists, else successor
            B = self.predecessor() if self.left else self.successor()
            self.item, B.item = B.item, self.item   # swap items
            return B.subtree_delete()         # now recurse on B
        # self is a leaf — detach from parent
        if self.parent:
            if self.parent.left is self:
                self.parent.left = None
            else:
                self.parent.right = None
        return self
```
```cpp
    // ── dynamic operations (Binary_Node members, continued) ──
    void insert_before(Binary_Node* B) {      // O(h)
        if (left) {
            auto* node = left->subtree_last();
            node->right = B;
            B->parent = node;
        } else {
            left = B;
            B->parent = this;
        }
    }

    void insert_after(Binary_Node* B) {       // O(h)
        if (right) {
            auto* node = right->subtree_first();
            node->left = B;
            B->parent = node;
        } else {
            right = B;
            B->parent = this;
        }
    }

    Binary_Node* subtree_delete() {           // O(h)
        if (left || right) {
            auto* B = left ? predecessor() : successor();
            swap(item, B->item);              // swap items, not pointers
            return B->subtree_delete();       // recurse — always moves down
        }
        // this node is a leaf — detach from parent
        if (parent) {
            if (parent->left == this) parent->left = nullptr;
            else parent->right = nullptr;
        }
        return this;
    }
```
Application: Binary Search Tree (Set Interface)
Assign the Set semantic to traversal order: the traversal order is sorted by key. This is the BST property: for every node, every key in the left subtree is smaller, every key in the right subtree is larger.
Finding key $k$ in a BST is like binary search, but on the tree structure instead of an array. At each node: if $k$ is smaller, recurse left; if larger, recurse right; if equal, found. Cost: $O(h)$.
```python
class BST_Node(Binary_Node):
    """A Binary_Node where traversal order = sorted order by key."""

    def subtree_find(self, k):                # O(h)
        """Find node with key k in this subtree. Returns None if absent."""
        if k < self.item.key:
            return self.left.subtree_find(k) if self.left else None
        elif k > self.item.key:
            return self.right.subtree_find(k) if self.right else None
        else:
            return self                       # found

    def subtree_find_next(self, k):           # O(h) — smallest key > k
        if self.item.key <= k:
            return self.right.subtree_find_next(k) if self.right else None
        # self.item.key > k: check if left subtree has something better
        B = self.left.subtree_find_next(k) if self.left else None
        return B if B else self

    def subtree_find_prev(self, k):           # O(h) — largest key < k
        if self.item.key >= k:
            return self.left.subtree_find_prev(k) if self.left else None
        B = self.right.subtree_find_prev(k) if self.right else None
        return B if B else self

    def subtree_insert(self, B):              # O(h) — insert BST_Node B
        """Insert B maintaining BST property."""
        if B.item.key < self.item.key:
            if self.left: self.left.subtree_insert(B)
            else: self.subtree_insert_before(B)
        elif B.item.key > self.item.key:
            if self.right: self.right.subtree_insert(B)
            else: self.subtree_insert_after(B)
        else:
            self.item = B.item                # duplicate key: update in place
```
```cpp
template<typename T, typename K>
struct BST_Node : Binary_Node<T> {
    using Base = Binary_Node<T>;
    BST_Node(T x) : Base(x) {}

    // Find node with key k — O(h)
    BST_Node* subtree_find(K k) {
        if (k < this->item.key)
            return this->left ? static_cast<BST_Node*>(this->left)->subtree_find(k) : nullptr;
        if (k > this->item.key)
            return this->right ? static_cast<BST_Node*>(this->right)->subtree_find(k) : nullptr;
        return this;
    }

    // Insert BST_Node B maintaining sorted order — O(h)
    void subtree_insert(BST_Node* B) {
        if (B->item.key < this->item.key) {
            if (this->left) static_cast<BST_Node*>(this->left)->subtree_insert(B);
            else this->insert_before(B);
        } else if (B->item.key > this->item.key) {
            if (this->right) static_cast<BST_Node*>(this->right)->subtree_insert(B);
            else this->insert_after(B);
        } else {
            this->item = B->item;  // duplicate key: update in place
        }
    }
};
```
Application: Sequence Binary Tree (Augmentation)
For the Sequence interface, traversal order = sequence order. The key operation is get_at(i) — find the $i$-th node in traversal order. Naively, iterate through the tree: $O(n)$. Better: augment each node with the size of its subtree.
With subtree sizes, get_at(i) becomes $O(h)$: check the left subtree size $n_L$. If $i < n_L$, recurse left. If $i > n_L$, recurse right with $i' = i - n_L - 1$. If $i = n_L$, return this node.
Maintaining subtree sizes costs $O(h)$ per insert or delete — just update all ancestors on the path to the root. This is the general principle of augmentation: add a field to each node, maintain it during modifications, and exploit it to answer queries faster.
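A minimal sketch of the size augmentation, showing only the query side (the tree is wired by hand and sizes recomputed bottom-up; in a real implementation, subtree_update would be called on every ancestor after each insert or delete). The class name Size_Node is illustrative:

```python
class Size_Node:
    """Binary tree node augmented with the size of its subtree."""
    def __init__(self, x):
        self.item = x
        self.left = self.right = self.parent = None
        self.size = 1                      # this node plus all descendants

    def subtree_update(self):              # O(1): recompute from children
        self.size = 1
        if self.left:  self.size += self.left.size
        if self.right: self.size += self.right.size

    def subtree_at(self, i):               # O(h): i-th node in traversal order
        nL = self.left.size if self.left else 0
        if i < nL:
            return self.left.subtree_at(i)
        if i > nL:
            return self.right.subtree_at(i - nL - 1)  # skip left subtree + self
        return self                        # i == nL: this node is the i-th

# Hand-build the sequence [10, 20, 30, 40, 50] as:
#         30
#        /  \
#      10    40
#        \     \
#        20    50
n10, n20, n30, n40, n50 = (Size_Node(x) for x in (10, 20, 30, 40, 50))
n30.left, n30.right = n10, n40
n10.right, n40.right = n20, n50
for child, parent in ((n10, n30), (n40, n30), (n20, n10), (n50, n40)):
    child.parent = parent
for node in (n20, n50, n10, n40, n30):     # children before parents
    node.subtree_update()

print([n30.subtree_at(i).item for i in range(5)])  # → [10, 20, 30, 40, 50]
```

Note that subtree_at never touches the item values — it steers purely by the size fields, which is why the same trick works for any sequence, sorted or not.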
Build a BST from NIFTY strike prices $[24000, 23500, 24500, 23000, 24200]$ inserted in that order. Then: (a) find the successor of 23500; (b) find the node with key 24200; (c) find the smallest key above 24100.
Insert 24000 → root. Insert 23500 → left of 24000 (23500 < 24000). Insert 24500 → right of 24000. Insert 23000 → left of 23500. Insert 24200 → left of 24500 (24200 < 24500).
```
        24000
        /   \
    23500   24500
     /       /
  23000   24200
```
(a) 23500 has no right child. Walk up: 23500 is the left child of 24000, so return 24000. Answer: 24000.
(b) At 24000: 24200 > 24000, go right → at 24500: 24200 < 24500, go left → at 24200: found, after 3 comparisons ($O(h)$). Answer: the node with key 24200.
(c) At 24000: 24000 ≤ 24100, go right → at 24500: 24500 > 24100, check left → at 24200: 24200 > 24100, check left → None. Return 24200 (better than 24500). Answer: 24200.
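The worked example can be checked mechanically. A self-contained sketch that collapses the Binary_Node/BST_Node split above into one class and stores bare integer keys instead of item.key records (names here are illustrative):

```python
class BST:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None

    def insert(self, key):                 # O(h): standard BST insertion
        if key < self.key:
            if self.left: self.left.insert(key)
            else: self.left = BST(key); self.left.parent = self
        elif key > self.key:
            if self.right: self.right.insert(key)
            else: self.right = BST(key); self.right.parent = self

    def find(self, key):                   # O(h): binary search on the tree
        if key < self.key: return self.left.find(key) if self.left else None
        if key > self.key: return self.right.find(key) if self.right else None
        return self

    def successor(self):                   # O(h): next node in traversal order
        if self.right:
            node = self.right
            while node.left: node = node.left
            return node
        node = self
        while node.parent and node is node.parent.right:
            node = node.parent
        return node.parent

    def find_next(self, key):              # O(h): smallest key > key
        if self.key <= key:
            return self.right.find_next(key) if self.right else None
        better = self.left.find_next(key) if self.left else None
        return better if better else self

root = BST(24000)
for k in (23500, 24500, 23000, 24200):
    root.insert(k)
print(root.find(23500).successor().key)   # (a) → 24000
print(root.find(24200).key)               # (b) → 24200
print(root.find_next(24100).key)          # (c) → 24200
```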
Why $O(h)$ Is Not Yet $O(\log n)$
Everything above runs in $O(h)$. But we have not guaranteed $h = O(\log n)$. A BST built by inserting already-sorted keys degenerates into a linked list: height $n-1$, all operations $O(n)$.
```
23000
    \
    23500
        \
        24000        ← height = n − 1 = 4; every operation O(n)
            \
            24200
                \
                24500
```
This is the fundamental problem that Chapter 7 solves. AVL trees maintain a balance invariant — the heights of any node's left and right subtrees differ by at most 1. This guarantees $h = O(\log n)$ always, even after arbitrary insertions and deletions.
The Performance Picture So Far
| Data Structure | build | find(k) | insert/delete | find_min/max | find_prev/next |
|---|---|---|---|---|---|
| Sorted Array | $\Theta(n \log n)$ | $\Theta(\log n)$ | $\Theta(n)$ | $\Theta(1)$ | $\Theta(\log n)$ |
| Hash Table | $\Theta(n)^{(e)}$ | $\Theta(1)^{(e)}$ | $\Theta(1)^{(e)}$ | $\Theta(n)$ | $\Theta(n)$ |
| Binary Tree (BST) | $\Theta(nh)$ | $\Theta(h)$ | $\Theta(h)$ | $\Theta(h)$ | $\Theta(h)$ |
| Balanced BST (goal) | $\Theta(n \log n)$ | $\Theta(\log n)$ | $\Theta(\log n)$ | $\Theta(\log n)$ | $\Theta(\log n)$ |
(e) expected · (h) height of tree, can be O(n) without balance guarantee
Python's sortedcontainers library (not in the stdlib; installable via pip) provides SortedList and SortedDict with fast ordered operations — implemented not as a binary tree but as a list of sublists that performs much like a shallow B-tree. For trading applications requiring fast in-order operations (find the next strike, find the nearest expiry, iterate strikes in order), SortedList is the right tool. Understanding the binary tree structure underneath these interfaces is what lets you reason about when to use it, when a plain dict suffices, and what each operation actually costs.
If we insert keys $[5, 3, 7, 1, 4]$ into a BST in that order, what tree results, and what does subtree_iter on the root yield?
The BST property ensures that in-order traversal (left subtree, self, right subtree) always produces keys in ascending sorted order, regardless of insertion order — this is the defining property.
The resulting tree: 5 is the root, 3 its left child, 7 its right child; 1 and 4 are the left and right children of 3. Traversal: left subtree of 5 = $[1, 3, 4]$, then 5, then right subtree = $[7]$. Output: $[1, 3, 4, 5, 7]$.
In the BST built from $[24000, 23500, 24500, 23000, 24200]$ (from the worked example), what is the predecessor of node 24000?
The predecessor of a node $X$ is the largest key smaller than $X$'s key — equivalently, the last node of $X$'s left subtree in traversal order. Since 24000 has a left child (23500), we call subtree_last on the subtree rooted at 23500. Walking right from 23500 goes nowhere (it has no right child), so 23500 itself is returned.
This matches intuition: in sorted order $[23000, 23500, 24000, 24200, 24500]$, the element immediately before 24000 is 23500. The BST predecessor operation finds it in $O(h)$ time.
You build a BST by inserting NIFTY expiry dates as integer keys, always in chronological order (smallest first). What is the height of the resulting BST, and what does this mean for operations?
When keys are inserted in sorted order, each new key is larger than all existing keys, so it always goes to the rightmost position. The tree grows as a chain: $k_1 \to k_2 \to k_3 \to \cdots$, each node the right child of the previous. Height $= n - 1$.
All operations become $\Theta(n)$ — as slow as a linked list. This is why naively building a BST from sorted data is dangerous. The fix (Chapter 7): use an AVL tree, which maintains height $O(\log n)$ through rotations after every insertion. Alternatively: shuffle the keys before insertion to randomise the BST's shape (expected height $O(\log n)$).
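The degeneration is easy to demonstrate. A standalone sketch using plain unbalanced BST insertion; the expiry dates, encoded as YYYYMMDD integers, are made up for illustration:

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, key):                     # plain BST insert, no balancing
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root

def height(node):                          # a leaf has height 0
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))

# Hypothetical weekly expiries, already in chronological (sorted) order
expiries = [20240104, 20240111, 20240118, 20240125, 20240201]
root = None
for k in expiries:
    root = insert(root, k)
print(height(root))                        # → 4, i.e. n - 1: a pure right chain
```

Shuffling the same keys before insertion would, in expectation, give height $O(\log n)$ — but only balancing (Chapter 7) guarantees it.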
A sequence binary tree is augmented with subtree sizes. Node A has left.size = 3, right.size = 2, and stores one item itself. A call to subtree_at(4) (find the 4th item, 0-indexed) is made at A. What happens?
The subtree at A has $n_L = 3$ (left), 1 (self), $n_R = 2$ (right) = 6 total nodes. We want index 4 (0-based):
— Is $i < n_L$? No: $4 \not< 3$, so the 4th item is not in the left subtree.
— Is $i = n_L$? No: $4 \neq 3$, so it is not A itself.
— So $i > n_L$: recurse into the right subtree with adjusted index $i' = i - n_L - 1 = 4 - 3 - 1 = 0$.
So we look for the 0th item in the right subtree — its first/leftmost node. The total tree has 6 nodes (index 0 through 5). In sorted order: indices 0,1,2 are in the left subtree; index 3 is A; indices 4,5 are in the right subtree. Index 4 → right subtree index 0. Correct.
Work through independently. Q1–3 are direct applications. Q4–7 require combining ideas. Q8–10 demand careful argument.
successor(7); (b) find predecessor(12); (c) find subtree_find_next(6) — the smallest key greater than 6. Trace each operation step by step.
Easy
Prove that subtree_iter runs in $\Theta(n)$ time by showing that each edge of the tree is traversed exactly twice (once going down, once coming back up). Use this to argue that iterating a binary tree using subtree_first plus repeated successor calls also runs in $\Theta(n)$ total.
Medium
Extend subtree_find, subtree_find_next, and subtree_insert to handle composite keys. What comparison operation does each node perform?
Medium
Augment each node with a subtree_sum field — the sum of all keys in the subtree. Show how to maintain this field during subtree_insert and subtree_delete in $O(h)$ time. Then describe how to use it to answer: "what is the total of all strikes between 23000 and 24500?" in $O(h)$ time.
Medium
Prove that subtree_delete as implemented always terminates. Specifically, show that the recursive call in the non-leaf case always makes progress — that the node being recursed on is strictly lower in the tree than the original node.
Hard