Binary Trees
Arrays give fast random access but slow insertion. Linked lists give fast insertion but slow access. The goal all along has been $O(\log n)$ for everything. Binary trees are the structure that makes this possible — if we keep them balanced.
The Gap We Are Trying to Close
Looking back at what we have built: arrays give $O(1)$ random access but $O(n)$ insertion. Linked lists give $O(1)$ front insertion but $O(n)$ access. Hash tables give $O(1)$ expected find but $O(n)$ for any order operation. Sorted arrays give $O(\log n)$ find but $O(n)$ insertion.
The unmet goal for both the Sequence and Set interfaces is $O(\log n)$ for every operation. Binary trees are the data structure built for exactly this. The key insight: if we can keep the tree's height $h = O(\log n)$, and every operation costs $O(h)$, then every operation costs $O(\log n)$.
The Node — Four Pointers
node.item — the data stored at this node
node.parent — pointer to the parent node (None for the root)
node.left — pointer to the left child (None if absent)
node.right — pointer to the right child (None if absent)
A binary tree is a set of nodes connected by these pointers, with one designated root (the node with no parent). A node with no children is a leaf.
```python
class Binary_Node:
    def __init__(self, x):                        # O(1)
        self.item = x
        self.left = None                          # left child
        self.right = None                         # right child
        self.parent = None                        # parent node

    # ── TRAVERSAL ─────────────────────────────────────────
    def subtree_iter(self):                       # O(n) — in-order traversal
        """Yield all nodes in this subtree in traversal order."""
        if self.left:
            yield from self.left.subtree_iter()
        yield self
        if self.right:
            yield from self.right.subtree_iter()

    # ── NAVIGATION ────────────────────────────────────────
    def subtree_first(self):                      # O(h) — walk left until leaf
        """Return the first node in this subtree's traversal order."""
        if self.left:
            return self.left.subtree_first()
        return self

    def subtree_last(self):                       # O(h) — walk right until leaf
        if self.right:
            return self.right.subtree_last()
        return self

    def successor(self):                          # O(h) — next in traversal order
        """Return the node immediately after self in traversal order."""
        if self.right:
            return self.right.subtree_first()     # case 1: has right child
        # case 2: walk up until we come from a left subtree
        node = self
        while node.parent and (node is node.parent.right):
            node = node.parent
        return node.parent                        # None if self is last

    def predecessor(self):                        # O(h) — previous in traversal order
        if self.left:
            return self.left.subtree_last()
        node = self
        while node.parent and (node is node.parent.left):
            node = node.parent
        return node.parent
```
```cpp
#include <functional>
using namespace std;

template<typename T>
struct Binary_Node {
    T item;
    Binary_Node* left = nullptr;
    Binary_Node* right = nullptr;
    Binary_Node* parent = nullptr;
    Binary_Node(T x) : item(x) {}

    // In-order traversal — O(n)
    void subtree_iter(function<void(Binary_Node*)> visit) {
        if (left) left->subtree_iter(visit);
        visit(this);
        if (right) right->subtree_iter(visit);
    }

    // Walk left until leaf — O(h)
    Binary_Node* subtree_first() {
        return left ? left->subtree_first() : this;
    }

    Binary_Node* subtree_last() {
        return right ? right->subtree_last() : this;
    }

    // Next in traversal order — O(h)
    Binary_Node* successor() {
        if (right) return right->subtree_first();
        auto* node = this;
        while (node->parent && node == node->parent->right)
            node = node->parent;
        return node->parent;  // nullptr if this is last
    }

    Binary_Node* predecessor() {
        if (left) return left->subtree_last();
        auto* node = this;
        while (node->parent && node == node->parent->left)
            node = node->parent;
        return node->parent;
    }
};
```
Terminology — Depth, Height, and Why They Matter
Depth of $X$ — length of the path from $X$ to the root $R$. The root has depth 0.
Height of $X$ — maximum depth of any node in the subtree rooted at $X$. A leaf has height 0.
Height of the tree — height of the root = maximum depth of any node.
Every operation we will define on binary trees runs in $O(h)$ time, where $h$ is the height. A tree with $n$ nodes can have height anywhere from $\lfloor \log_2 n \rfloor$ (perfectly balanced) to $n-1$ (a chain — effectively a linked list). The entire enterprise of Chapters 6 and 7 is ensuring $h = O(\log n)$.
```
      A            ← depth 0 (root)
     / \
    B   C          ← depth 1; C is a leaf (height 0)
   / \
  D   E            ← depth 2
 /
F                  ← leaf, depth 3, height 0
```
Traversal Order — The Central Concept
Every binary tree has a natural traversal order defined recursively: every node in a node's left subtree comes before it, and every node in the right subtree comes after it. The subtree_iter function above implements this — it is precisely in-order traversal.
Right now, this order has no semantic meaning. We will assign two different meanings to it:
Sequence interpretation: traversal order = sequence order. The $i$-th node in traversal order is the $i$-th item in the sequence. This gives us $O(h)$ get_at(i) and insert_at(i, x).
Set interpretation: traversal order = sorted order by key. This is the Binary Search Tree (BST) property. It gives us $O(h)$ find(k), find_next(k), find_prev(k).
Both interpretations share the same underlying node structure. Only the semantic meaning changes.
Tree Navigation — Four Operations, All O(h)
Four navigation operations form the backbone of everything we build on top of binary trees. Each runs in $O(h)$ by walking up or down the tree:
subtree_first(X) — walk left until you hit a node with no left child. That is the first node in $X$'s subtree. To find the minimum key in a BST: call subtree_first(root).
subtree_last(X) — symmetric. Walk right to find the last node.
successor(X) — two cases. If $X$ has a right child, the successor is the first node in the right subtree. If not, walk up until you come from a left child — that ancestor is the successor.
predecessor(X) — symmetric to successor.
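These navigation operations compose into a full in-order iteration: start at subtree_first(root) and follow successor until it returns None. A standalone sketch (re-declaring a minimal version of the Binary_Node fields and methods above; the attach helper is illustrative, not part of the class):

```python
class Node:
    """Minimal binary tree node: item plus three pointers."""
    def __init__(self, x):
        self.item, self.left, self.right, self.parent = x, None, None, None

    def subtree_first(self):            # O(h): walk left until no left child
        return self.left.subtree_first() if self.left else self

    def successor(self):                # O(h): next node in traversal order
        if self.right:
            return self.right.subtree_first()
        node = self
        while node.parent and node is node.parent.right:
            node = node.parent
        return node.parent              # None if self is last

def attach(parent, left=None, right=None):
    """Hypothetical helper: wire children to parent, return parent."""
    parent.left, parent.right = left, right
    for child in (left, right):
        if child:
            child.parent = parent
    return parent

# Build the example tree:   A
#                          / \
#                         B   C
#                        / \
#                       D   E
A, B, C, D, E = (Node(x) for x in "ABCDE")
attach(A, attach(B, D, E), C)

# Iterate: first node, then repeated successor calls until None
order, node = [], A.subtree_first()
while node:
    order.append(node.item)
    node = node.successor()
print(order)  # → ['D', 'B', 'E', 'A', 'C']
```

Each individual successor call is $O(h)$, but the whole walk is $\Theta(n)$ in total, since every edge is crossed a constant number of times.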
Dynamic Operations — Inserting and Deleting Nodes
The tree only changes by adding or removing leaves — never internal nodes directly. This keeps the logic manageable. All dynamic operations run in $O(h)$.
Insert node $Y$ after node $X$ in traversal order: If $X$ has no right child, attach $Y$ as the right child. Otherwise, $X$'s successor has no left child (by definition of successor) — attach $Y$ as its left child.
Delete node $X$: If $X$ is a leaf, detach from parent. If not, swap $X$'s item with its predecessor (or successor if no predecessor), then recursively delete the node now holding $X$'s original item — which is one step closer to being a leaf. This recursion terminates because we always move down.
```python
    # ── DYNAMIC OPERATIONS (Binary_Node methods, continued) ──
    def subtree_insert_before(self, B):       # O(h) — insert B before self
        if self.left:
            # self has a left child → insert B as the right child of the
            # rightmost node of the left subtree (which has no right child)
            node = self.left.subtree_last()
            node.right = B
            B.parent = node
        else:
            # self has no left child → B becomes left child
            self.left = B
            B.parent = self

    def subtree_insert_after(self, B):        # O(h) — insert B after self
        if self.right:
            node = self.right.subtree_first()
            node.left = B
            B.parent = node
        else:
            self.right = B
            B.parent = self

    def subtree_delete(self):                 # O(h)
        """Remove self from tree. Returns the node physically deleted (a leaf)."""
        if self.left or self.right:           # not a leaf — swap down first
            # choose predecessor if left child exists, else successor
            B = self.predecessor() if self.left else self.successor()
            self.item, B.item = B.item, self.item   # swap items
            return B.subtree_delete()         # now recurse on B
        # self is a leaf — detach from parent
        if self.parent:
            if self.parent.left is self:
                self.parent.left = None
            else:
                self.parent.right = None
        return self
```
```cpp
    // ── dynamic operations (Binary_Node members, continued) ──
    void insert_before(Binary_Node* B) {      // O(h)
        if (left) {
            auto* node = left->subtree_last();
            node->right = B;
            B->parent = node;
        } else {
            left = B;
            B->parent = this;
        }
    }

    void insert_after(Binary_Node* B) {       // O(h)
        if (right) {
            auto* node = right->subtree_first();
            node->left = B;
            B->parent = node;
        } else {
            right = B;
            B->parent = this;
        }
    }

    Binary_Node* subtree_delete() {           // O(h)
        if (left || right) {
            auto* B = left ? predecessor() : successor();
            swap(item, B->item);              // swap items, not pointers
            return B->subtree_delete();       // recurse — always moves down
        }
        // this node is a leaf — detach from parent
        if (parent) {
            if (parent->left == this) parent->left = nullptr;
            else parent->right = nullptr;
        }
        return this;
    }
```
Application: Binary Search Tree (Set Interface)
Assign the Set semantic to traversal order: the traversal order is sorted by key. This is the BST property: for every node, every key in the left subtree is smaller, every key in the right subtree is larger.
Finding key $k$ in a BST is like binary search, but on the tree structure instead of an array. At each node: if $k$ is smaller, recurse left; if larger, recurse right; if equal, found. Cost: $O(h)$.
```python
class BST_Node(Binary_Node):
    """A Binary_Node where traversal order = sorted order by key."""

    def subtree_find(self, k):                # O(h)
        """Find node with key k in this subtree. Returns None if absent."""
        if k < self.item.key:
            return self.left.subtree_find(k) if self.left else None
        elif k > self.item.key:
            return self.right.subtree_find(k) if self.right else None
        else:
            return self                       # found

    def subtree_find_next(self, k):           # O(h) — smallest key > k
        if self.item.key <= k:
            return self.right.subtree_find_next(k) if self.right else None
        # self.item.key > k: check if left subtree has something better
        B = self.left.subtree_find_next(k) if self.left else None
        return B if B else self

    def subtree_find_prev(self, k):           # O(h) — largest key < k
        if self.item.key >= k:
            return self.left.subtree_find_prev(k) if self.left else None
        B = self.right.subtree_find_prev(k) if self.right else None
        return B if B else self

    def subtree_insert(self, B):              # O(h) — insert BST_Node B
        """Insert B maintaining BST property."""
        if B.item.key < self.item.key:
            if self.left: self.left.subtree_insert(B)
            else: self.subtree_insert_before(B)
        elif B.item.key > self.item.key:
            if self.right: self.right.subtree_insert(B)
            else: self.subtree_insert_after(B)
        else:
            self.item = B.item                # duplicate key: update in place
```
```cpp
template<typename T, typename K>
struct BST_Node : Binary_Node<T> {
    using Base = Binary_Node<T>;
    BST_Node(T x) : Base(x) {}

    // Find node with key k — O(h)
    BST_Node* subtree_find(K k) {
        if (k < this->item.key)
            return this->left ? static_cast<BST_Node*>(this->left)->subtree_find(k) : nullptr;
        if (k > this->item.key)
            return this->right ? static_cast<BST_Node*>(this->right)->subtree_find(k) : nullptr;
        return this;
    }

    // Insert BST_Node B maintaining sorted order — O(h)
    void subtree_insert(BST_Node* B) {
        if (B->item.key < this->item.key) {
            if (this->left) static_cast<BST_Node*>(this->left)->subtree_insert(B);
            else this->insert_before(B);
        } else if (B->item.key > this->item.key) {
            if (this->right) static_cast<BST_Node*>(this->right)->subtree_insert(B);
            else this->insert_after(B);
        } else {
            this->item = B->item;  // duplicate key: update in place
        }
    }
};
```
Application: Sequence Binary Tree (Augmentation)
For the Sequence interface, traversal order = sequence order. The key operation is get_at(i) — find the $i$-th node in traversal order. Naively, iterate through the tree: $O(n)$. Better: augment each node with the size of its subtree.
With subtree sizes, get_at(i) becomes $O(h)$: check the left subtree size $n_L$. If $i < n_L$, recurse left. If $i > n_L$, recurse right with $i' = i - n_L - 1$. If $i = n_L$, return this node.
Maintaining subtree sizes costs $O(h)$ per insert or delete — just update all ancestors on the path to the root. This is the general principle of augmentation: add a field to each node, maintain it during modifications, and exploit it to answer queries faster.
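A minimal sketch of the size augmentation, showing only the query side (the tree is wired by hand and sizes recomputed bottom-up; in a real implementation, subtree_update would be called on every ancestor after each insert or delete). The class name Size_Node is illustrative:

```python
class Size_Node:
    """Binary tree node augmented with the size of its subtree."""
    def __init__(self, x):
        self.item = x
        self.left = self.right = self.parent = None
        self.size = 1                      # this node plus all descendants

    def subtree_update(self):              # O(1): recompute from children
        self.size = 1
        if self.left:  self.size += self.left.size
        if self.right: self.size += self.right.size

    def subtree_at(self, i):               # O(h): i-th node in traversal order
        nL = self.left.size if self.left else 0
        if i < nL:
            return self.left.subtree_at(i)
        if i > nL:
            return self.right.subtree_at(i - nL - 1)  # skip left subtree + self
        return self                        # i == nL: this node is the i-th

# Hand-build the sequence [10, 20, 30, 40, 50] as:
#         30
#        /  \
#      10    40
#        \     \
#        20    50
n10, n20, n30, n40, n50 = (Size_Node(x) for x in (10, 20, 30, 40, 50))
n30.left, n30.right = n10, n40
n10.right, n40.right = n20, n50
for child, parent in ((n10, n30), (n40, n30), (n20, n10), (n50, n40)):
    child.parent = parent
for node in (n20, n50, n10, n40, n30):     # children before parents
    node.subtree_update()

print([n30.subtree_at(i).item for i in range(5)])  # → [10, 20, 30, 40, 50]
```

Note that subtree_at never touches the item values — it steers purely by the size fields, which is why the same trick works for any sequence, sorted or not.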
Build a BST from NIFTY strike prices $[24000, 23500, 24500, 23000, 24200]$ inserted in that order. Then: (a) find the successor of 23500; (b) find the node with key 24200; (c) find the smallest key above 24100.
Insert 24000 → root. Insert 23500 → left of 24000 (23500 < 24000). Insert 24500 → right of 24000. Insert 23000 → left of 23500. Insert 24200 → left of 24500 (24200 < 24500).
```
        24000
        /   \
    23500   24500
     /       /
  23000   24200
```
(a) 23500 has no right child. Walk up: 23500 is the left child of 24000, so return 24000. Answer: 24000.
(b) At 24000: 24200 > 24000, go right → at 24500: 24200 < 24500, go left → at 24200: found, after 3 comparisons ($O(h)$). Answer: the node with key 24200.
(c) At 24000: 24000 ≤ 24100, go right → at 24500: 24500 > 24100, check left → at 24200: 24200 > 24100, check left → None. Return 24200 (better than 24500). Answer: 24200.
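The worked example can be checked mechanically. A self-contained sketch that collapses the Binary_Node/BST_Node split above into one class and stores bare integer keys instead of item.key records (names here are illustrative):

```python
class BST:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None

    def insert(self, key):                 # O(h): standard BST insertion
        if key < self.key:
            if self.left: self.left.insert(key)
            else: self.left = BST(key); self.left.parent = self
        elif key > self.key:
            if self.right: self.right.insert(key)
            else: self.right = BST(key); self.right.parent = self

    def find(self, key):                   # O(h): binary search on the tree
        if key < self.key: return self.left.find(key) if self.left else None
        if key > self.key: return self.right.find(key) if self.right else None
        return self

    def successor(self):                   # O(h): next node in traversal order
        if self.right:
            node = self.right
            while node.left: node = node.left
            return node
        node = self
        while node.parent and node is node.parent.right:
            node = node.parent
        return node.parent

    def find_next(self, key):              # O(h): smallest key > key
        if self.key <= key:
            return self.right.find_next(key) if self.right else None
        better = self.left.find_next(key) if self.left else None
        return better if better else self

root = BST(24000)
for k in (23500, 24500, 23000, 24200):
    root.insert(k)
print(root.find(23500).successor().key)   # (a) → 24000
print(root.find(24200).key)               # (b) → 24200
print(root.find_next(24100).key)          # (c) → 24200
```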
Why $O(h)$ Is Not Yet $O(\log n)$
Everything above runs in $O(h)$. But we have not guaranteed $h = O(\log n)$. A BST built by inserting already-sorted keys degenerates into a linked list: height $n-1$, all operations $O(n)$.
```
23000
    \
    23500
        \
        24000        ← height = n − 1 = 4; every operation O(n)
            \
            24200
                \
                24500
```
This is the fundamental problem that Chapter 7 solves. AVL trees maintain a balance invariant — the heights of any node's left and right subtrees differ by at most 1. This guarantees $h = O(\log n)$ always, even after arbitrary insertions and deletions.
The Performance Picture So Far
| Data Structure | build | find(k) | insert/delete | find_min/max | find_prev/next |
|---|---|---|---|---|---|
| Sorted Array | $\Theta(n \log n)$ | $\Theta(\log n)$ | $\Theta(n)$ | $\Theta(1)$ | $\Theta(\log n)$ |
| Hash Table | $\Theta(n)^{(e)}$ | $\Theta(1)^{(e)}$ | $\Theta(1)^{(e)}$ | $\Theta(n)$ | $\Theta(n)$ |
| Binary Tree (BST) | $\Theta(nh)$ | $\Theta(h)$ | $\Theta(h)$ | $\Theta(h)$ | $\Theta(h)$ |
| Balanced BST (goal) | $\Theta(n \log n)$ | $\Theta(\log n)$ | $\Theta(\log n)$ | $\Theta(\log n)$ | $\Theta(\log n)$ |
(e) expected · (h) height of tree, can be O(n) without balance guarantee
Python's sortedcontainers library (not in the stdlib; installable via pip) provides SortedList and SortedDict with fast ordered operations — implemented not as a binary tree but as a list of sublists that performs much like a shallow B-tree. For trading applications requiring fast in-order operations (find the next strike, find the nearest expiry, iterate strikes in order), SortedList is the right tool. Understanding the binary tree structure underneath these interfaces is what lets you reason about when to use it, when a plain dict suffices, and what each operation actually costs.
If we insert keys $[5, 3, 7, 1, 4]$ into a BST in that order, what tree results, and what does subtree_iter on the root yield?
The BST property ensures that in-order traversal (left subtree, self, right subtree) always produces keys in ascending sorted order, regardless of insertion order — this is the defining property.
The resulting tree: 5 is the root, 3 its left child, 7 its right child; 1 and 4 are the left and right children of 3. Traversal: left subtree of 5 = $[1, 3, 4]$, then 5, then right subtree = $[7]$. Output: $[1, 3, 4, 5, 7]$.
In the BST built from $[24000, 23500, 24500, 23000, 24200]$ (from the worked example), what is the predecessor of node 24000?
The predecessor of a node $X$ is the largest key smaller than $X$'s key — equivalently, the last node of $X$'s left subtree in traversal order. Since 24000 has a left child (23500), we call subtree_last on the subtree rooted at 23500. Walking right from 23500 goes nowhere (it has no right child), so 23500 itself is returned.
This matches intuition: in sorted order $[23000, 23500, 24000, 24200, 24500]$, the element immediately before 24000 is 23500. The BST predecessor operation finds it in $O(h)$ time.
You build a BST by inserting NIFTY expiry dates as integer keys, always in chronological order (smallest first). What is the height of the resulting BST, and what does this mean for operations?
When keys are inserted in sorted order, each new key is larger than all existing keys, so it always goes to the rightmost position. The tree grows as a chain: $k_1 \to k_2 \to k_3 \to \cdots$, each node the right child of the previous. Height $= n - 1$.
All operations become $\Theta(n)$ — as slow as a linked list. This is why naively building a BST from sorted data is dangerous. The fix (Chapter 7): use an AVL tree, which maintains height $O(\log n)$ through rotations after every insertion. Alternatively: shuffle the keys before insertion to randomise the BST's shape (expected height $O(\log n)$).
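The degeneration is easy to demonstrate. A standalone sketch using plain unbalanced BST insertion; the expiry dates, encoded as YYYYMMDD integers, are made up for illustration:

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, key):                     # plain BST insert, no balancing
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root

def height(node):                          # a leaf has height 0
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))

# Hypothetical weekly expiries, already in chronological (sorted) order
expiries = [20240104, 20240111, 20240118, 20240125, 20240201]
root = None
for k in expiries:
    root = insert(root, k)
print(height(root))                        # → 4, i.e. n - 1: a pure right chain
```

Shuffling the same keys before insertion would, in expectation, give height $O(\log n)$ — but only balancing (Chapter 7) guarantees it.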
A sequence binary tree is augmented with subtree sizes. Node A has left.size = 3, right.size = 2, and stores one item itself. A call to subtree_at(4) (find the 4th item, 0-indexed) is made at A. What happens?
The subtree at A has $n_L = 3$ (left), 1 (self), $n_R = 2$ (right) = 6 total nodes. We want index 4 (0-based):
— Is $i < n_L$? No: $4 \not< 3$, so the 4th item is not in the left subtree.
— Is $i = n_L$? No: $4 \neq 3$, so it is not A itself.
— So $i > n_L$: recurse into the right subtree with adjusted index $i' = i - n_L - 1 = 4 - 3 - 1 = 0$.
So we look for the 0th item in the right subtree — its first/leftmost node. The total tree has 6 nodes (index 0 through 5). In sorted order: indices 0,1,2 are in the left subtree; index 3 is A; indices 4,5 are in the right subtree. Index 4 → right subtree index 0. Correct.
Work through independently. Q1–3 are direct applications. Q4–7 require combining ideas. Q8–10 demand careful argument.
successor(7); (b) find predecessor(12); (c) find subtree_find_next(6) — the smallest key greater than 6. Trace each operation step by step.
Easy
Prove that subtree_iter runs in $\Theta(n)$ time by showing that each edge of the tree is traversed exactly twice (once going down, once coming back up). Use this to argue that iterating a binary tree using subtree_first plus repeated successor calls also runs in $\Theta(n)$ total.
Medium
Extend subtree_find, subtree_find_next, and subtree_insert to handle composite keys. What comparison operation does each node perform?
Medium
Augment each node with a subtree_sum field — the sum of all keys in the subtree. Show how to maintain this field during subtree_insert and subtree_delete in $O(h)$ time. Then describe how to use it to answer: "what is the total of all strikes between 23000 and 24500?" in $O(h)$ time.
Medium
Prove that subtree_delete as implemented always terminates. Specifically, show that the recursive call in the non-leaf case always makes progress — that the node being recursed on is strictly lower in the tree than the original node.
Hard