Intermediate Chapter 07 of 13 · Algorithms and Data Structures

Balanced BSTs

Chapter 6 gave us a tree where every operation costs $O(h)$. The height $h$ can be $O(n)$ in the worst case. This chapter fixes that — permanently. AVL trees keep $h = O(\log n)$ after every insertion and deletion, by performing at most $O(\log n)$ local repairs called rotations.

The AVL Property — Height Balance

The skew of a node is the height of its right subtree minus the height of its left subtree (where the height of an empty subtree is $-1$).

A node is height-balanced if its skew is $-1$, $0$, or $+1$.

An AVL tree is a binary tree in which every node is height-balanced. This is the AVL property, named after Adelson-Velsky and Landis (1962) — the first balanced BST ever proposed.

Why Height Balance Implies $O(\log n)$ Height

We need to show that an AVL tree on $n$ nodes has height $O(\log n)$. Equivalently, we show that any height-$h$ AVL tree has at least $F(h)$ nodes where $F(h) = 2^{\Omega(h)}$.

Let $F(h)$ be the minimum number of nodes in any height-$h$ AVL tree. An AVL tree of height $h$ must have a root, a subtree of height $h-1$ (the taller child), and — because the tree is height-balanced — a subtree of at least height $h-2$ (the shorter child). So:

$$F(0) = 1, \quad F(1) = 2, \quad F(h) = 1 + F(h-1) + F(h-2) \geq 2F(h-2)$$

Solving: $F(h) \geq 2^{h/2}$. Therefore an AVL tree with $n$ nodes satisfies $n \geq F(h) \geq 2^{h/2}$, which gives $h \leq 2 \log_2 n = O(\log n)$.

The recurrence $F(h) = 1 + F(h-1) + F(h-2)$ is almost identical to the Fibonacci recurrence $\text{Fib}(h) = \text{Fib}(h-1) + \text{Fib}(h-2)$. The minimally-populated AVL trees are sometimes called Fibonacci trees for this reason. The Fibonacci sequence grows exponentially — that exponential growth is exactly what guarantees logarithmic height.
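The bound is easy to check numerically. The sketch below (with `min_avl_nodes` as an illustrative helper name) tabulates $F(h)$ from the recurrence and confirms $F(h) \geq 2^{h/2}$:

```python
# Minimum number of nodes F(h) in a height-h AVL tree,
# from the recurrence F(0)=1, F(1)=2, F(h)=1+F(h-1)+F(h-2).
def min_avl_nodes(h):
    F = [1, 2]
    for k in range(2, h + 1):
        F.append(1 + F[k - 1] + F[k - 2])
    return F[h]

for h in range(8):
    n = min_avl_nodes(h)
    assert n >= 2 ** (h / 2)   # the exponential lower bound from the text
    print(h, n)                # for h = 0..7: F(h) = 1, 2, 4, 7, 12, 20, 33, 54
```

Note the Fibonacci-like growth: each value is one more than the sum of the previous two.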

Rotations — Restructuring Without Reordering

When a node becomes imbalanced (skew $\pm 2$), we need to change the tree's shape without changing its traversal order. The operation that does this is a rotation.

A rotation takes two adjacent nodes $B$ and $D$ (where $D$ is the right child of $B$, or vice versa), and swaps which one is the parent. Crucially, it relinks only $O(1)$ pointers and preserves the traversal order exactly.

Before rotate_left(B)
     B
    / \
   A   D
      / \
     C   E

After rotate_left(B)
     D
    / \
   B   E
  / \
 A   C

Traversal order before: $A, B, C, D, E$. Traversal order after: $A, B, C, D, E$. Identical — the rotation preserves order while changing depth. $A$ and $E$ move up/down one level; $C$ switches parents; $B$ and $D$ swap their parent-child relationship.

subtree_rotate_left / right · O(1)
def subtree_rotate_left(B):     # O(1) — promote B's right child D
    """Left rotation: D becomes parent, B becomes D's left child."""
    assert B.right
    A, D  = B.left,  B.right
    C, E  = D.left,  D.right
    # swap items so the root node object stays in place
    # (any parent pointer from outside the subtree remains valid)
    B.item, D.item = D.item, B.item
    B, D = D, B                  # rebind: D → root object (item D), B → child (item B)
    D.left,  D.right = B, E      # D (old B node) gets B and E as children
    B.left,  B.right = A, C      # B (old D node) gets A and C as children
    if A: A.parent = B
    if E: E.parent = D
    subtree_update(B)            # update augmentations bottom-up: child first,
    subtree_update(D)            # then root

def subtree_rotate_right(D):    # O(1) — promote D's left child B
    """Right rotation: B becomes parent, D becomes B's right child."""
    assert D.left
    B, E  = D.left,  D.right
    A, C  = B.left,  B.right
    D.item, B.item = B.item, D.item
    D, B = B, D                  # rebind: B → root object (item B), D → child (item D)
    B.left,  B.right = A, D
    D.left,  D.right = C, E
    if A: A.parent = B
    if E: E.parent = D
    subtree_update(D)            # child first, then root
    subtree_update(B)
// Left rotation: promote node's right child — O(1)
template<typename T>
void rotate_left(AVL_Node<T>* B) {
    assert(B->right);
    auto* D = static_cast<AVL_Node<T>*>(B->right);
    auto* A = B->left;
    auto* C = D->left;
    auto* E = D->right;
    // swap items so the root node object stays in place
    // (the parent's pointer into this subtree remains valid)
    swap(B->item, D->item);
    swap(B, D);                     // rebind: D → root node, B → child node
    D->left  = B; D->right = E;     // D (old B) keeps its position
    B->left  = A; B->right = C;     // B (old D) holds D's old children
    if (A) A->parent = B;
    if (E) E->parent = D;
    B->update(); D->update();        // recompute height/size bottom-up
}

template<typename T>
void rotate_right(AVL_Node<T>* D) {
    assert(D->left);
    auto* B = static_cast<AVL_Node<T>*>(D->left);
    auto* A = B->left;
    auto* C = B->right;
    auto* E = D->right;
    swap(D->item, B->item);
    swap(D, B);                     // rebind: B → root node, D → child node
    B->left  = A; B->right = D;
    D->left  = C; D->right = E;
    if (A) A->parent = B;
    if (E) E->parent = D;
    D->update(); B->update();       // child first, then root
}
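The order-preservation claim is easy to test directly. Below is a stripped-down sketch of the item-swap rotation (heights, sizes and parent pointers omitted; `Node` and `inorder` are illustrative helpers), applied to the five-node example from the diagram:

```python
# Minimal sketch: verify that rotate_left preserves in-order traversal.
class Node:
    def __init__(self, item, left=None, right=None):
        self.item, self.left, self.right = item, left, right

def rotate_left(B):
    """Item-swap left rotation: the root node object stays in place."""
    assert B.right
    A, D = B.left, B.right
    C, E = D.left, D.right
    B.item, D.item = D.item, B.item    # root object now holds D's item
    B, D = D, B                        # rebind: D → root object, B → child
    D.left, D.right = B, E
    B.left, B.right = A, C

def inorder(node):
    if node is None:
        return []
    return inorder(node.left) + [node.item] + inorder(node.right)

# Build the five-node tree from the diagram.
root = Node('B', Node('A'), Node('D', Node('C'), Node('E')))
before = inorder(root)
rotate_left(root)
after = inorder(root)
assert before == after
print(after)   # → ['A', 'B', 'C', 'D', 'E']
```

The traversal is identical before and after, while `root.item` changes from `'B'` to `'D'`: the shape changed, the order did not.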

Rebalancing — Three Cases

After inserting or deleting a leaf, only ancestors of that leaf can become imbalanced. We walk from the leaf to the root, rebalancing each node with skew $\pm 2$. The key insight: at most $O(\log n)$ nodes need attention (there are only $O(\log n)$ ancestors), and each rebalancing requires at most 2 rotations.

Consider a node $B$ with skew $+2$ (right subtree is taller). Its right child is $D$. Three sub-cases based on $D$'s skew:

Case 1 — skew(D) = 0 or +1
Single left rotation on $B$. After: $D$ is the new root of this subtree. If skew($D$) was $+1$, both $B$ and $D$ end with skew $0$ and the subtree height decreases by 1; if skew($D$) was $0$, $B$ ends with skew $+1$, $D$ with skew $-1$, and the subtree height is unchanged.

rotate_left(B)
Case 2 — skew(D) = −1
$D$'s left child $C$ exists. Double rotation: first rotate $D$ right, then rotate $B$ left. After: $C$ becomes the new root. Both $B$ and $D$ become children of $C$. Subtree height decreases by 1.

rotate_right(D)
rotate_left(B)

The symmetric cases (skew $-2$) use right rotations instead. The complete rebalance logic:

rebalance + maintain · O(log n) per insertion/deletion
def height(A):
    """Height of node A (-1 if A is None)."""
    return A.height if A else -1

def subtree_update(A):          # O(1) — recompute augmentations
    A.height = 1 + max(height(A.left), height(A.right))
    A.size   = 1 + (A.left.size  if A.left  else 0) \
                 + (A.right.size if A.right else 0)

def skew(A):                    # O(1)
    return height(A.right) - height(A.left)

def rebalance(A):               # O(1) — fix skew ±2 using ≤ 2 rotations
    if skew(A) == 2:
        if skew(A.right) < 0:          # Case 2: double rotation
            subtree_rotate_right(A.right)
        subtree_rotate_left(A)         # Case 1 or 2: single rotation
    elif skew(A) == -2:
        if skew(A.left) > 0:           # symmetric Case 2
            subtree_rotate_left(A.left)
        subtree_rotate_right(A)        # symmetric Case 1 or 2

def maintain(A):                # O(log n) — rebalance from A up to root
    """Called after any structural change at or below A."""
    rebalance(A)                # fix A if needed (O(1))
    subtree_update(A)           # recompute A's height and size (O(1))
    if A.parent:
        maintain(A.parent)      # propagate up the tree
template<typename T>
struct AVL_Node {
    T item;
    int height = 0, sz = 1;
    AVL_Node *left = nullptr, *right = nullptr, *parent = nullptr;
    AVL_Node(T x) : item(x) {}

    virtual void update() {              // O(1) — virtual so Size_AVL_Node can override
        int lh = left  ? left->height  : -1;
        int rh = right ? right->height : -1;
        height = 1 + max(lh, rh);
        sz = 1 + (left  ? left->sz  : 0)
                + (right ? right->sz : 0);
    }
    int skew() const {
        int lh = left  ? left->height  : -1;
        int rh = right ? right->height : -1;
        return rh - lh;
    }

    void rebalance() {                   // O(1) — at most 2 rotations
        if (skew() == 2) {
            if (static_cast<AVL_Node*>(right)->skew() < 0)
                rotate_right(static_cast<AVL_Node*>(right));
            rotate_left(this);
        } else if (skew() == -2) {
            if (static_cast<AVL_Node*>(left)->skew() > 0)
                rotate_left(static_cast<AVL_Node*>(left));
            rotate_right(this);
        }
    }

    void maintain() {                    // O(log n)
        rebalance();
        update();
        if (parent) parent->maintain();
    }
};
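The three-case rebalance translates directly into a compact recursive insert. The sketch below uses a different style from the node-based code above: immutable `(key, left, right, height)` tuples, with insertion returning the new subtree root instead of walking parent pointers. All names (`mk`, `insert`, ...) are illustrative:

```python
import math

def h(t):
    """Height of a (key, left, right, height) tuple-node; -1 for empty."""
    return t[3] if t else -1

def mk(key, left, right):
    """Build a node with its height precomputed from the children."""
    return (key, left, right, 1 + max(h(left), h(right)))

def rotate_left(t):
    key, l, r, _ = t
    rk, rl, rr, _ = r
    return mk(rk, mk(key, l, rl), rr)

def rotate_right(t):
    key, l, r, _ = t
    lk, ll, lr, _ = l
    return mk(lk, ll, mk(key, lr, r))

def rebalance(t):
    key, l, r, _ = t
    if h(r) - h(l) == 2:
        if h(r[2]) - h(r[1]) < 0:        # Case 2: right child is left-heavy
            t = mk(key, l, rotate_right(r))
        return rotate_left(t)            # Case 1 or 2: single left rotation
    if h(r) - h(l) == -2:
        if h(l[2]) - h(l[1]) > 0:        # symmetric Case 2
            t = mk(key, rotate_left(l), r)
        return rotate_right(t)
    return t

def insert(t, key):
    """BST insert, rebalancing every node on the way back up."""
    if t is None:
        return mk(key, None, None)
    k, l, r, _ = t
    if key < k:
        return rebalance(mk(k, insert(l, key), r))
    return rebalance(mk(k, l, insert(r, key)))

t = None
for key in range(1000):                  # sorted input: worst case for a plain BST
    t = insert(t, key)
assert h(t) <= 2 * math.log2(1000)       # chapter bound: h <= 2 log2 n ≈ 19.9
```

Inserting 1000 sorted keys into a plain BST would produce a height-999 chain; here the rebalancing keeps the height within the $2\log_2 n$ guarantee.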

The Augmentation Framework

Every binary tree can be augmented with additional information stored at each node. The rule is clean: augment a node with any subtree property — something computable from the node's item and its two children's augmentations in $O(1)$ time.

If you can compute the property in $O(1)$ from children, you can maintain it during any rotation or leaf modification without changing the $O(\log n)$ operation cost. The subtree_update function is where augmentations live.

Size_Node · augmented with subtree size for Sequence AVL
class Size_Node(Binary_Node):
    """AVL node augmented with subtree size — enables O(log n) get_at(i)."""

    def subtree_update(self):       # O(1) — called after every structural change
        super().subtree_update()    # recompute height first
        self.size = 1
        if self.left:  self.size += self.left.size
        if self.right: self.size += self.right.size

    def subtree_at(self, i):        # O(log n) — find i-th node in traversal order
        assert 0 <= i < self.size
        L = self.left.size if self.left else 0
        if i < L:
            return self.left.subtree_at(i)
        elif i > L:
            return self.right.subtree_at(i - L - 1)
        else:
            return self                         # i == L → this is the i-th node


# With Size_Node, a Sequence AVL supports all operations in O(log n):
# get_at(i)     → subtree_at(i).item
# insert_at(i)  → subtree_at(i-1).insert_after(new_node)
# delete_at(i)  → subtree_at(i).subtree_delete()
// AVL node augmented with subtree size
// Enables O(log n) get_at(i) for Sequence interface
template<typename T>
struct Size_AVL_Node : AVL_Node<T> {
    using Base = AVL_Node<T>;
    int size = 1;

    Size_AVL_Node(T x) : Base(x) {}

    void update() override {            // O(1)
        Base::update();                 // recompute height first
        size = 1;
        auto* L = static_cast<Size_AVL_Node*>(this->left);
        auto* R = static_cast<Size_AVL_Node*>(this->right);
        if (L) size += L->size;
        if (R) size += R->size;
    }

    // Find i-th node in traversal order — O(log n)
    Size_AVL_Node* subtree_at(int i) {
        int L = this->left ? static_cast<Size_AVL_Node*>(this->left)->size : 0;
        if      (i < L) return static_cast<Size_AVL_Node*>(this->left)->subtree_at(i);
        else if (i > L) return static_cast<Size_AVL_Node*>(this->right)->subtree_at(i - L - 1);
        else            return this;
    }
};
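A quick sanity check of the `subtree_at` logic on a hand-built three-node tree (a self-contained sketch with its own minimal node class `N`, not the `Size_Node` above):

```python
# Hand-built tree:   'b' (size 3)
#                   /          \
#                 'a' (1)    'c' (1)
class N:
    def __init__(self, item, left=None, right=None):
        self.item, self.left, self.right = item, left, right
        # size is computed bottom-up as the tree is constructed
        self.size = 1 + (left.size if left else 0) + (right.size if right else 0)

    def subtree_at(self, i):
        """i-th node in traversal order, using left-subtree sizes."""
        L = self.left.size if self.left else 0
        if i < L:
            return self.left.subtree_at(i)
        if i > L:
            return self.right.subtree_at(i - L - 1)
        return self

root = N('b', N('a'), N('c'))
print([root.subtree_at(i).item for i in range(root.size)])   # → ['a', 'b', 'c']
```

Index `i` lands in the left subtree, at the node itself, or in the right subtree (shifted by the left size plus one), exactly as in the chapter's code.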
Worked Example · AVL Insertion with Rebalancing · Keys: 10 → 20 → 30

Insert keys 10, 20, 30 into an initially empty AVL tree. Show the tree after each insertion and any rebalancing that occurs.

Insert 10 — trivial
10 (h=0, skew=0) ✓ balanced
Insert 20 — goes right of 10
10 (h=1, skew=+1) ✓ balanced
  \
  20 (h=0)
Insert 30 — goes right of 20. Now 10 has skew +2 — imbalanced!
10 (h=2, skew=+2) ← IMBALANCED
  \
  20 (h=1, skew=+1) ← Case 1 condition
    \
    30 (h=0)

Node 20 has skew +1 (Case 1). Apply single left rotation on node 10:

20 (h=1, skew=0) ✓ balanced
 /  \
10   30 (both h=0, skew=0) ✓
Result: perfectly balanced tree, height 1. One left rotation fixed the imbalance. This is the textbook "right-right" case — three consecutive right insertions produce a right-skewed chain, which a single left rotation converts to a balanced tree. The same pattern occurs every time you insert sorted keys into an AVL tree: the tree auto-corrects, using at most two rotations per insertion.

The Complete Performance Picture

| Data Structure | build | find(k) / get_at(i) | insert / delete | find_min/max | find_prev/next |
|---|---|---|---|---|---|
| Hash Table | $\Theta(n)^{(e)}$ | $\Theta(1)^{(e)}$ | $\Theta(1)^{(e)}$ | $\Theta(n)$ | $\Theta(n)$ |
| Unbalanced BST | $\Theta(nh)$ | $\Theta(h)$ | $\Theta(h)$ | $\Theta(h)$ | $\Theta(h)$ |
| AVL Tree (Set) | $\Theta(n \log n)$ | $\Theta(\log n)$ | $\Theta(\log n)$ | $\Theta(\log n)$ | $\Theta(\log n)$ |
| AVL Tree (Sequence) | $\Theta(n)$ | $\Theta(\log n)$ | $\Theta(\log n)$ | $\Theta(\log n)$ | $\Theta(\log n)$ |

(e) expected · Sequence AVL build is O(n) via the recursive median-split construction from Ch6.

Augmentation in Practice — The Bit-Counting Problem

The augmentation framework is general. As a final example beyond size and height: maintain a sequence of $n$ bits supporting two $O(\log n)$ operations: flip(i) and count_ones_upto(i) (number of 1-bits in the prefix up to index $i$).

Augment each node with subtree_ones — the count of 1-bits in its subtree. This is a valid subtree property: computed in $O(1)$ from children. After flipping bit $i$, update augmentations up the tree in $O(\log n)$. For count_ones_upto(i), walk down the tree accumulating ones from left subtrees along the path.

This pattern — augmented Sequence AVL for prefix operations — is used directly in practice for rank/select data structures, Fenwick trees (binary indexed trees), and segment trees. Every one of these is either an AVL-style balanced tree or can be understood as one.
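As a point of comparison, here is a sketch of the same two operations built on a Fenwick tree (one of the sibling structures mentioned above) rather than on an augmented AVL; `BitSequence` and its method names are illustrative:

```python
# Prefix-ones via a Fenwick (binary indexed) tree — an alternative to the
# augmented Sequence AVL; both give O(log n) flip and count_ones_upto.
class BitSequence:
    def __init__(self, n):
        self.n = n
        self.bits = [0] * n
        self.tree = [0] * (n + 1)     # 1-indexed Fenwick array of partial sums

    def _add(self, i, delta):         # O(log n) point update
        i += 1
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i               # jump to the next covering node

    def flip(self, i):                # O(log n)
        delta = -1 if self.bits[i] else 1
        self.bits[i] ^= 1
        self._add(i, delta)

    def count_ones_upto(self, i):     # O(log n): ones in bits[0..i]
        i += 1
        total = 0
        while i > 0:
            total += self.tree[i]
            i -= i & -i               # strip the lowest set bit
        return total

s = BitSequence(10)
for i in (1, 3, 4, 8):
    s.flip(i)
print(s.count_ones_upto(4))   # bits 1, 3, 4 are set → 3
s.flip(3)
print(s.count_ones_upto(9))   # bits 1, 4, 8 remain → 3
```

The Fenwick tree is more compact but less general: it supports prefix aggregates over a fixed-length array, whereas the augmented Sequence AVL also supports insertion and deletion of positions.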

Python's sortedcontainers.SortedList is implemented as a list of sorted sublists — a structure that plays the same role as a B-tree (a generalisation of balanced BSTs with higher branching factor and better cache performance). It supports add, discard, index (rank query), and bisect_left/right (predecessor/successor), each in roughly $O(\log n)$ time. For TTT's strategy pipeline, a SortedList of active strikes gives you the rank queries; answering "what is the total open interest in the 23500–24500 range?" in $O(\log n)$ per query additionally requires the sum augmentation from this chapter — it is exactly a range-sum on an augmented balanced tree. The theory in this chapter is not academic scaffolding — it is the mechanism inside the tools you already use.

Practice Problems
4 questions · Chapter 07
prog / dsa / ch07 / q01 ★ Conceptual

A left rotation on node $B$ (where $D$ is $B$'s right child) promotes $D$ to be the new subtree root. Which of the following is preserved by the rotation?

A. The heights of all nodes in the subtree
B. The traversal order of all nodes
C. The BST property (sorted order of keys)
D. Both B and C
Answer: D — both B and C.

For a BST, options B and C are the same property in two guises: the BST property holds exactly when the in-order traversal visits the keys in sorted order. A rotation preserves traversal order by design, so it preserves the BST property as well — hence D.

Heights are not preserved — that is the whole point of rotations. A left rotation decreases the height of $B$'s subtree (or keeps it the same), which is exactly why we use it to fix imbalance. Option A is wrong.

The key invariant: rotations change structure, never semantics.
prog / dsa / ch07 / q02 ★★ AVL Property

An AVL tree has $n = 100$ nodes. What is the maximum possible height of this tree?

A. 7 — $\lfloor \log_2 100 \rfloor$
B. 13 — from the $h \leq 2 \log_2 n$ bound
C. 50 — balanced means at most $n/2$ height
D. 99 — AVL trees can still degenerate in worst case
Answer: B — at most 13.

From the Fibonacci recurrence analysis: any height-$h$ AVL tree has at least $F(h) \geq 2^{h/2}$ nodes. Setting $n \geq 2^{h/2}$ gives $h \leq 2 \log_2 n$. For $n = 100$: $h \leq 2 \times 6.64 \approx 13.3$, so $h \leq 13$.

More precisely, the exact bound is $h \leq \lfloor 1.44 \log_2(n+2) \rfloor - 1$; for $n = 100$ this gives $h \leq 8$, and a height-8 AVL tree with at most 100 nodes does exist (the minimum node count for height 8 is 88, versus 143 for height 9). Among the options, though, B reflects the clean $2\log_2 n$ guarantee proved in the chapter. Option A ($\lfloor\log_2 100\rfloor = 6$) would require a perfect binary tree — AVL trees can be somewhat taller. Options C and D contradict the AVL height proof.
prog / dsa / ch07 / q03 ★★ Rebalancing

After inserting a node into an AVL tree, node $B$ has skew $+2$ and its right child $D$ has skew $-1$. How many rotations are needed to restore balance, and what are they?

A. 1 rotation — left rotation on $B$
B. 1 rotation — right rotation on $D$
C. 2 rotations — right rotation on $D$, then left rotation on $B$
D. 2 rotations — left rotation on $B$, then right rotation on $D$
Answer: C — right rotation on $D$, then left rotation on $B$.

This is Case 2 (skew($D$) = $-1$). The double rotation pattern:
1. Right rotation on $D$ — this converts the configuration to Case 1 (makes $D$'s skew become $+1$ or $0$).
2. Left rotation on $B$ — now a standard single left rotation restores balance.

The order matters: rotate the child first, then the imbalanced node. Doing it in reverse order (left on $B$ first) would not fix the imbalance — it would just move the problem. This "zigzag" pattern (right child has left-heavy subtree) always requires two rotations. The "straight-line" pattern (Case 1, skew($D$) ≥ 0) requires only one.
prog / dsa / ch07 / q04 ★★★ Augmentation

You want to augment an AVL tree so that subtree_sum(A) returns the sum of all keys in $A$'s subtree in $O(1)$. Which statement is correct?

A. This is impossible — sum requires visiting all nodes in the subtree
B. Store the sum but recompute from scratch after every rotation — $O(n)$ per operation
C. Store the sum at each node; update in $O(1)$ as A.sum = A.key + left.sum + right.sum; maintained during rotations via subtree_update
D. Augmentation only works for properties based on subtree size, not arbitrary sums
Answer: C.

Subtree sum is a valid subtree property: it depends only on the item at the node and the sums at its two children. The formula A.sum = A.key + (A.left.sum if A.left else 0) + (A.right.sum if A.right else 0) computes it in $O(1)$ from children's stored values.

By the augmentation theorem, any property computable in $O(1)$ from children can be maintained during rotations (which call subtree_update on the two nodes involved) and during leaf insertion/deletion (which calls maintain walking up the tree). Total augmentation maintenance cost: $O(\log n)$ per operation, matching the AVL cost. No extra asymptotic cost.

This sum augmentation enables $O(\log n)$ range-sum queries: "total of all keys between $k_1$ and $k_2$" — navigate to both endpoints and combine subtree sums.

Terminal Questions — Chapter 07 · 10 problems · No answers given

Work through independently. Q1–3 direct application. Q4–7 synthesis. Q8–10 careful argument.

1. Insert keys $[3, 2, 1]$ into an empty AVL tree one at a time. After each insertion, show the tree with skew values at each node and identify which rebalancing (if any) is triggered. What is the final tree? (Easy)
2. Insert keys $[10, 20, 15]$ into an empty AVL tree. After inserting 15, identify the imbalanced node, its skew, its child's skew, and which case (1 or 2) applies. Perform the required rotations and show the final balanced tree. (Easy)
3. What is the minimum number of nodes in an AVL tree of height 4? Draw the tree. What is the minimum number for height 5? (Easy)
4. Prove that a left rotation on node $B$ takes $O(1)$ time and modifies exactly 6 pointer fields. List each pointer that changes and its new value. Then prove that the traversal order is preserved — specifically, show that the in-order sequence of the subtree is identical before and after the rotation. (Medium)
5. The maintain function walks from a modified node up to the root calling rebalance and subtree_update at each ancestor. Prove that this takes $O(\log n)$ total time by: (a) bounding the number of ancestors, and (b) showing each call is $O(1)$. (Medium)
6. Augment an AVL tree with a subtree_max field — the maximum key in the subtree. (a) Show how to compute it in $O(1)$ from children. (b) Describe how to use it to answer "does any key in the range $[L, R]$ exist?" in $O(\log n)$ time. (Medium)
7. The bit-counting problem from the chapter: implement count_ones_upto(i) in full Python using a Sequence AVL augmented with subtree_ones. Test it on a 10-bit sequence with several flips. Verify correctness against a brute-force prefix sum. (Medium)
8. Prove that the minimum number of nodes $F(h)$ in a height-$h$ AVL tree satisfies $F(h) \geq 2^{h/2}$. Use strong induction with the recurrence $F(h) = 1 + F(h-1) + F(h-2) \geq 2F(h-2)$. Conclude that any AVL tree on $n$ nodes has height $h \leq 2\log_2 n$. (Hard)
9. Why does insertion into an AVL tree require at most 1 or 2 rotations to rebalance, while deletion may require up to $O(\log n)$ rotations? Trace through what happens when a deletion decreases the height of a subtree and how that propagates up the tree. (Hard)
10. Design a data structure for NIFTY option analytics supporting: (a) insert new option record with key (strike, expiry) in $O(\log n)$; (b) delete in $O(\log n)$; (c) find all options with strike in $[23000, 24500]$ and return their total open interest in $O(\log n + k)$ where $k$ is the number of results. Describe the augmentation, the range query algorithm, and justify all complexities. (Hard)