Binary Heaps
Not every problem needs the full power of a BST. When all you care about is quickly finding and removing the maximum (or minimum), a binary heap delivers $O(\log n)$ insert and delete-max in a dead-simple array — no pointers, no rotations, and in-place.
The Priority Queue Interface
A priority queue stores items with keys and is optimised for one thing: finding and removing the item with the highest (or lowest) priority. The interface is a deliberate subset of the full Set interface:
- build(X) — build from iterable $X$
- insert(x) — add item $x$
- find_max() — return the item with the largest key
- delete_max() — remove and return the item with the largest key
Only max or min is supported — not both simultaneously (without maintaining two separate heaps). build can be implemented as $n$ repeated inserts; find_max can be simulated as a delete_max followed by a re-insert.
Three Priority Queues, Three Sorting Algorithms
Any priority queue translates directly into a sorting algorithm: build the queue, then repeatedly call delete_max to extract items in decreasing order. The choice of underlying data structure determines the sort:
| Data Structure | insert | delete_max | Sort Total | In-place? | Equivalent Sort |
|---|---|---|---|---|---|
| Array (unordered) | $O(1)^{(a)}$ | $O(n)$ | $O(n^2)$ | Yes | Selection Sort |
| Sorted Array | $O(n)$ | $O(1)^{(a)}$ | $O(n^2)$ | Yes | Insertion Sort |
| AVL Tree (Set) | $O(\log n)$ | $O(\log n)$ | $O(n \log n)$ | No | AVL Sort |
| Binary Heap | $O(\log n)^{(a)}$ | $O(\log n)$ | $O(n \log n)$ | Yes | Heap Sort |
(a) amortised
The binary heap achieves AVL-tree-level sort performance while being in-place — using only $O(1)$ extra space beyond the input array. This is the key advantage over AVL trees for sorting.
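The table's correspondence can be made concrete with a generic sort driver: plug in a priority queue, get the matching sort. A minimal sketch (the names UnorderedArrayPQ and pq_sort are illustrative, not from the text):

```python
class UnorderedArrayPQ:
    """O(1) insert, O(n) delete_max: pq_sort with this is selection sort."""
    def __init__(self):
        self.A = []

    def insert(self, x):
        self.A.append(x)

    def delete_max(self):
        m = max(range(len(self.A)), key=lambda i: self.A[i])
        self.A[m], self.A[-1] = self.A[-1], self.A[m]  # move max to the end
        return self.A.pop()

def pq_sort(X, PQ):
    """Build the queue, then drain it: deletions come out descending,
    reversed to ascending order."""
    pq = PQ()
    for x in X:
        pq.insert(x)
    return [pq.delete_max() for _ in range(len(X))][::-1]

assert pq_sort([5, 2, 9, 1], UnorderedArrayPQ) == [1, 2, 5, 9]
```

Swapping in a sorted array, an AVL tree, or a binary heap for UnorderedArrayPQ yields the other three rows of the table with no change to pq_sort.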
Array as a Complete Binary Tree
A binary heap is stored in a plain array — no pointers needed. The trick is reading the array as a complete binary tree: fill levels top-to-bottom, left-to-right.
        8                ← root at index 0 (root value chosen for illustration)
      /   \
     7     6             ← left(0)=1, right(0)=2
    / \   / \
   4   5 2   3           ← left(1)=3, right(1)=4, ...
  /
 1                       ← leaf at index 7

Array layout: [8, 7, 6, 4, 5, 2, 3, 1]
The parent-child relationships are encoded entirely in index arithmetic — no pointers stored anywhere: $\text{left}(i) = 2i + 1$, $\text{right}(i) = 2i + 2$, and $\text{parent}(i) = \lfloor (i-1)/2 \rfloor$.
A complete binary tree on $n$ nodes has height $\lfloor \log_2 n \rfloor$ — always balanced, always $O(\log n)$ tall. This is the source of the heap's performance guarantee.
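These index formulas are easy to sanity-check; a short sketch using the max-heap $[16, 14, 10, 8, 7, 9, 3, 2, 4, 1]$ that appears later in the chapter:

```python
import math

def parent(i): return (i - 1) // 2
def left(i):   return 2 * i + 1
def right(i):  return 2 * i + 2

A = [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]  # a valid max-heap, n = 10

# children of the root, grandchildren by composing the maps
assert (left(0), right(0)) == (1, 2)
assert (left(left(0)), right(left(0))) == (3, 4)

# parent inverts both child maps
assert all(parent(left(i)) == i and parent(right(i)) == i for i in range(5))

# a complete tree on n nodes has height floor(log2 n)
assert math.floor(math.log2(len(A))) == 3
```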
The Max-Heap Property
A max-heap is an array $Q$ in which every node's key is at least as large as its children's keys: $Q[i] \geq Q[2i+1]$ and $Q[i] \geq Q[2i+2]$ wherever those indices exist. Consequence: the root $Q[0]$ is always the maximum element. Proof by induction on the path from any node $j$ up to the root: each step to the parent is non-decreasing in key, so the root is $\geq$ every node.
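The property is cheap to verify directly; a small checker (is_max_heap is an illustrative helper, not from the text):

```python
def is_max_heap(A):
    """True iff A satisfies the max-heap property."""
    n = len(A)
    return all(A[i] >= A[c]
               for i in range(n // 2)           # internal nodes only
               for c in (2 * i + 1, 2 * i + 2)  # their children
               if c < n)

assert is_max_heap([16, 14, 10, 8, 7, 9, 3, 2, 4, 1])
assert not is_max_heap([1, 2, 3])   # root smaller than both children
assert is_max_heap([])              # vacuously a heap
```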
Insert — Heapify Up
Append the new item to the end of the array (next leaf in reading order). The heap property holds everywhere except possibly at this new node — it might be larger than its parent. Fix it by swapping with the parent and repeating upward until the property is restored.
def parent(i):
    return (i - 1) // 2 if i > 0 else i

def max_heapify_up(A, n, i):
    """Restore max-heap property from node i upward. O(log n)."""
    p = parent(i)
    if A[p] < A[i]:                  # parent is smaller — swap (items compare by key)
        A[i], A[p] = A[p], A[i]
        max_heapify_up(A, n, p)      # recurse on parent
class MaxHeap:
    def __init__(self):
        self.A = []

    def insert(self, x):             # O(log n) amortised
        self.A.append(x)             # place at end (next leaf)
        max_heapify_up(self.A, len(self.A), len(self.A) - 1)

    def find_max(self):              # O(1)
        return self.A[0] if self.A else None
#include <vector>
using namespace std;

int parent(int i) { return i > 0 ? (i - 1) / 2 : i; }
int left(int i)   { return 2 * i + 1; }
int right(int i)  { return 2 * i + 2; }

template<typename T>
void heapify_up(vector<T>& A, int i) {
    int p = parent(i);
    if (i > 0 && A[p] < A[i]) {      // parent is smaller — swap
        swap(A[i], A[p]);
        heapify_up(A, p);            // recurse on parent
    }
}

template<typename T>
struct MaxHeap {
    vector<T> A;
    void insert(T x) {               // O(log n) amortised
        A.push_back(x);
        heapify_up(A, A.size() - 1);
    }
    T find_max() { return A[0]; }    // O(1)
    T delete_max();                  // defined below
};
Correctness: appending creates one potential violation at node $i$. If $Q[i] > Q[\text{parent}(i)]$, we swap — now the parent satisfies the heap property with all its descendants (the old subtree is unchanged), and the same potential violation may have moved one level up. We recurse. When we reach the root or a node where the property holds, we are done. Each step moves strictly upward, so termination is guaranteed.
Runtime: at most $h = \lfloor \log_2 n \rfloor$ swaps — one per level of the tree. $\Theta(\log n)$.
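To see heapify_up in action, here is a compressed, self-contained version of the insert path (direct element comparisons, mirroring the code above):

```python
def parent(i):
    return (i - 1) // 2 if i > 0 else i

def max_heapify_up(A, i):
    p = parent(i)
    if A[p] < A[i]:                  # parent smaller: swap and continue up
        A[i], A[p] = A[p], A[i]
        max_heapify_up(A, p)

A = []
for x in [4, 1, 3, 2, 16]:
    A.append(x)                      # place at next leaf
    max_heapify_up(A, len(A) - 1)

# inserting 16 bubbles it up two levels: swaps at indices 4→1, then 1→0
assert A == [16, 4, 3, 1, 2]
assert all(A[(i - 1) // 2] >= A[i] for i in range(1, len(A)))
```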
Delete Max — Heapify Down
The maximum is always at $Q[0]$. We cannot simply remove it and leave a gap. Instead: swap the root with the last element, remove the last element (now holding the old max), then fix the heap property downward from the new root.
The new root may be smaller than one or both children. Fix it by swapping with the larger child (to preserve the heap property for the non-swapped child), then recurse downward.
def left(i, n):  return 2*i + 1 if 2*i + 1 < n else i
def right(i, n): return 2*i + 2 if 2*i + 2 < n else i

def max_heapify_down(A, n, i):
    """Restore max-heap property from node i downward. O(log n)."""
    l, r = left(i, n), right(i, n)
    # index of the larger child (out-of-range children fall back to i itself)
    c = l if A[r] < A[l] else r
    if A[i] < A[c]:                  # i is smaller than its larger child
        A[i], A[c] = A[c], A[i]
        max_heapify_down(A, n, c)    # recurse on the child we swapped into
    # method of MaxHeap, continuing the class defined above
    def delete_max(self):            # O(log n)
        n = len(self.A)
        self.A[0], self.A[n-1] = self.A[n-1], self.A[0]  # swap root with last
        max_val = self.A.pop()       # remove last (old root = max)
        if self.A:                   # guard: heap may now be empty
            max_heapify_down(self.A, len(self.A), 0)     # fix heap from root
        return max_val
template<typename T>
void heapify_down(vector<T>& A, int n, int i) {
    int l = left(i), r = right(i);
    // find larger child (stay in bounds)
    int c = (r < n && A[l] < A[r]) ? r : (l < n ? l : i);
    if (c != i && A[i] < A[c]) {     // i is smaller than its larger child
        swap(A[i], A[c]);
        heapify_down(A, n, c);       // recurse on the swapped child
    }
}

// out-of-class definition; requires the "T delete_max();" declaration in MaxHeap
template<typename T>
T MaxHeap<T>::delete_max() {         // O(log n)
    T mx = A[0];
    swap(A[0], A.back());            // move last element to root
    A.pop_back();                    // remove old max
    if (!A.empty())
        heapify_down(A, A.size(), 0);  // restore heap property
    return mx;
}
$O(n)$ Build Heap — The Surprising Result
Inserting $n$ items one by one (heapify_up) costs $\sum_{i=1}^{n} O(\log i) = O(\log n!) = \Theta(n \log n)$ in the worst case. But if we have all $n$ items available at once, we can do better.
Key observation: the bottom half of the array are all leaves — they trivially satisfy the heap property (no children to violate it). Only the top half needs fixing.
Run max_heapify_down from index $\lfloor n/2 \rfloor - 1$ down to $0$. Each call costs $O(\text{height of node } i)$. Nodes near the leaves have small height, and summing over all levels gives a convergent series: the total cost is $O(n)$:
def build_max_heap(A):
    """Convert arbitrary array A into a max-heap in O(n) time."""
    n = len(A)
    # Start from last non-leaf (index n//2 - 1) and go up to root
    for i in range(n // 2 - 1, -1, -1):  # O(n) — see proof above
        max_heapify_down(A, n, i)        # O(height(i)) per call

def heap_sort(A):
    """Sort A in-place in O(n log n). Uses O(1) extra space."""
    n = len(A)
    build_max_heap(A)                    # O(n) — build phase
    for i in range(n - 1, 0, -1):        # O(n log n) — extract phase
        A[0], A[i] = A[i], A[0]          # move current max to sorted end
        max_heapify_down(A, i, 0)        # heap shrinks by 1 each iteration

# Trace on [4, 1, 3, 2, 16, 9, 10, 14, 8, 7]:
# After build_max_heap: [16, 14, 10, 8, 7, 9, 3, 2, 4, 1] ← valid max-heap
# After extract phase:  [1, 2, 3, 4, 7, 8, 9, 10, 14, 16] ← sorted
#include <vector>
#include <algorithm>
using namespace std;

template<typename T>
void build_max_heap(vector<T>& A) {      // O(n)
    int n = A.size();
    for (int i = n/2 - 1; i >= 0; --i)
        heapify_down(A, n, i);
}

template<typename T>
void heap_sort(vector<T>& A) {           // O(n log n), in-place
    build_max_heap(A);                   // O(n) — arrange as max-heap
    for (int i = A.size() - 1; i > 0; --i) {
        swap(A[0], A[i]);                // move max to sorted end
        heapify_down(A, i, 0);           // shrink heap and restore
    }
    // A is now sorted in ascending order
}
// Note: std::sort_heap does exactly this after std::make_heap
An automated strategy receives option orders with priority scores (higher = more urgent to fill). At each tick, it must: (a) add new orders as they arrive, and (b) always fill the highest-priority order first. Design the data structure and trace an example.
Max-heap keyed on priority score. Each element is a (priority, order_id, order_details) tuple. Python's heapq module provides a min-heap, so negate priorities to simulate a max-heap.
import heapq

orders = []  # min-heap on negative priority = max-heap on priority

# Insert 5 orders (priority, order_id, side, strike)
heapq.heappush(orders, (-85, 1, 'buy', 24000))
heapq.heappush(orders, (-92, 2, 'sell', 24500))
heapq.heappush(orders, (-71, 3, 'buy', 23500))
heapq.heappush(orders, (-98, 4, 'sell', 24000))  # highest priority
heapq.heappush(orders, (-88, 5, 'buy', 24200))

# Extract top 3 — each O(log n)
for _ in range(3):
    neg_p, oid, side, strike = heapq.heappop(orders)
    print(f"Fill order {oid}: {side} {strike} (priority {-neg_p})")

# Output:
# Fill order 4: sell 24000 (priority 98)
# Fill order 2: sell 24500 (priority 92)
# Fill order 5: buy 24200 (priority 88)
heapq uses the array-based binary heap exactly as described in this chapter. The heap never needs to be fully sorted — only the maximum needs to be accessible. This is strictly more efficient than maintaining a sorted list when insertions and deletions are interleaved.
The heapq module is a direct implementation of the array-based binary heap from this chapter — heappush performs heapify_up, heappop performs heapify_down, and heapify (called on an existing list) is the $O(n)$ build_max_heap. For TTT's strategy systems, the heapq.heappush / heapq.heappop pattern is the right tool any time you need a dynamic "best item so far" — best-priced order, closest expiry, highest IV strike, etc. The key limitation to keep in mind: heapq provides no $O(\log n)$ lookup or deletion of an arbitrary key. If you need to cancel a specific order, use a "lazy deletion" pattern — mark cancelled order IDs and skip them on pop — or maintain a separate hash map from order ID to heap index and repair the heap in place.
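The lazy-deletion pattern is short enough to sketch. A minimal version (the class and method names are illustrative, and it assumes order IDs are unique):

```python
import heapq

class CancellableMaxPQ:
    """Max-priority queue over heapq with O(1) lazy cancellation."""
    def __init__(self):
        self.heap = []          # entries are (-priority, order_id)
        self.cancelled = set()  # order ids marked dead, removed on pop

    def push(self, priority, order_id):
        heapq.heappush(self.heap, (-priority, order_id))

    def cancel(self, order_id):
        self.cancelled.add(order_id)   # O(1); real removal is deferred

    def pop_max(self):
        while self.heap:
            neg_p, oid = heapq.heappop(self.heap)
            if oid not in self.cancelled:
                return -neg_p, oid
            self.cancelled.discard(oid)  # dead entry: skip and keep popping
        return None

pq = CancellableMaxPQ()
pq.push(85, 1); pq.push(98, 4); pq.push(92, 2)
pq.cancel(4)
assert pq.pop_max() == (92, 2)   # 98 was cancelled and silently skipped
```

Each pop is amortised $O(\log n)$: every heap entry is pushed and popped at most once, cancelled or not.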
In a max-heap stored as an array, node at index $i = 5$ has children at indices left and right. What are they?
The formulas are: $\text{left}(i) = 2i + 1$ and $\text{right}(i) = 2i + 2$.
For $i = 5$: $\text{left}(5) = 2 \times 5 + 1 = 11$, $\text{right}(5) = 2 \times 5 + 2 = 12$.
The parent of node 5 is $\lfloor (5-1)/2 \rfloor = 2$. These index formulas encode the complete binary tree structure without any pointers — the entire tree topology is implicit in the array layout.
Array $A = [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]$ is a valid max-heap. After calling delete_max(), what is the first step of the algorithm?
The delete_max procedure: (1) swap the root $A[0]$ with the last element $A[n-1]$, (2) remove the last element (which now holds the old max), (3) run heapify_down from the new root to restore the heap property.
After steps 1–2: the array becomes $[1, 14, 10, 8, 7, 9, 3, 2, 4]$ and 16 is returned. Now 1 is at the root — it violates the heap property. Heapify_down swaps it with 14 (the larger child), then with 8, then with 4, producing a valid max-heap.
A naive alternative — removing $A[0]$ and shifting every element left — would cost $O(n)$. We specifically avoid this by using the swap-with-last trick, which keeps every operation $O(\log n)$.
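The answer's trace can be verified mechanically; a self-contained sketch using direct element comparisons:

```python
def max_heapify_down(A, n, i):
    l, r, c = 2*i + 1, 2*i + 2, i      # c = index of largest of i and children
    if l < n and A[l] > A[c]: c = l
    if r < n and A[r] > A[c]: c = r
    if c != i:
        A[i], A[c] = A[c], A[i]
        max_heapify_down(A, n, c)

def delete_max(A):
    A[0], A[-1] = A[-1], A[0]          # step 1: swap root with last
    mx = A.pop()                       # step 2: remove old max
    if A:
        max_heapify_down(A, len(A), 0) # step 3: restore heap property
    return mx

A = [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]
assert delete_max(A) == 16
# 1 sinks from the root: swapped with 14, then 8, then 4
assert A == [14, 8, 10, 4, 7, 9, 3, 2, 1]
```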
Why does build_max_heap (bottom-up heapify_down) run in $O(n)$ rather than $O(n \log n)$, even though it calls heapify_down $n/2$ times?
heapify_down at node $i$ costs $O(\text{height}(i))$, not $O(\log n)$. The heights are heavily skewed: there are $n/2$ leaves at height 0, $n/4$ nodes at height 1, $n/8$ at height 2, etc. The total cost is: $$\sum_{h=0}^{\log n} \frac{n}{2^{h+1}} \cdot h = \frac{n}{2} \sum_{h=0}^{\log n} \frac{h}{2^h} \leq \frac{n}{2} \cdot 2 = O(n)$$ The geometric series $\sum_{h=0}^{\infty} h/2^h = 2$ is the key. Most of the $n/2$ calls to heapify_down are on leaves or near-leaves and cost $O(1)$ or $O(2)$ — not $O(\log n)$. Only the root costs $O(\log n)$, but there is only one root.
Skipping the true leaves helps, but it is not the whole story — the real reason is the skewed distribution of node heights analysed above.
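The height-sum bound can be checked empirically: total swaps in a bottom-up build are at most the sum of subtree heights, which is less than $n$. A quick sketch (build_swap_count is an illustrative helper, not from the text):

```python
import random

def build_swap_count(A):
    """Bottom-up build_max_heap that also counts swaps performed."""
    n, swaps = len(A), 0

    def down(i):
        nonlocal swaps
        l, r, c = 2*i + 1, 2*i + 2, i
        if l < n and A[l] > A[c]: c = l
        if r < n and A[r] > A[c]: c = r
        if c != i:
            A[i], A[c] = A[c], A[i]
            swaps += 1
            down(c)

    for i in range(n // 2 - 1, -1, -1):
        down(i)
    return swaps

random.seed(0)
n = 1 << 14
A = list(range(n))
random.shuffle(A)
swaps = build_swap_count(A)
assert swaps < n                                          # linear, not n log n
assert all(A[(i - 1) // 2] >= A[i] for i in range(1, n))  # valid max-heap
```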
An array is $k$-proximate if every element is at most $k$ positions from its sorted location. What is the optimal time complexity for sorting a $k$-proximate array, and what data structure achieves it?
The $i$-th smallest element must be somewhere in $A[0..i+k]$ (by the $k$-proximate assumption). So we only ever need to consider a window of $k+1$ candidates.
Algorithm: build a min-heap $H$ from $A[0..k-1]$. Then for each output position $j$ from 0 to $n-1$: insert $A[j+k]$ into $H$ (if it exists), then extract-min from $H$ as $B[j]$. Each heap operation on a size-$(k+1)$ heap costs $O(\log k)$. Total: $O(n \log k)$.
When $k = O(\log n)$, this gives $O(n \log \log n)$ — strictly better than $\Theta(n \log n)$. When $k = O(1)$ (nearly sorted), it gives $O(n)$. Insertion sort gives $O(nk)$ which is worse when $k = \omega(\log n)$.
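The algorithm above fits in a few lines with heapq; a sketch (sort_k_proximate is an illustrative name, not from the text):

```python
import heapq

def sort_k_proximate(A, k):
    """O(n log k) sort for a k-proximate array via a sliding min-heap
    of at most k+1 candidates."""
    H = A[:k]                            # first k candidates
    heapq.heapify(H)                     # O(k)
    out = []
    for j in range(len(A)):
        if j + k < len(A):
            heapq.heappush(H, A[j + k])  # window now holds k+1 candidates
        out.append(heapq.heappop(H))     # j-th smallest must be in the window
    return out

# every element is at most 2 positions from its sorted slot
assert sort_k_proximate([3, 1, 2, 6, 4, 5, 8, 7], 2) == [1, 2, 3, 4, 5, 6, 7, 8]
```

When $k \geq n$ this degenerates gracefully into ordinary heap sort over the whole array; when $k = 0$ the input is already sorted and the loop is a pass-through.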
Work through these independently. Q1–3 are direct application, Q4–7 synthesis, Q8–10 careful argument.
Trace build_max_heap(A) step by step. Show the array state after each call to max_heapify_down and draw the resulting binary tree.
Easy
Trace max_heapify_up on a sequence of insertions, showing each swap. Then call delete_max() and trace max_heapify_down. Show the array after each operation.
Easy
Implement a binary heap from scratch (without heapq). Sort the NIFTY weekly expiry dates $[20240125, 20240201, 20240118, 20240208, 20240111]$ using your implementation. Verify the output.
Medium
Python's heapq is a min-heap. Describe two different ways to use it as a max-heap: (a) negating keys, (b) wrapping items in a custom class with reversed comparison. Which is more robust when items are complex objects (like option records with multiple fields)?
Medium
Add support for cancel(order_id) in $O(\log n)$ expected time. Your scheme should use a separate hash map and a validity flag. Describe how delete_max changes, and what the amortised cost is when $f$ of the $n$ items in the heap are cancelled.
Medium
Python's sorted() uses TimSort, C++'s std::sort uses IntroSort. Research (or reason about) two specific reasons why heap sort performs worse in practice despite equivalent asymptotic complexity.
Hard