Sorting II
We proved merge sort is optimal in the comparison model. This chapter breaks through the $\Theta(n \log n)$ floor — not with cleverer comparisons, but by abandoning them altogether, exactly as we did for search in Chapter 4.
The Sorting Lower Bound — $\Theta(n \log n)$ Is the Wall
In Chapter 3 we proved the comparison search lower bound: any comparison algorithm needs $\Omega(\log n)$ time. The same decision-tree argument gives a much stronger bound for sorting.
A sorting algorithm must produce the correct output permutation for any input. There are $n!$ possible permutations of $n$ distinct items — so the decision tree needs at least $n!$ leaves. A binary tree with $\geq n!$ leaves has height at least $\log_2(n!)$. By Stirling's approximation $n! > (n/2)^{n/2}$, so:

$$\log_2(n!) > \log_2\left((n/2)^{n/2}\right) = \frac{n}{2}\log_2\frac{n}{2} = \Omega(n \log n)$$
Therefore every comparison sort requires $\Omega(n \log n)$ comparisons in the worst case. Merge sort achieves $\Theta(n \log n)$ — it is optimal in the comparison model. You cannot do better with comparisons alone.
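As a quick numeric sanity check of the bound (not part of the proof), we can compare $\log_2(n!)$ against the $(n/2)\log_2(n/2)$ lower bound for a few values of $n$:

```python
import math

# Check that log2(n!) >= (n/2) * log2(n/2) for a few n.
for n in [8, 64, 1024]:
    exact = math.log2(math.factorial(n))   # minimum decision-tree height
    bound = (n / 2) * math.log2(n / 2)     # the Stirling-based lower bound
    assert exact >= bound
    print(n, round(exact), round(bound))
```

The exact value tracks $n \log_2 n$ closely; the Stirling bound is looser but already $\Omega(n \log n)$.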
Direct Access Array Sort — $\Theta(n)$ When Keys Are Small
The simplest non-comparison sort: if all keys are distinct non-negative integers in range $\{0, \ldots, u-1\}$, use a direct access array. Insert each item at its key index in $\Theta(n)$, then read back in order in $\Theta(u)$. Total: $\Theta(n + u) = \Theta(n)$ when $u = \Theta(n)$.
def direct_access_sort(A):
    """Sort A assuming items have distinct non-negative integer keys."""
    u = 1 + max(x.key for x in A)    # O(n) — find key range
    D = [None] * u                   # O(u) — direct access array
    for x in A:                      # O(n) — place each item
        D[x.key] = x
    i = 0
    for key in range(u):             # O(u) — read back in order
        if D[key] is not None:
            A[i] = D[key]
            i += 1
    # Total: Θ(n + u). Linear when u = Θ(n).
    # Fatal flaw: cannot handle duplicate keys (second item overwrites first).
#include <vector>
#include <algorithm>
using namespace std;

struct Item { int key; /* other fields */ };

void direct_access_sort(vector<Item>& A) {
    if (A.empty()) return;
    int u = 1 + (*max_element(A.begin(), A.end(),
        [](auto& a, auto& b){ return a.key < b.key; })).key;
    vector<Item*> D(u, nullptr);          // O(u) — indexed by key
    for (auto& x : A)                     // O(n) — place items
        D[x.key] = &x;
    vector<Item> out;
    for (int k = 0; k < u; ++k)           // O(u) — read in order
        if (D[k]) out.push_back(*D[k]);
    A = out;
    // O(n + u). Requires distinct keys — duplicate key overwrites earlier item.
}
Two problems remain: duplicate keys (overwriting), and large key ranges ($u \gg n$). We fix them in order.
Counting Sort — Handling Duplicates Stably
Replace each single slot with a chain — a list that accumulates all items sharing that key. Insert each item to the end of its chain (preserving arrival order), then read back chain by chain. This makes counting sort stable: items with equal keys appear in the output in the same order as the input.
def counting_sort(A):
    """Stable sort of A assuming non-negative integer keys."""
    u = 1 + max(x.key for x in A)    # O(n)
    D = [[] for _ in range(u)]       # O(u) — array of chains
    for x in A:                      # O(n) — append to chain
        D[x.key].append(x)           # insert_last preserves order → stable
    i = 0
    for chain in D:                  # O(u) — read chains in key order
        for x in chain:              # O(n) total across all chains
            A[i] = x
            i += 1
    # Total: Θ(n + u).
    # Stability proof: within each chain, items appear in insertion order
    # (= original array order), so equal-key items preserve relative order.
#include <vector>
#include <list>
#include <algorithm>
using namespace std;

struct Item { int key; int val; };

void counting_sort(vector<Item>& A) {
    if (A.empty()) return;
    int u = 1 + (*max_element(A.begin(), A.end(),
        [](auto& a, auto& b){ return a.key < b.key; })).key;
    vector<list<Item>> D(u);              // O(u) — array of chains
    for (auto& x : A)                     // O(n) — append to correct chain
        D[x.key].push_back(x);            // push_back = insert_last → stable
    int i = 0;
    for (auto& chain : D)                 // O(u) scan + O(n) items total
        for (auto& x : chain)
            A[i++] = x;
    // Θ(n + u). Stable by construction.
}
Let's trace counting sort on a small example. Keys are integers in $\{0,\ldots,4\}$; take the input key sequence $[2, 1, 4, 1, 3]$:

D[0] = []
D[1] = [(key=1), (key=1)] ← both 1s, in original order
D[2] = [(key=2)]
D[3] = [(key=3)]
D[4] = [(key=4)]

Reading the chains back in key order yields $[1, 1, 2, 3, 4]$, with the two key-1 items in their original relative order.
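The chain-building step can be reproduced directly. This sketch uses the input key order $[2, 1, 4, 1, 3]$ (one sequence consistent with the chains shown; the `tag` field is added here just to make arrival order visible):

```python
from types import SimpleNamespace

# Items with keys [2, 1, 4, 1, 3]; tag records arrival position.
A = [SimpleNamespace(key=k, tag=i) for i, k in enumerate([2, 1, 4, 1, 3])]
u = 1 + max(x.key for x in A)
D = [[] for _ in range(u)]
for x in A:
    D[x.key].append(x)                  # one chain per key, arrival order kept
out = [x for chain in D for x in chain]
print([(x.key, x.tag) for x in out])    # the two key-1 items keep tags 1, 3
```

The two items with key 1 come out carrying tags 1 and 3, in that order: stability by construction.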
The Key Idea — Represent Large Keys as Tuples
Suppose keys can be up to $u < n^2$. A key $k$ can be written as a two-digit base-$n$ number: $k = a \cdot n + b$ where $a = \lfloor k/n \rfloor$ and $b = k \bmod n$. Both $a$ and $b$ are in range $\{0, \ldots, n-1\}$. So every key becomes a pair $(a, b)$ with values bounded by $n$ — tractable for counting sort.
For example with $n = 5$ (so base $5$): key $17 = 3 \cdot 5 + 2 \Rightarrow (3, 2)$. Key $24 = 4 \cdot 5 + 4 \Rightarrow (4, 4)$. All pairs have digits in $\{0, 1, 2, 3, 4\}$ — counting sort handles each digit in $\Theta(n)$.
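The decomposition is one `divmod` call per key. A minimal check of the examples above:

```python
# key = a*n + b with a = key // n and b = key % n, here n = 5.
n = 5
for key in [17, 24]:
    a, b = divmod(key, n)
    assert key == a * n + b and 0 <= b < n
    print(key, (a, b))   # 17 -> (3, 2), 24 -> (4, 4)
```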
The remaining question: how do you sort by two keys simultaneously? The answer is tuple sort.
Tuple Sort — Sorting Least Significant First
1. Sort by least significant key $k_c$ using a stable sort.
2. Sort by $k_{c-1}$ using a stable sort.
$\vdots$
c. Sort by most significant key $k_1$ using a stable sort.
Stability is essential: when sorting by $k_1$, items with equal $k_1$ must preserve their $k_2$-order from the previous round.
Why least-significant-first? Consider sorting $[(3,2), (0,3), (4,4), (4,2), (2,2)]$ (our base-5 example). Round 1 sorts stably by the second digit, giving $[(3,2), (4,2), (2,2), (0,3), (4,4)]$; the three digit-2 items keep their input order. Round 2 sorts stably by the first digit, giving $[(0,3), (2,2), (3,2), (4,2), (4,4)]$, which is exactly lexicographic order. Stability kept $(4,2)$ ahead of $(4,4)$ in round 2 because round 1 had already put it there.
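Python's built-in `sorted` is stable (it is Timsort), so the two rounds can be run directly on the base-5 pairs:

```python
# Tuple sort: two stable passes, least significant key first.
pairs = [(3, 2), (0, 3), (4, 4), (4, 2), (2, 2)]
round1 = sorted(pairs, key=lambda p: p[1])   # by least significant digit
round2 = sorted(round1, key=lambda p: p[0])  # then by most significant digit
assert round2 == sorted(pairs)               # equals lexicographic order
print(round1)
print(round2)
```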
Radix Sort — Linear Time for Bounded Integer Keys
Radix sort combines tuple sort with counting sort as the stable subroutine. For keys up to $u$, represent each key as a $c$-digit base-$n$ number where $c = \lceil \log_n u \rceil$. Sort each digit with counting sort ($\Theta(n)$ per digit). Total: $\Theta(cn)$.
If $u \leq n^c$ for some constant $c$, then $\Theta(cn) = \Theta(n)$ — linear time.
def radix_sort(A):
    """Sort A (non-negative integer keys) in Θ(cn) time, c = ceil(log_n(u))."""
    n = len(A)
    if n <= 1:
        return
    u = 1 + max(x.key for x in A)
    # number of base-n digits needed: smallest c with n**c >= u
    c = 1
    while n ** c < u:
        c += 1

    # Build digit-tuple objects
    class Wrapper:
        def __init__(self, item):
            self.item = item
            self.digits = []              # digits[0] = least significant
            high = item.key
            for _ in range(c):
                high, low = divmod(high, n)
                self.digits.append(low)

        # counting_sort reads .key
        @property
        def key(self):
            return self._cur_key

    wrappers = [Wrapper(x) for x in A]
    # Sort c times, from least to most significant digit
    for i in range(c):
        for w in wrappers:
            w._cur_key = w.digits[i]      # set digit i as current key
        counting_sort(wrappers)           # stable Θ(n) sort on this digit
    # Write back
    for i, w in enumerate(wrappers):
        A[i] = w.item
#include <vector>
#include <algorithm>
using namespace std;

// Radix sort for non-negative integers — Θ(cn) time
void radix_sort(vector<int>& A) {
    int n = (int)A.size();
    if (n < 2) return;
    int u = *max_element(A.begin(), A.end()) + 1;
    // number of base-n digits needed: smallest c with n^c >= u
    int c = 1;
    for (long long p = n; p < u; p *= n) ++c;
    vector<int> out(n);
    // Sort c times from least to most significant base-n digit
    long long base = 1;                   // long long: n^c may exceed int
    for (int d = 0; d < c; ++d, base *= n) {
        // Counting sort on digit d
        vector<int> count(n, 0);
        for (int x : A) count[(x / base) % n]++;            // count each digit
        for (int k = 1; k < n; ++k) count[k] += count[k-1]; // prefix sums
        // Traverse A in reverse to maintain stability
        for (int i = n - 1; i >= 0; --i) {
            int digit = (A[i] / base) % n;
            out[--count[digit]] = A[i];
        }
        A = out;
    }
    // Θ(cn) total. Linear when c is constant.
}
The Complete Sorting Picture
| Algorithm | Worst Case | Best Case | Space | In-place | Stable | Key Constraint |
|---|---|---|---|---|---|---|
| Selection Sort | $\Theta(n^2)$ | $\Theta(n^2)$ | $O(1)$ | Yes | No | Any comparable |
| Insertion Sort | $\Theta(n^2)$ | $\Theta(n)$ | $O(1)$ | Yes | Yes | Any comparable |
| Merge Sort | $\Theta(n \log n)$ | $\Theta(n \log n)$ | $O(n)$ | No | Yes | Any comparable |
| Counting Sort | $\Theta(n + u)$ | $\Theta(n + u)$ | $O(n + u)$ | No | Yes | Integer $\in [0, u)$ |
| Radix Sort | $\Theta(cn)$ | $\Theta(cn)$ | $O(n)$ | No | Yes | Integer $\leq n^c$ |
Counting sort and radix sort are the standouts here: they beat the comparison lower bound by exploiting integer key structure.
NIFTY weekly options have strikes from 22,000 to 26,000 in steps of 50 — exactly 81 distinct strikes. You have $n = 810$ option records (10 per strike: CE + PE × 5 expiries). Keys are integers in range $[22000, 26000]$. Should you use radix sort or merge sort?
$u = 26001$, $n = 810$. The ratio $u/n \approx 32$. Counting sort would allocate 26,001 chain slots to hold 810 items — wasteful but not catastrophic for a single sort at session open.
Shift keys to $[0, 4000]$ (subtract 22,000), so $u = 4001$. Radix sort in base $n = 810$: each key $k \leq 4000 < 810^2 = 656,100$, so $c = 2$ digits suffice. Two passes of counting sort: $2 \times \Theta(810) = \Theta(n)$.
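The shifted-key counting sort is a few lines. This is a sketch; the record fields (`strike`, `symbol`) are illustrative, not any broker's API:

```python
# Counting sort of option records by strike, keys shifted to [0, 4000].
records = [{"strike": 24000, "symbol": "CE1"},
           {"strike": 22050, "symbol": "PE1"},
           {"strike": 24000, "symbol": "CE2"}]
OFFSET, U = 22000, 4001
chains = [[] for _ in range(U)]
for r in records:
    chains[r["strike"] - OFFSET].append(r)   # shifted key in [0, 4000]
by_strike = [r for chain in chains for r in chain]
print([r["symbol"] for r in by_strike])      # PE1 first; CE1 before CE2 (stable)
```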
For $n = 810$, all three algorithms (merge sort $\Theta(n \log n) \approx 8{,}100$, counting sort $\Theta(n + u) \approx 4{,}800$, radix sort $\Theta(2n) \approx 1{,}620$) finish in microseconds — the difference is not perceptible. Use Python's sorted(chain, key=lambda x: x['strike']) (TimSort) and move on.
If you ever do need more speed, reach for numpy.sort with kind='stable' (which can dispatch to a radix-sort variant for integer dtypes) or write C++ with the implementation above. On integer arrays the radix path is typically 5–10× faster than comparison-based sorting of the same data. The boundary between "use the built-in" and "care about the algorithm" is around $10^6$ records on a modern machine.
You want to sort 10,000 NSE instrument tokens (32-bit integers, values up to $2^{32} \approx 4 \times 10^9$) using counting sort directly. Why is this impractical?
Counting sort allocates one chain slot per possible key value. With $u = 2^{32}$, the chain array has $4 \times 10^9$ entries. At 8 bytes per Python list pointer, that is 32 GB — to sort 10,000 items. The runtime is also $\Theta(n + u) = \Theta(4 \times 10^9)$ — about 4 seconds at $10^9$ ops/sec, compared to merge sort's $\Theta(n \log n) \approx 130{,}000$ operations.
The fix is radix sort: represent each 32-bit key as 4 base-256 digits ($256^4 = 2^{32}$). Four passes of counting sort with $u = 256$: $4 \times \Theta(n + 256) = \Theta(4n) = \Theta(40{,}000)$ operations. Practical.
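The four base-256 passes can be sketched with bucket lists (bit shifts extract each byte, since $256 = 2^8$):

```python
import random

# LSD radix sort of 32-bit keys: four stable base-256 passes.
def radix256(A):
    for shift in (0, 8, 16, 24):               # least significant byte first
        buckets = [[] for _ in range(256)]
        for x in A:
            buckets[(x >> shift) & 0xFF].append(x)
        A = [x for b in buckets for x in b]    # stable read-back
    return A

data = [random.randrange(2**32) for _ in range(1000)]
assert radix256(data) == sorted(data)
```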
Tuple sort sorts items first by the least significant key, then by more significant keys. Why does tuple sort fail if the subroutine sort is not stable?
After sorting by the least significant digit, the array is ordered by that digit. When we then sort by the next digit, items that tie on the next digit must preserve the order established in the previous round, because that order is exactly what breaks the tie. An unstable sort may arbitrarily reorder items with equal keys, destroying the previous round's work.

Concrete example: take $(4,2)$ and $(4,4)$, i.e. keys $22$ and $24$ in base 5. After round 1 (sort by units digit), $(4,2)$ correctly precedes $(4,4)$. In round 2 (sort by tens digit) the two items tie on tens digit 4; an unstable sort may output $(4,4)$ before $(4,2)$, placing $24$ ahead of $22$. Stability is not optional in tuple sort: it is the entire reason the algorithm works.
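To see this failure concretely, we can fake an unstable subroutine: reversing a list before a stable sort flips the order of tied items. This is a deliberate sabotage for illustration, not a real sorting algorithm:

```python
# Deliberately unstable pass: reversing first flips tie order.
def unstable_sorted(lst, key):
    return sorted(reversed(lst), key=key)

pairs = [(4, 4), (4, 2)]                      # keys 24 and 22 in base 5
round1 = sorted(pairs, key=lambda p: p[1])    # stable: (4,2) before (4,4)
bad = unstable_sorted(round1, key=lambda p: p[0])
assert bad == [(4, 4), (4, 2)]                # 24 placed before 22: wrong
good = sorted(round1, key=lambda p: p[0])
assert good == [(4, 2), (4, 4)]               # stable pass preserves round 1
```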
Apply radix sort (base 10) to the sequence $[329, 457, 657, 839, 436, 720, 355]$. After sorting by the units digit (first pass), what is the array?
Sort by units digit (0–9), stably:
Units digit 0: 720
Units digit 5: 355
Units digit 6: 436
Units digit 7: 457, 657 (in original order — stable)
Units digit 9: 329, 839 (in original order — stable)
Result: $[720, 355, 436, 457, 657, 329, 839]$. After pass 2 (tens digit): $[720, 329, 436, 839, 355, 457, 657]$. After pass 3 (hundreds digit): $[329, 355, 436, 457, 657, 720, 839]$ — fully sorted.
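The three passes above can be checked mechanically with a stable bucket pass per digit:

```python
# One stable counting-sort pass on base-10 digit d (0 = units).
def digit_pass(A, d):
    buckets = [[] for _ in range(10)]
    for x in A:
        buckets[(x // 10**d) % 10].append(x)
    return [x for b in buckets for x in b]

A = [329, 457, 657, 839, 436, 720, 355]
A = digit_pass(A, 0)
assert A == [720, 355, 436, 457, 657, 329, 839]   # after units pass
A = digit_pass(A, 1)
assert A == [720, 329, 436, 839, 355, 457, 657]   # after tens pass
A = digit_pass(A, 2)
assert A == [329, 355, 436, 457, 657, 720, 839]   # fully sorted
```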
Radix sort runs in $\Theta(cn)$ time where $c = \lceil \log_n u \rceil$. If we increase the base from $n$ to $n^2$, how does the runtime change?
With base $b$, radix sort does $c = \lceil \log_b u \rceil$ passes, each costing $\Theta(n + b)$. Total: $\Theta(c(n + b)) = \Theta(\log_b(u) \cdot (n + b))$.
With base $n$: $c = \log_n u$ passes, each $\Theta(n + n) = \Theta(n)$. Total: $\Theta(n \log_n u)$.
With base $n^2$: $c = \log_{n^2} u = \frac{\log_n u}{2}$ passes (half as many), each $\Theta(n + n^2) = \Theta(n^2)$. Total: $\Theta(\frac{\log_n u}{2} \cdot n^2) = \Theta(n^2 \log_n u)$ — actually worse.
The optimal base is $\Theta(n)$ — it minimises total work. Increasing the base reduces pass count but makes each pass more expensive. The $\Theta(cn)$ bound with base $n$ is the sweet spot: linear time when $c = O(1)$.
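The trade-off can be tabulated with the simple cost model $\text{passes} \times (n + b)$ (a rough model that ignores constants, used here only to compare bases):

```python
# passes(u, b): smallest c with b**c >= u, computed with integer arithmetic.
def passes(u, b):
    c, x = 0, 1
    while x < u:
        x *= b
        c += 1
    return c

def radix_cost(n, b, u):
    return passes(u, b) * (n + b)     # pass count times per-pass bucket work

n, u = 1024, 2**32
for b in (2, 16, n, n * n):
    print(b, passes(u, b), radix_cost(n, b, u))
```

For these parameters, base 2 pays for many cheap passes, base $n^2$ pays for one enormous bucket array, and base $n$ sits at the minimum.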
Work through independently. Q1–3 are direct applications. Q4–7 require synthesis. Q8–10 demand careful argument.