Sorting I
Sorting is the gateway problem of algorithms. Simple to state, surprisingly subtle to analyse, and the foundation for every efficient data structure that follows. We build from brute force to $\Theta(n^2)$ to $\Theta(n \log n)$ — and prove each step along the way.
Why Sorting? The Set Interface Connection
In Chapter 2 we built Sequence data structures — arrays, linked lists, dynamic arrays. They all store items in an order you impose. But a different interface matters more for search: the Set interface, where items have intrinsic keys and you query by key value rather than position.
find(k) — return the item with key $k$ · insert(x) — add item $x$ · delete(k) — remove item with key $k$
find_min() · find_max() · find_next(k) · find_prev(k) — order queries
A plain unsorted array can implement all of these — just scan for $k$. But every operation costs $\Theta(n)$. A sorted array changes everything:
find(k) drops to $\Theta(\log n)$ via binary search, find_min and find_max become $\Theta(1)$ (first and last elements). The price: building the sorted array costs $\Theta(n \log n)$. That is the preprocessing-vs-query tradeoff, and it is worth it.
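The binary search behind that $\Theta(\log n)$ bound can be sketched in a few lines (a minimal version, not tied to any particular Set implementation):

```python
def binary_search(A, k):
    """Return the index of k in sorted array A, or -1 if absent. Θ(log n)."""
    lo, hi = 0, len(A)           # invariant: k, if present, lies in A[lo:hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if A[mid] == k:
            return mid
        elif A[mid] < k:
            lo = mid + 1         # discard left half, including mid
        else:
            hi = mid             # discard right half
    return -1

A = [23000, 23500, 24000, 24500, 25000]
print(binary_search(A, 24000))   # → 2
print(binary_search(A, 24250))   # → -1
```

Each iteration halves the candidate range, so at most $\lceil \log_2 n \rceil + 1$ probes are needed.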
So sorting is not an end in itself — it is the prerequisite for efficient Set operations. Every chapter from here through Chapter 8 is about building faster and faster Set implementations, all starting from the ability to sort.
The Sorting Problem — Precisely Stated
Given an input array $A$ of $n$ keys, output an array $B$ satisfying two properties:
— Permutation: $B$ contains the same elements as $A$, possibly in a different order.
— Sorted: $B[i-1] \leq B[i]$ for all $i \in \{1, \ldots, n-1\}$.
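Both output properties can be checked mechanically; a small sketch (the helper names are our own):

```python
def is_sorted(B):
    """Check B[i-1] <= B[i] for all i in 1..n-1."""
    return all(B[i - 1] <= B[i] for i in range(1, len(B)))

def is_permutation(A, B):
    """Check B contains exactly the same elements as A, with multiplicity."""
    return sorted(A) == sorted(B)

A = [3, 1, 2, 1]
B = [1, 1, 2, 3]
print(is_sorted(B) and is_permutation(A, B))  # → True
```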
A sort is destructive if it overwrites $A$ in place. A sort is in-place if it uses $O(1)$ extra space beyond the input (in-place implies destructive). A sort is stable if equal elements appear in the output in the same relative order as in the input.
Before analysing good algorithms, let us see the worst possible one — just to appreciate the gap.
Permutation sort: there are $n!$ permutations of $A$; at least one is sorted. Try all of them. Correct by exhaustive case analysis. Runtime: $\Omega(n! \cdot n)$ — checking each permutation takes $\Theta(n)$. For $n = 20$, $20! \approx 2.4 \times 10^{18}$. This is not slow — it is cosmically unusable.
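For completeness, permutation sort fits in a few lines with itertools, purely to make the absurdity concrete, never to be used:

```python
from itertools import permutations

def permutation_sort(A):
    """Try every permutation of A; return the sorted one. Ω(n! · n) — never use."""
    for B in permutations(A):
        # Θ(n) sortedness check per permutation
        if all(B[i - 1] <= B[i] for i in range(1, len(B))):
            return list(B)

print(permutation_sort([3, 1, 2]))  # → [1, 2, 3]
```

Even at $10^9$ checks per second, $n = 20$ would take decades.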
Algorithm 1 — Selection Sort
The first sensible idea: at each step, find the largest unsorted element and put it in its final position. After $i$ steps, the $i$ largest elements are correctly placed at the end of the array.
Invariant: after iteration $i$, sub-array $A[n-i:]$ contains the $i$ largest elements in sorted order, and $A[:n-i]$ contains the remaining elements in some order.
def selection_sort(A):
    """Sort array A in-place. Θ(n²) comparisons, O(n) swaps."""
    n = len(A)
    for i in range(n - 1, 0, -1):   # i goes n-1, n-2, ..., 1
        m = i                       # assume A[i] is max of A[:i+1]
        for j in range(i):          # scan A[:i] for a larger element
            if A[j] > A[m]:
                m = j               # found new max
        A[m], A[i] = A[i], A[m]     # swap max into final position

# Example: sort NIFTY weekly expiry strikes
strikes = [24500, 24000, 23500, 25000, 23000]
selection_sort(strikes)
print(strikes)  # [23000, 23500, 24000, 24500, 25000]
#include <vector>
#include <algorithm>
using namespace std;

// Selection sort — Θ(n²) comparisons, O(n) swaps, in-place
void selection_sort(vector<int>& A) {
    int n = A.size();
    for (int i = n - 1; i > 0; --i) {
        int m = i;                      // index of current max
        for (int j = 0; j < i; ++j)
            if (A[j] > A[m]) m = j;     // find max in A[0..i]
        swap(A[m], A[i]);               // place max at A[i]
    }
}

// Example
// vector<int> strikes = {24500, 24000, 23500, 25000, 23000};
// selection_sort(strikes);
// → {23000, 23500, 24000, 24500, 25000}
Complexity analysis via recurrence. Let $T(n)$ be the time to sort $n$ elements. Finding the max in a prefix of length $i$ takes $\Theta(i)$, so the recurrence is:
$$T(1) = \Theta(1), \qquad T(n) = T(n-1) + \Theta(n)$$
Unrolling gives $T(n) = \Theta(n) + \Theta(n-1) + \cdots + \Theta(1) = \Theta\!\left(\sum_{i=1}^{n} i\right) = \Theta(n^2)$.
Selection sort is in-place ($O(1)$ extra space) and uses at most $O(n)$ swaps — the minimum possible for a comparison sort. But it is not stable: swapping can move equal elements out of their original relative order.
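To see the instability concretely, here is a sketch that runs a keyed variant of the selection sort above on (key, tag) records; the record layout is our own choice:

```python
def selection_sort(A, key=lambda x: x):
    """Selection sort as above, extended with an optional key function."""
    for i in range(len(A) - 1, 0, -1):
        m = i
        for j in range(i):
            if key(A[j]) > key(A[m]):
                m = j
        A[m], A[i] = A[i], A[m]

# Two records share key 2; the long-distance swap reverses their order.
records = [(2, 'first'), (2, 'second'), (1, 'x')]
selection_sort(records, key=lambda r: r[0])
print(records)  # → [(1, 'x'), (2, 'second'), (2, 'first')] — 'second' now precedes 'first'
```

The first pass swaps (2, 'first') past its equal partner into the last slot, breaking the original relative order.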
Algorithm 2 — Insertion Sort
Different invariant: at each step, the first $i$ elements are sorted among themselves. Insert the next element into its correct position in the sorted prefix by sliding it left until it finds its place.
Invariant: after iteration $i$, sub-array $A[:i]$ is sorted. The remaining elements are untouched.
def insertion_sort(A):
    """Sort array A in-place. Stable. Θ(n²) worst case, Θ(n) if nearly sorted."""
    for i in range(1, len(A)):
        j = i
        # slide A[j] left until it finds its sorted position
        while j > 0 and A[j] < A[j - 1]:
            A[j - 1], A[j] = A[j], A[j - 1]
            j -= 1
# Key insight: insertion sort is fast on nearly-sorted data.
# If A is already sorted, the while loop never executes — Θ(n) total.
# Real-world use: TimSort (Python's built-in sort) uses insertion sort
# for small subarrays (n < 64) because cache locality dominates there.
#include <vector>
using namespace std;

// Insertion sort — stable, in-place, Θ(n²) worst, Θ(n) if nearly sorted
void insertion_sort(vector<int>& A) {
    int n = A.size();
    for (int i = 1; i < n; ++i) {
        int key = A[i];
        int j = i - 1;
        // shift elements greater than key one position right
        while (j >= 0 && A[j] > key) {
            A[j + 1] = A[j];
            --j;
        }
        A[j + 1] = key;   // insert key in its correct position
    }
}
// Note: C++ version uses shifts (not swaps) — same O complexity,
// but ~2x fewer memory writes in practice on sorted-ish data.
Python's built-in sorted() (TimSort) exploits exactly this nearly-sorted advantage.
Algorithm 3 — Merge Sort
Both $\Theta(n^2)$ algorithms share the same structure: fix one element per pass, paying $\Theta(n)$ per pass, for $n$ passes total. To do better, we need to fix more structure per pass. Merge sort achieves this by divide and conquer.
Idea: recursively sort the left and right halves, then merge the two sorted halves into one sorted array. The merge step scans both halves simultaneously — two pointers walking from back to front — and takes $\Theta(n)$.
def merge_sort(A, a=0, b=None):
    """Sort A[a:b] in-place. Θ(n log n) time, Θ(n) extra space."""
    if b is None: b = len(A)
    if b - a <= 1: return       # base case: 0 or 1 element
    c = (a + b + 1) // 2        # midpoint (ceiling)
    merge_sort(A, a, c)         # sort left half   T(n/2)
    merge_sort(A, c, b)         # sort right half  T(n/2)
    # merge: copy both halves, then merge back into A[a:b]
    L, R = A[a:c], A[c:b]       # O(n) copy
    i = j = 0
    for k in range(a, b):       # O(n) merge
        if j >= len(R) or (i < len(L) and L[i] <= R[j]):
            A[k] = L[i]; i += 1
        else:
            A[k] = R[j]; j += 1
    # Note: L[i] <= R[j] (not <) makes the sort stable
#include <vector>
using namespace std;

void merge(vector<int>& A, int a, int c, int b) {
    vector<int> L(A.begin() + a, A.begin() + c);   // copy left half
    vector<int> R(A.begin() + c, A.begin() + b);   // copy right half
    int i = 0, j = 0;
    for (int k = a; k < b; ++k) {
        if (j >= (int)R.size() || (i < (int)L.size() && L[i] <= R[j]))
            A[k] = L[i++];
        else
            A[k] = R[j++];
    }
}

void merge_sort(vector<int>& A, int a, int b) {
    if (b - a <= 1) return;     // base case
    int c = (a + b + 1) / 2;
    merge_sort(A, a, c);        // T(n/2)
    merge_sort(A, c, b);        // T(n/2)
    merge(A, a, c, b);          // Θ(n)
}

// Entry point
void merge_sort(vector<int>& A) {
    merge_sort(A, 0, A.size());
}
Solving the Merge Sort Recurrence
The recurrence is $T(1) = \Theta(1)$, $T(n) = 2T(n/2) + \Theta(n)$. We solve it two ways.
Method 1 — Substitution. Guess $T(n) = \Theta(n \log n)$. Substitute $T(n) = cn \log n$:
$$cn \log n = 2 \cdot c\frac{n}{2} \log\frac{n}{2} + \Theta(n) = cn \log n - cn + \Theta(n)$$
which holds once $c$ is large enough to absorb the $\Theta(n)$ term, confirming the guess.
Method 2 — Recurrence Tree. Draw the recursion tree. At depth $i$, there are $2^i$ subproblems each of size $n/2^i$. Work at depth $i$: $2^i \cdot \Theta(n/2^i) = \Theta(n)$. The tree has depth $\log_2 n$ (base case at $n/2^i = 1$). Total:
$$T(n) = \sum_{i=0}^{\log_2 n} \Theta(n) = \Theta(n \log n)$$
The Master Theorem — Solving Recurrences in General
Many divide-and-conquer algorithms have recurrences of the form $T(n) = aT(n/b) + f(n)$ with $f(n) = \Theta(n^c)$. The Master Theorem gives the answer by comparing $c$ to $\log_b a$; the quantity $n^{\log_b a}$ counts the leaves of the recursion tree.
Case 1 — $c < \log_b a$: leaves dominate. $T(n) = \Theta(n^{\log_b a})$
Case 2 — $c = \log_b a$: work balanced across levels. $T(n) = \Theta(n^c \log n)$
Case 3 — $c > \log_b a$: root dominates. $T(n) = \Theta(n^c)$
For merge sort: $a = 2$, $b = 2$, $f(n) = \Theta(n) = \Theta(n^1)$. So $c = 1$ and $\log_b a = \log_2 2 = 1$. Case 2 applies: $T(n) = \Theta(n^1 \log n) = \Theta(n \log n)$. ✓
Comparison — All Three Algorithms
| Algorithm | Worst Case | Best Case | Comparisons | Swaps | In-place | Stable |
|---|---|---|---|---|---|---|
| Selection Sort | $\Theta(n^2)$ | $\Theta(n^2)$ | $\Theta(n^2)$ | $O(n)$ | Yes | No |
| Insertion Sort | $\Theta(n^2)$ | $\Theta(n)$ | $\Theta(n^2)$ | $\Theta(n^2)$ | Yes | Yes |
| Merge Sort | $\Theta(n \log n)$ | $\Theta(n \log n)$ | $\Theta(n \log n)$ | $\Theta(n \log n)$ | No | Yes* |
* Merge sort is stable only if the merge step uses $\leq$ (not $<$) when choosing from the left half.
At 9:15 AM, the XTS API returns NIFTY option chain data as an unordered list of 450 strike-price records. Your strategy needs to: (a) quickly find the ATM strike, (b) iterate over strikes in order from lowest to highest, (c) binary search for any specific strike. Which algorithm should you use to sort the chain, and what is the total cost?
Merge sort. $n = 450$. The chain is unordered at arrival — insertion sort's "nearly sorted" advantage does not apply. We need a guaranteed $\Theta(n \log n)$ with no worst-case surprises. For $n = 450$: $450 \log_2 450 \approx 450 \times 8.8 \approx 3{,}970$ operations. Selection or insertion sort would take $450^2 / 2 \approx 101{,}250$ comparisons — 25× more work for the same result.
Python's sorted(chain, key=lambda x: x['strike']) uses TimSort — a hybrid of merge sort and insertion sort. It is $\Theta(n \log n)$ worst case, $\Theta(n)$ best case, stable, and written in C. Never write your own sort when the built-in is available. Write your own only to understand the algorithm — which is exactly what we just did.
sorted() or list.sort() — both are TimSort, $\Theta(n \log n)$, stable. The one-time sort at session open costs ~4,000 operations on 450 strikes. After that, binary search finds any strike in $\lceil \log_2 450 \rceil = 9$ comparisons. Compared to linear scan (225 comparisons on average), binary search is 25× faster per lookup. With 50,000 lookups per session, that is $(225 - 9) \times 50{,}000 \approx 10.8$M comparisons saved — from one sort call.
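The per-lookup saving can be sketched with the standard library's bisect module; the strike list below is illustrative (450 strikes, 50 points apart):

```python
from bisect import bisect_left

# Illustrative chain: 450 strikes from 23000 in 50-point steps (already sorted)
strikes = list(range(23000, 23000 + 450 * 50, 50))

def find_strike(sorted_strikes, k):
    """Binary search via bisect: Θ(log n) per lookup vs Θ(n) linear scan."""
    i = bisect_left(sorted_strikes, k)
    return i if i < len(sorted_strikes) and sorted_strikes[i] == k else -1

print(find_strike(strikes, 24000))  # → 20
```

bisect_left returns the leftmost insertion point, so one extra equality check distinguishes "found" from "absent".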
list.sort() and sorted() are both TimSort — developed by Tim Peters specifically for Python in 2002. TimSort finds already-sorted "runs" in the input and merges them using a merge sort strategy, but uses insertion sort for runs shorter than 64 elements (where cache effects dominate). It is now used in Java's Arrays.sort() for objects, Android's sort, and many other runtimes. When you call sorted(chain, key=lambda x: x['strike']) on your option chain, that is TimSort running under the hood in C at roughly $10^8$ operations per second.
An options strategy script receives 500 price updates throughout the day. Each update arrives in roughly sorted order — the new price is within 3 positions of where it belongs in the sorted list. Which sort algorithm is most efficient for re-sorting the list after each update?
When every element is at most $k$ positions from its sorted location, insertion sort's inner while loop runs at most $k$ times per element. Total cost: $\Theta(kn)$. For $k = 3$ and $n = 500$: roughly 1,500 operations per re-sort.
Merge sort always costs $\Theta(n \log n) \approx 4{,}500$ operations regardless of input order — it ignores existing structure. Selection sort always costs $\Theta(n^2) \approx 125{,}000$. For nearly-sorted data, insertion sort wins decisively. This is precisely why TimSort uses insertion sort for small, nearly-sorted subarrays.
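A quick sketch that counts inner-loop swaps on nearly-sorted input; the input construction (adjacent pairs swapped, so $k = 1$) is our own:

```python
def insertion_sort_counting(A):
    """Insertion sort (as above) that also counts inner-loop swaps."""
    steps = 0
    for i in range(1, len(A)):
        j = i
        while j > 0 and A[j] < A[j - 1]:
            A[j - 1], A[j] = A[j], A[j - 1]
            j -= 1
            steps += 1
    return steps

# Nearly sorted: swap adjacent pairs, so every element is 1 position off.
A = list(range(500))
for i in range(0, 500, 2):
    A[i], A[i + 1] = A[i + 1], A[i]
steps = insertion_sort_counting(A)
print(steps)  # → 250: one swap per displaced pair, i.e. Θ(kn) with k = 1
```

Each displaced element triggers at most $k$ inner-loop iterations, so total work scales linearly in $n$ for fixed $k$.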
A divide-and-conquer algorithm splits a problem of size $n$ into 4 subproblems of size $n/2$, with $\Theta(n)$ work to combine them. What is the asymptotic runtime?
Apply Master Theorem: $a = 4$, $b = 2$, $f(n) = \Theta(n^1)$. So $\log_b a = \log_2 4 = 2$. Since $c = 1 < \log_b a = 2$, we are in Case 1 — leaves dominate: $$T(n) = \Theta(n^{\log_2 4}) = \Theta(n^2)$$ Recurrence tree intuition: at depth $i$, there are $4^i$ subproblems of size $n/2^i$. Work at depth $i$: $4^i \cdot \Theta(n/2^i) = \Theta(n \cdot 2^i)$. This grows geometrically — the deepest level (with $4^{\log_2 n} = n^2$ leaves) dominates. Doubling the branches changes everything compared to merge sort's 2 branches.
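A numeric sanity check, under the simplifying assumptions $T(1) = 1$ and $f(n) = n$ exactly (the closed form works out to $T(n) = 2n^2 - n$):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def T(n):
    """T(n) = 4*T(n/2) + n with T(1) = 1, for n a power of two."""
    return 1 if n == 1 else 4 * T(n // 2) + n

for n in (256, 512, 1024):
    print(n, T(n) / n**2)  # → ratio 2 - 1/n, approaching 2 ⇒ T(n) = Θ(n²)
```

The ratio $T(n)/n^2$ converging to a constant is exactly the Case 1 prediction: the $n^2$ leaves dominate.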
A list of NIFTY option orders is sorted by strike price, then the result is sorted again by order type (CE before PE). Which property must the second sort have to guarantee that orders with the same type remain sorted by strike within each type?
A stable sort guarantees that elements with equal keys appear in the output in the same relative order as in the input. After sorting by strike, all CE orders with strike 24000 appear before 24500, etc. If the second sort (by order type) is stable, then within each type, the strike ordering is preserved — because the sorter never swaps two elements with the same key (order type).
This "sort twice" technique is called a radix-style multi-key sort: sort by the least significant key first, then by the most significant key using a stable sort. It is used everywhere in data pipelines — sorting trades by time within each symbol, sorting options by expiry within each strike, etc. Python's
sorted() and list.sort() are both stable (TimSort), so this pattern is safe to use without thinking about it.
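The two-pass pattern in miniature, on toy orders (the field names are illustrative):

```python
orders = [
    {'strike': 24500, 'type': 'PE'},
    {'strike': 24000, 'type': 'CE'},
    {'strike': 24000, 'type': 'PE'},
    {'strike': 24500, 'type': 'CE'},
]
# Pass 1: least-significant key (strike). Pass 2: most-significant key (type).
orders.sort(key=lambda o: o['strike'])
orders.sort(key=lambda o: o['type'])  # stable: strike order preserved within type
print([(o['type'], o['strike']) for o in orders])
# → [('CE', 24000), ('CE', 24500), ('PE', 24000), ('PE', 24500)]
```

Because the second sort never reorders equal keys, each type group inherits the strike ordering from the first pass.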
Merge sort on an array of $n = 8$ elements. The merge step at the top level merges two sorted halves of size 4. How many total element comparisons does merge sort perform across all merge operations?
Merging two sorted arrays of sizes $p$ and $q$ requires between $\min(p,q)$ and $p+q-1$ comparisons. The minimum occurs when every element of one array precedes every element of the other: the shorter side is exhausted after $\min(p,q)$ comparisons and the remainder is copied without further comparisons. The maximum occurs when elements interleave perfectly.
For $n = 8$ ($\log_2 8 = 3$ levels):
— Level 3 (base merges of size 1+1): 4 merges × 1 comparison each = 4
— Level 2 (merges of size 2+2): 2 merges × (2 to 3) comparisons each = 4 to 6
— Level 1 (top merge of 4+4): 1 merge × (4 to 7) comparisons = 4 to 7
Total range: $4 + 4 + 4 = 12$ to $4 + 6 + 7 = 17$ (not quite 24 — the $n \log n$ bound includes the copying overhead, not just comparisons).
The $\Theta(n \log n)$ bound counts all operations (comparisons + copies), not just comparisons. Comparisons alone are between $\frac{n}{2}\log n$ and $n \log n$.
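An instrumented merge sort that counts only element comparisons confirms the 12-to-17 range for $n = 8$ (a sketch, mirroring the earlier implementation):

```python
def merge_sort_count(A, a=0, b=None, counter=None):
    """Merge sort that returns the total number of element comparisons."""
    if counter is None: counter = [0]
    if b is None: b = len(A)
    if b - a <= 1: return counter[0]
    c = (a + b + 1) // 2
    merge_sort_count(A, a, c, counter)
    merge_sort_count(A, c, b, counter)
    L, R = A[a:c], A[c:b]
    i = j = 0
    for k in range(a, b):
        if i < len(L) and j < len(R):
            counter[0] += 1          # one element comparison L[i] vs R[j]
        if j >= len(R) or (i < len(L) and L[i] <= R[j]):
            A[k] = L[i]; i += 1
        else:
            A[k] = R[j]; j += 1
    return counter[0]

print(merge_sort_count([1, 2, 3, 4, 5, 6, 7, 8]))  # → 12  (already sorted: minimum)
print(merge_sort_count([1, 5, 3, 7, 2, 6, 4, 8]))  # → 17  (perfect interleaving: maximum)
```

The second input makes every merge at every level alternate perfectly, hitting $p + q - 1$ comparisons in each merge.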
Work through these independently: Q1–Q3 are direct application, Q4–Q7 require synthesis, and Q8–Q10 require careful argument.