Complexity
Not every problem has an efficient algorithm. Complexity theory asks: which problems are fundamentally hard? The answer organises problems into classes — P, NP, EXP — and reductions let us prove one problem is at least as hard as another. This chapter is the boundary between what we know how to solve and what we believe cannot be solved efficiently.
Decision Problems — Reducing Everything to Yes/No
Complexity theory works cleanest with decision problems: functions that map every input to either YES or NO. Most algorithmic problems have a natural decision version:
| Problem | Decision version |
|---|---|
| Shortest path | Does $G$ contain a path from $s$ to $t$ with weight $\leq d$? |
| Longest simple path | Does $G$ contain a simple path with weight $\geq d$? |
| Subset Sum | Does $A$ contain a subset summing to exactly $T$? |
| Tetris | Can you survive a given sequence of pieces on a given board? |
| Halting Problem | Does program $P$ terminate on input $I$? |
Decidability — Some Problems Have No Algorithm
A problem is decidable if some program solves it correctly in finite time on every input. Undecidable problems exist — provably. The argument is a counting argument: a program is a finite string of bits, so the set of all programs is countably infinite; but a decision problem is a function from inputs to {YES, NO}, and the set of all such functions is uncountable. There are strictly more problems than programs, so almost every problem has no algorithm at all. The Halting Problem is a concrete, natural example of an undecidable problem.
The Complexity Hierarchy: R, EXP, P
Among decidable problems, we classify by how much time the best algorithm requires:
- P: solvable in polynomial time, $O(n^c)$ for some constant $c$.
- EXP: solvable in exponential time, $O(2^{n^c})$ for some constant $c$.
- R: solvable in finite time, i.e., exactly the decidable problems.
These containments are strict: P $\subsetneq$ EXP $\subsetneq$ R (provable via time hierarchy theorems). Whether P $\subsetneq$ NP or P = NP is the most famous open problem in computer science.
NP — Nondeterministic Polynomial Time
A problem is in NP if there exists a polynomial-time verifier: given an input $I$ and a short certificate (a purported proof that $I$ is a YES instance), the verifier checks in polynomial time whether the certificate is valid.
1. $V$ runs in polynomial time in $|I|$.
2. If $I$ is a YES input: there exists a certificate $c$ (polynomial-length bit string) such that $V(I, c) = $ YES.
3. If $I$ is a NO input: for every certificate $c$, $V(I, c) = $ NO.
The certificate is a witness that $I$ is YES. For NO inputs, no witness exists.
| Problem | Certificate (proof it’s YES) | Verifier |
|---|---|---|
| Shortest path $\leq d$ | A path $P$ from $s$ to $t$ | Add up edge weights; check $\leq d$ |
| Negative cycle | A cycle $C$ | Add up weights; check $< 0$ |
| Subset Sum | A subset $A' \subseteq A$ | Sum elements of $A'$; check $= T$ |
| Longest simple path $\geq d$ | A simple path $P$ | Check simple + add weights; check $\geq d$ |
P $\subseteq$ NP: if a problem can be solved in polynomial time, then the verifier just solves it ignoring the certificate. NP $\subseteq$ EXP: to solve any NP problem, try all $2^{n^{O(1)}}$ possible certificates and run the verifier on each. P = NP? Unknown. Most researchers believe P $\neq$ NP — that finding a solution is genuinely harder than checking one.
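To make this concrete, here is a minimal sketch of a Subset Sum verifier together with the brute-force certificate search behind NP $\subseteq$ EXP (the function names and certificate format — a list of indices into $A$ — are illustrative choices, not fixed by the definition):

```python
from itertools import combinations

def verify_subset_sum(A, T, certificate):
    """Polynomial-time verifier: certificate is a list of distinct indices into A.
    Accepts iff the indexed elements sum to exactly T."""
    if len(set(certificate)) != len(certificate):
        return False  # indices must be distinct
    if any(i < 0 or i >= len(A) for i in certificate):
        return False  # indices must be valid
    return sum(A[i] for i in certificate) == T

def solve_by_certificate_search(A, T):
    """NP subset of EXP: enumerate all 2^n candidate certificates and run
    the polynomial-time verifier on each. Exponential time overall."""
    n = len(A)
    for r in range(n + 1):
        for cert in combinations(range(n), r):
            if verify_subset_sum(A, T, list(cert)):
                return True
    return False
```

Note the asymmetry: one call to the verifier is fast, but the search tries exponentially many certificates — exactly the gap the P vs NP question asks about.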
Reductions — Relating Problem Difficulty
A reduction from problem $A$ to problem $B$ is a polynomial-time algorithm that converts any instance of $A$ into an equivalent instance of $B$. If $B$ is easy to solve, then $A$ is too (using $B$ as a subroutine). Equivalently: if $A$ is hard, then $B$ is at least as hard.
Interpretation: $B$ is at least as hard as $A$. An oracle for $B$ solves $A$.
Examples we have already seen throughout this course:
| $A$ reduces to $B$ | Conversion |
|---|---|
| Unweighted SP $\leq$ Weighted SP | Give every edge weight 1 |
| Weighted SP (small integer weights) $\leq$ Unweighted SP | Subdivide each weight-$w$ edge into $w$ unit-weight edges |
| Longest path $\leq$ Shortest path | Negate all edge weights |
| Sorting $\leq$ Finding median | Repeated median extraction |
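The two shortest-path reductions can be sketched directly as instance conversions (the adjacency-list formats and function names here are illustrative):

```python
def unweighted_to_weighted(adj):
    """Reduction: unweighted SP solved via a weighted-SP oracle.
    Conversion: give every edge weight 1."""
    return {u: [(v, 1) for v in nbrs] for u, nbrs in adj.items()}

def weighted_to_unweighted(wadj):
    """Reduction: weighted SP (small positive integer weights) solved via an
    unweighted-SP oracle. Conversion: subdivide each weight-w edge into w
    unit-weight edges, using w-1 fresh dummy vertices."""
    out = {u: [] for u in wadj}
    fresh = 0
    for u, edges in wadj.items():
        for v, w in edges:
            prev = u
            for _ in range(w - 1):          # w-1 dummies -> w unit edges
                mid = ('dummy', fresh)
                fresh += 1
                out[prev].append(mid)
                out[mid] = []
                prev = mid
            out[prev].append(v)
    return out
```

Both conversions run in polynomial time (the second, only because the weights are small: a weight-$w$ edge becomes $w$ edges, which is exponential in the bit length of $w$).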
NP-Hard and NP-Complete
A problem is NP-hard if every problem in NP reduces to it in polynomial time: it is at least as hard as everything in NP. NP-complete: NP-hard $\cap$ NP. The hardest problems in NP. All NP-complete problems are polynomially equivalent to each other; a polynomial-time algorithm for any one of them would prove P = NP.
The first NP-complete problem was Circuit SAT (Cook-Levin theorem, 1971): every NP problem can be encoded as asking whether a Boolean circuit has a satisfying input. From Circuit SAT, thousands of natural problems have been proved NP-complete by reduction.
A zoo of NP-complete problems:
Graph problems: Longest Simple Path, 3-Colouring (but 2-Colouring is in P!), Largest Clique, Travelling Salesman (decision version: is there a tour of all vertices with total weight $\leq d$?), Hamiltonian Cycle.
Combinatorial problems: Subset Sum, 3-Partition, Rectangle Packing, Knapsack (decision version).
Logic: SAT (is a Boolean formula satisfiable?), 3-SAT (same but each clause has 3 literals).
Puzzles and games: Minesweeper, Sudoku, Super Mario Bros., Pokémon, most video games.
When a problem turns out to be NP-hard, several practical strategies remain:
1. Pseudopolynomial algorithms — exploit small integer values (the $O(nT)$ Subset Sum DP works when $T$ is small).
2. Approximation algorithms — find a solution within a factor of optimal in polynomial time.
3. Fixed-parameter tractable (FPT) — polynomial if some parameter (e.g., solution size $k$) is small.
4. Heuristics — no worst-case guarantee but works well in practice (most industrial solvers).
5. Special-case structure — e.g., TSP on planar graphs has a PTAS; 2-SAT is in P while 3-SAT is NP-complete.
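Strategy 1 in action: a minimal sketch of the pseudopolynomial $O(nT)$ Subset Sum DP (implementation details are illustrative):

```python
def subset_sum(A, T):
    """Pseudopolynomial O(nT) DP for Subset Sum.
    reachable[s] is True iff some subset of the items seen so far sums to s."""
    reachable = [False] * (T + 1)
    reachable[0] = True                   # the empty subset sums to 0
    for a in A:
        for s in range(T, a - 1, -1):     # descend so each item is used at most once
            if reachable[s - a]:
                reachable[s] = True
    return reachable[T]
```

The table has $T + 1$ entries and each item touches each entry once: $O(nT)$ time — polynomial in the *value* $T$, not in its bit length.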
What This Means in Practice
When you encounter a new problem, the first question is: is it in P or NP-hard? If you can reduce a known NP-complete problem to your problem, your problem is NP-hard and you should use one of the practical strategies above. If you can design a polynomial-time algorithm, you have solved it efficiently.
A fund manager wants to select from $n$ assets, each with expected return $r_i$ and risk $\sigma_i$, to maximise return while keeping total risk $\leq R$. Is this NP-hard?
Yes. Given a Subset Sum instance with costs $c_i$ and target $T$, set $\sigma_i = r_i = c_i$ and ask: is there a selection with total risk $\leq T$ and total return $\geq T$? Both constraints hold simultaneously exactly when the selected costs sum to $T$. So Subset Sum reduces to this portfolio decision problem, and portfolio optimisation with 0/1 asset selection is NP-hard in general.
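A sketch of one such encoding, taking $r_i = \sigma_i = c_i$ with return threshold and risk budget both equal to $T$ so the two constraints force the selected costs to sum to exactly $T$ (names and the brute-force test oracle are illustrative):

```python
from itertools import combinations

def portfolio_from_subset_sum(costs, target):
    """Encode a Subset Sum instance as a 0/1 portfolio decision problem.
    Returns (r, sigma, R, k): some selection achieves return >= k with
    risk <= R iff some subset of `costs` sums to exactly `target`."""
    return list(costs), list(costs), target, target

def portfolio_decision_bruteforce(r, sigma, R, k):
    """Exponential-time oracle for the 0/1 portfolio decision problem,
    only for checking the reduction on tiny instances."""
    n = len(r)
    for m in range(n + 1):
        for S in combinations(range(n), m):
            if sum(sigma[i] for i in S) <= R and sum(r[i] for i in S) >= k:
                return True
    return False
```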
In quantitative finance, the dominant model is Markowitz mean-variance optimisation — a quadratic program (continuous convex optimisation, not combinatorial). This is in P: it runs in polynomial time via interior-point methods. The NP-hardness applies to the combinatorial version (0/1 asset selection). For continuous portfolio weights, the problem becomes tractable. This is why Markowitz uses continuous weights, not binary “include or exclude” decisions.
The Full Picture — Everything We’ve Covered
Looking back across all 13 chapters, the course has built up the entire toolkit for algorithm design, from the ground up:
| Unit | Key ideas | Trading applications |
|---|---|---|
| Ch1–5: Data Structures | Arrays, trees, heaps, hash tables, sorting | Instrument lookup, order book, tick data sorting |
| Ch6–8: Trees & Heaps | BST, AVL, augmentation, priority queues | Strike range queries, order priority scheduling |
| Ch9–10: Graph Algorithms | BFS, DFS, Dijkstra, Bellman-Ford | Dependency resolution, arbitrage detection |
| Ch11–12: Dynamic Programming | SRT BOT, LCS, LIS, Knapsack | Strike selection, portfolio allocation, sequence matching |
| Ch13: Complexity | P, NP, reductions, NP-hardness | Why discrete portfolio selection is hard; Markowitz as continuous relaxation |
Which of the following best describes what it means for a problem to be in NP?
NP is defined by the existence of a polynomial-time verifier: given an input $I$ and a certificate $c$, we can check in poly($|I|$) time whether $c$ is a valid proof that $I$ is YES. This is the most useful characterisation for algorithm designers.
Option A is technically correct (NP = nondeterministic polynomial time) but obscures the practical meaning. Options B and D are wrong: P $\subseteq$ NP, so every problem in P is also in NP. NP does not mean “hard” — it means “easy to verify.” Whether NP problems are hard to solve is precisely the P vs NP question.
If problem $A$ reduces to problem $B$ in polynomial time ($A \leq_P B$) and $B \in P$, what can we conclude about $A$?
If $A \leq_P B$, then to solve any instance of $A$: (1) convert it to an instance of $B$ in poly time; (2) solve $B$ in poly time (since $B \in P$). Total: poly + poly = poly time. So $A \in P$.
Direction matters critically: $A \leq_P B$ means “$B$ can solve $A$,” so $B$ is at least as hard as $A$. If $B$ is easy ($B \in P$), then $A$ is easy too. If $A$ is hard (NP-hard), then $B$ is hard too. The contrapositive of our conclusion: if $A \notin P$, then $B \notin P$.
Subset Sum is in NP and is NP-complete. We showed a pseudopolynomial $O(nT)$ algorithm for it in Ch12. Does this contradict NP-completeness (i.e., does it prove P = NP)?
The $O(nT)$ algorithm is only polynomial if $T$ is polynomially bounded in $n$ (i.e., $T = O(n^c)$). In general, $T$ can be an integer up to $2^n$ — in which case $O(nT) = O(n \cdot 2^n)$, which is exponential.
“Polynomial time” in complexity theory means polynomial in the bit length of the input. The input size is $O(n + \log T)$ bits. An algorithm is polynomial if it runs in $\text{poly}(n + \log T)$. The $O(nT)$ DP runs in time polynomial in $T$ but $T = 2^{\log T}$, so it’s exponential in $\log T$. This is why Subset Sum is “weakly NP-complete” — it has a pseudopolynomial algorithm. “Strongly NP-complete” problems (like 3-Partition) have no pseudopolynomial algorithm unless P = NP.
You need to solve Travelling Salesman (NP-hard) for $n = 20$ cities in a trading route optimisation. What is the most practical approach?
For small $n$, exact exponential algorithms are feasible. The Held-Karp DP solves TSP in $O(n^2 2^n)$ time. For $n = 20$: $400 \times 2^{20} \approx 4 \times 10^8$ operations — well under a second on a modern machine. For $n = 25$: $625 \times 2^{25} \approx 2 \times 10^{10}$ — tens of seconds. For $n = 30$: $900 \times 2^{30} \approx 10^{12}$ operations, and the memoisation table alone has $O(n \cdot 2^n) \approx 3 \times 10^{10}$ entries — impractical.
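The Held-Karp recurrence can be sketched as follows — a minimal memoised implementation for small $n$ (names and the distance-matrix format are illustrative):

```python
from functools import lru_cache

def held_karp(dist):
    """Exact TSP via Held-Karp DP: O(n^2 * 2^n) time, O(n * 2^n) memory.
    dist is an n x n matrix; the tour starts and ends at city 0."""
    n = len(dist)
    if n == 1:
        return 0

    @lru_cache(maxsize=None)
    def best(mask, last):
        # min cost of a path starting at 0, visiting exactly the cities
        # whose bits are set in mask, and ending at `last`
        if mask == (1 << last) | 1:
            return dist[0][last]          # direct hop 0 -> last
        prev_mask = mask & ~(1 << last)
        return min(best(prev_mask, j) + dist[j][last]
                   for j in range(1, n) if prev_mask & (1 << j))

    full = (1 << n) - 1
    return min(best(full, last) + dist[last][0] for last in range(1, n))
```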
For larger $n$: (1) nearest-neighbour heuristic, (2) 2-opt/3-opt local search, (3) Christofides algorithm (1.5-approximation for metric TSP), (4) commercial solvers like Concorde for exact solutions up to $n \sim 10{,}000$. NP-hard means “no polynomial-time algorithm known” — not “impossible in practice for real problem sizes.”
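The nearest-neighbour heuristic from option (1), as a sketch — $O(n^2)$ time, no approximation guarantee (function name and tie-breaking rule are illustrative):

```python
def nearest_neighbour_tour(dist, start=0):
    """Greedy TSP heuristic: from the current city, always hop to the
    closest unvisited city, then close the tour back to the start."""
    n = len(dist)
    tour = [start]
    unvisited = set(range(n)) - {start}
    while unvisited:
        last = tour[-1]
        # sorted() makes tie-breaking deterministic (lowest index wins)
        nxt = min(sorted(unvisited), key=lambda c: dist[last][c])
        tour.append(nxt)
        unvisited.remove(nxt)
    cost = sum(dist[tour[i]][tour[i + 1]] for i in range(n - 1)) + dist[tour[-1]][start]
    return tour, cost
```

A common pattern is to use a tour like this as the starting point for 2-opt local search, option (2).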
Work through the exercises independently: Q1–3 are direct application, Q4–7 require synthesis, and Q8–10 require careful argument.