Complexity
Not every problem has an efficient algorithm. Complexity theory asks: which problems are fundamentally hard? The answer organises problems into classes — P, NP, EXP — and reductions let us prove one problem is at least as hard as another. This chapter is the boundary between what we know how to solve and what we believe cannot be solved efficiently.
Decision Problems — Reducing Everything to Yes/No
Complexity theory works cleanest with decision problems: functions that map every input to either YES or NO. Most algorithmic problems have a natural decision version:
| Problem | Decision version |
|---|---|
| Shortest path | Does $G$ contain a path from $s$ to $t$ with weight $\leq d$? |
| Longest simple path | Does $G$ contain a simple path with weight $\geq d$? |
| Subset Sum | Does $A$ contain a subset summing to exactly $T$? |
| Tetris | Can you survive a given sequence of pieces on a given board? |
| Halting Problem | Does program $P$ terminate on input $I$? |
Decidability — Some Problems Have No Algorithm
A problem is decidable if some program solves it correctly in finite time on every input. Undecidable problems exist — provably. The argument is a counting argument: a program is a finite string of bits, so the set of all programs is countably infinite; but a decision problem is a function from inputs to {YES, NO}, and the set of all such functions is uncountable. There are strictly more problems than programs, so almost every problem has no algorithm at all. The Halting Problem is a concrete, natural example of an undecidable problem.
The Complexity Hierarchy: R, EXP, P
Among decidable problems, we classify by how much time the best algorithm requires:
- P: solvable in polynomial time, $O(n^c)$ for some constant $c$.
- EXP: solvable in exponential time, $O(2^{n^c})$ for some constant $c$.
- R: solvable in finite time, i.e., exactly the decidable problems.
These containments are strict: P $\subsetneq$ EXP $\subsetneq$ R (provable via time hierarchy theorems). Whether P $\subsetneq$ NP or P = NP is the most famous open problem in computer science.
NP — Nondeterministic Polynomial Time
A problem is in NP if there exists a polynomial-time verifier: given an input $I$ and a short certificate (a purported proof that $I$ is a YES instance), the verifier checks in polynomial time whether the certificate is valid.
1. $V$ runs in polynomial time in $|I|$.
2. If $I$ is a YES input: there exists a certificate $c$ (polynomial-length bit string) such that $V(I, c) = $ YES.
3. If $I$ is a NO input: for every certificate $c$, $V(I, c) = $ NO.
The certificate is a witness that $I$ is YES. For NO inputs, no witness exists.
| Problem | Certificate (proof it’s YES) | Verifier |
|---|---|---|
| Shortest path $\leq d$ | A path $P$ from $s$ to $t$ | Add up edge weights; check $\leq d$ |
| Negative cycle | A cycle $C$ | Add up weights; check $< 0$ |
| Subset Sum | A subset $A' \subseteq A$ | Sum elements of $A'$; check $= T$ |
| Longest simple path $\geq d$ | A simple path $P$ | Check simple + add weights; check $\geq d$ |
P $\subseteq$ NP: if a problem can be solved in polynomial time, then the verifier just solves it ignoring the certificate. NP $\subseteq$ EXP: to solve any NP problem, try all $2^{n^{O(1)}}$ possible certificates and run the verifier on each. P = NP? Unknown. Most researchers believe P $\neq$ NP — that finding a solution is genuinely harder than checking one.
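To make this concrete, here is a minimal sketch of a Subset Sum verifier together with the brute-force certificate search behind NP $\subseteq$ EXP (the function names and certificate format — a list of indices into $A$ — are illustrative choices, not fixed by the definition):

```python
from itertools import combinations

def verify_subset_sum(A, T, certificate):
    """Polynomial-time verifier: certificate is a list of distinct indices into A.
    Accepts iff the indexed elements sum to exactly T."""
    if len(set(certificate)) != len(certificate):
        return False  # indices must be distinct
    if any(i < 0 or i >= len(A) for i in certificate):
        return False  # indices must be valid
    return sum(A[i] for i in certificate) == T

def solve_by_certificate_search(A, T):
    """NP subset of EXP: enumerate all 2^n candidate certificates and run
    the polynomial-time verifier on each. Exponential time overall."""
    n = len(A)
    for r in range(n + 1):
        for cert in combinations(range(n), r):
            if verify_subset_sum(A, T, list(cert)):
                return True
    return False
```

Note the asymmetry: one call to the verifier is fast, but the search tries exponentially many certificates — exactly the gap the P vs NP question asks about.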
Reductions — Relating Problem Difficulty
A reduction from problem $A$ to problem $B$ is a polynomial-time algorithm that converts any instance of $A$ into an equivalent instance of $B$. If $B$ is easy to solve, then $A$ is too (using $B$ as a subroutine). Equivalently: if $A$ is hard, then $B$ is at least as hard.
Interpretation: $B$ is at least as hard as $A$. An oracle for $B$ solves $A$.
Examples we have already seen throughout this course:
| $A$ reduces to $B$ | Conversion |
|---|---|
| Unweighted SP $\leq$ Weighted SP | Give every edge weight 1 |
| Weighted SP (small integer weights) $\leq$ Unweighted SP | Subdivide each weight-$w$ edge into $w$ unit-weight edges |
| Longest path $\leq$ Shortest path | Negate all edge weights |
| Sorting $\leq$ Finding median | Repeated median extraction |
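The two shortest-path reductions can be sketched directly as instance conversions (the adjacency-list formats and function names here are illustrative):

```python
def unweighted_to_weighted(adj):
    """Reduction: unweighted SP solved via a weighted-SP oracle.
    Conversion: give every edge weight 1."""
    return {u: [(v, 1) for v in nbrs] for u, nbrs in adj.items()}

def weighted_to_unweighted(wadj):
    """Reduction: weighted SP (small positive integer weights) solved via an
    unweighted-SP oracle. Conversion: subdivide each weight-w edge into w
    unit-weight edges, using w-1 fresh dummy vertices."""
    out = {u: [] for u in wadj}
    fresh = 0
    for u, edges in wadj.items():
        for v, w in edges:
            prev = u
            for _ in range(w - 1):          # w-1 dummies -> w unit edges
                mid = ('dummy', fresh)
                fresh += 1
                out[prev].append(mid)
                out[mid] = []
                prev = mid
            out[prev].append(v)
    return out
```

Both conversions run in polynomial time (the second, only because the weights are small: a weight-$w$ edge becomes $w$ edges, which is exponential in the bit length of $w$).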
NP-Hard and NP-Complete
A problem is NP-hard if every problem in NP reduces to it in polynomial time: it is at least as hard as everything in NP. NP-complete: NP-hard $\cap$ NP. The hardest problems in NP. All NP-complete problems are polynomially equivalent to each other; a polynomial-time algorithm for any one of them would prove P = NP.
The first NP-complete problem was Circuit SAT (Cook-Levin theorem, 1971): every NP problem can be encoded as asking whether a Boolean circuit has a satisfying input. From Circuit SAT, thousands of natural problems have been proved NP-complete by reduction.
A zoo of NP-complete problems:
Graph problems: Longest Simple Path, 3-Colouring (but 2-Colouring is in P!), Largest Clique, Travelling Salesman (decision version: is there a tour of all vertices with total weight $\leq d$?), Hamiltonian Cycle.
Combinatorial problems: Subset Sum, 3-Partition, Rectangle Packing, Knapsack (decision version).
Logic: SAT (is a Boolean formula satisfiable?), 3-SAT (same but each clause has 3 literals).
Puzzles and games: Minesweeper, Sudoku, Super Mario Bros., Pokémon, most video games.
When a problem turns out to be NP-hard, several practical strategies remain:
1. Pseudopolynomial algorithms — exploit small integer values (the $O(nT)$ Subset Sum DP works when $T$ is small).
2. Approximation algorithms — find a solution within a factor of optimal in polynomial time.
3. Fixed-parameter tractable (FPT) — polynomial if some parameter (e.g., solution size $k$) is small.
4. Heuristics — no worst-case guarantee but works well in practice (most industrial solvers).
5. Special-case structure — e.g., TSP on planar graphs has a PTAS; 2-SAT is in P while 3-SAT is NP-complete.
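Strategy 1 in action: a minimal sketch of the pseudopolynomial $O(nT)$ Subset Sum DP (implementation details are illustrative):

```python
def subset_sum(A, T):
    """Pseudopolynomial O(nT) DP for Subset Sum.
    reachable[s] is True iff some subset of the items seen so far sums to s."""
    reachable = [False] * (T + 1)
    reachable[0] = True                   # the empty subset sums to 0
    for a in A:
        for s in range(T, a - 1, -1):     # descend so each item is used at most once
            if reachable[s - a]:
                reachable[s] = True
    return reachable[T]
```

The table has $T + 1$ entries and each item touches each entry once: $O(nT)$ time — polynomial in the *value* $T$, not in its bit length.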
What This Means in Practice
When you encounter a new problem, the first question is: is it in P or NP-hard? If you can reduce a known NP-complete problem to your problem, your problem is NP-hard and you should use one of the practical strategies above. If you can design a polynomial-time algorithm, you have solved it efficiently.
A fund manager wants to select from $n$ assets, each with expected return $r_i$ and risk $\sigma_i$, to maximise return while keeping total risk $\leq R$. Is this NP-hard?
Yes. Given a Subset Sum instance with costs $c_i$ and target $T$, set $\sigma_i = r_i = c_i$ and ask: is there a selection with total risk $\leq T$ and total return $\geq T$? Both constraints hold simultaneously exactly when the selected costs sum to $T$. So Subset Sum reduces to this portfolio decision problem, and portfolio optimisation with 0/1 asset selection is NP-hard in general.
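A sketch of one such encoding, taking $r_i = \sigma_i = c_i$ with return threshold and risk budget both equal to $T$ so the two constraints force the selected costs to sum to exactly $T$ (names and the brute-force test oracle are illustrative):

```python
from itertools import combinations

def portfolio_from_subset_sum(costs, target):
    """Encode a Subset Sum instance as a 0/1 portfolio decision problem.
    Returns (r, sigma, R, k): some selection achieves return >= k with
    risk <= R iff some subset of `costs` sums to exactly `target`."""
    return list(costs), list(costs), target, target

def portfolio_decision_bruteforce(r, sigma, R, k):
    """Exponential-time oracle for the 0/1 portfolio decision problem,
    only for checking the reduction on tiny instances."""
    n = len(r)
    for m in range(n + 1):
        for S in combinations(range(n), m):
            if sum(sigma[i] for i in S) <= R and sum(r[i] for i in S) >= k:
                return True
    return False
```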
In quantitative finance, the dominant model is Markowitz mean-variance optimisation — a quadratic program (continuous convex optimisation, not combinatorial). This is in P: it runs in polynomial time via interior-point methods. The NP-hardness applies to the combinatorial version (0/1 asset selection). For continuous portfolio weights, the problem becomes tractable. This is why Markowitz uses continuous weights, not binary “include or exclude” decisions.
The Full Picture — Everything We’ve Covered
Looking back across all 13 chapters, the course has built up the entire toolkit for algorithm design, from the ground up:
| Unit | Key ideas | Trading applications |
|---|---|---|
| Ch1–5: Data Structures | Arrays, trees, heaps, hash tables, sorting | Instrument lookup, order book, tick data sorting |
| Ch6–8: Trees & Heaps | BST, AVL, augmentation, priority queues | Strike range queries, order priority scheduling |
| Ch9–10: Graph Algorithms | BFS, DFS, Dijkstra, Bellman-Ford | Dependency resolution, arbitrage detection |
| Ch11–12: Dynamic Programming | SRT BOT, LCS, LIS, Knapsack | Strike selection, portfolio allocation, sequence matching |
| Ch13: Complexity | P, NP, reductions, NP-hardness | Why discrete portfolio selection is hard; Markowitz as continuous relaxation |
Which of the following best describes what it means for a problem to be in NP?
NP is defined by the existence of a polynomial-time verifier: given an input $I$ and a certificate $c$, we can check in poly($|I|$) time whether $c$ is a valid proof that $I$ is YES. This is the most useful characterisation for algorithm designers.
Option A is technically correct (NP = nondeterministic polynomial time) but obscures the practical meaning. Options B and D are wrong: P $\subseteq$ NP, so every problem in P is also in NP. NP does not mean “hard” — it means “easy to verify.” Whether NP problems are hard to solve is precisely the P vs NP question.
If problem $A$ reduces to problem $B$ in polynomial time ($A \leq_P B$) and $B \in P$, what can we conclude about $A$?
If $A \leq_P B$, then to solve any instance of $A$: (1) convert it to an instance of $B$ in poly time; (2) solve $B$ in poly time (since $B \in P$). Total: poly + poly = poly time. So $A \in P$.
Direction matters critically: $A \leq_P B$ means “$B$ can solve $A$,” so $B$ is at least as hard as $A$. If $B$ is easy ($B \in P$), then $A$ is easy too. If $A$ is hard (NP-hard), then $B$ is hard too. The contrapositive of our conclusion: if $A \notin P$, then $B \notin P$.
Subset Sum is in NP and is NP-complete. We showed a pseudopolynomial $O(nT)$ algorithm for it in Ch12. Does this contradict NP-completeness (i.e., does it prove P = NP)?
The $O(nT)$ algorithm is only polynomial if $T$ is polynomially bounded in $n$ (i.e., $T = O(n^c)$). In general, $T$ can be an integer up to $2^n$ — in which case $O(nT) = O(n \cdot 2^n)$, which is exponential.
“Polynomial time” in complexity theory means polynomial in the bit length of the input. The input size is $O(n + \log T)$ bits. An algorithm is polynomial if it runs in $\text{poly}(n + \log T)$. The $O(nT)$ DP runs in time polynomial in $T$ but $T = 2^{\log T}$, so it’s exponential in $\log T$. This is why Subset Sum is “weakly NP-complete” — it has a pseudopolynomial algorithm. “Strongly NP-complete” problems (like 3-Partition) have no pseudopolynomial algorithm unless P = NP.
You need to solve Travelling Salesman (NP-hard) for $n = 20$ cities in a trading route optimisation. What is the most practical approach?
For small $n$, exact exponential algorithms are feasible. The Held-Karp DP solves TSP in $O(n^2 2^n)$ time. For $n = 20$: $400 \times 2^{20} \approx 4 \times 10^8$ operations — well under a second on a modern machine. For $n = 25$: $625 \times 2^{25} \approx 2 \times 10^{10}$ — tens of seconds. For $n = 30$: $900 \times 2^{30} \approx 10^{12}$ operations, and the memoisation table alone has $O(n \cdot 2^n) \approx 3 \times 10^{10}$ entries — impractical.
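The Held-Karp recurrence can be sketched as follows — a minimal memoised implementation for small $n$ (names and the distance-matrix format are illustrative):

```python
from functools import lru_cache

def held_karp(dist):
    """Exact TSP via Held-Karp DP: O(n^2 * 2^n) time, O(n * 2^n) memory.
    dist is an n x n matrix; the tour starts and ends at city 0."""
    n = len(dist)
    if n == 1:
        return 0

    @lru_cache(maxsize=None)
    def best(mask, last):
        # min cost of a path starting at 0, visiting exactly the cities
        # whose bits are set in mask, and ending at `last`
        if mask == (1 << last) | 1:
            return dist[0][last]          # direct hop 0 -> last
        prev_mask = mask & ~(1 << last)
        return min(best(prev_mask, j) + dist[j][last]
                   for j in range(1, n) if prev_mask & (1 << j))

    full = (1 << n) - 1
    return min(best(full, last) + dist[last][0] for last in range(1, n))
```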
For larger $n$: (1) nearest-neighbour heuristic, (2) 2-opt/3-opt local search, (3) Christofides algorithm (1.5-approximation for metric TSP), (4) commercial solvers like Concorde for exact solutions up to $n \sim 10{,}000$. NP-hard means “no polynomial-time algorithm known” — not “impossible in practice for real problem sizes.”
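The nearest-neighbour heuristic from option (1), as a sketch — $O(n^2)$ time, no approximation guarantee (function name and tie-breaking rule are illustrative):

```python
def nearest_neighbour_tour(dist, start=0):
    """Greedy TSP heuristic: from the current city, always hop to the
    closest unvisited city, then close the tour back to the start."""
    n = len(dist)
    tour = [start]
    unvisited = set(range(n)) - {start}
    while unvisited:
        last = tour[-1]
        # sorted() makes tie-breaking deterministic (lowest index wins)
        nxt = min(sorted(unvisited), key=lambda c: dist[last][c])
        tour.append(nxt)
        unvisited.remove(nxt)
    cost = sum(dist[tour[i]][tour[i + 1]] for i in range(n - 1)) + dist[tour[-1]][start]
    return tour, cost
```

A common pattern is to use a tour like this as the starting point for 2-opt local search, option (2).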
Work through the exercises independently: Q1–3 are direct application, Q4–7 require synthesis, and Q8–10 require careful argument.