Dynamic Programming Chapter 12 of 13 · Algorithms and Data Structures

Dynamic Programming II

More DP problems, harder subproblem designs. Floyd-Warshall unifies all-pairs shortest paths in three nested loops. Rod Cutting and Subset Sum introduce integer-valued subproblems — and the subtle distinction between polynomial and pseudopolynomial time.

DP on Integers — A New Subproblem Pattern

In Ch11, subproblems were indexed by positions in a sequence (suffix $A[i:]$, substring $A[i:j]$). A new pattern: subproblems indexed by integers — the value of a target sum, a rod length, or an edge count. This unlocks powerful algorithms but also raises a subtle complexity question: is $O(nT)$ time actually efficient when $T$ can be huge?
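The simplest instance of this pattern is Fibonacci (listed below among the integer-indexed subproblem families): the subproblem is the integer $n$ itself. A minimal memoized warm-up sketch, not one of this chapter's problems:

```python
from functools import lru_cache

@lru_cache(maxsize=None)          # memoize: each n is computed once
def fib(n):
    """F(n), with subproblems indexed by the integer n. O(n) time."""
    if n < 2:                     # base cases: F(0) = 0, F(1) = 1
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(10))  # → 55
```

Note the complexity caveat already appears here: the number of subproblems is $n$, the *value* of the input, not its bit-length.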

Problem 1 — Floyd-Warshall (All-Pairs Shortest Paths)

Given a weighted directed graph, find $\delta(u,v)$ for all pairs $u, v \in V$. Running Bellman-Ford from each source costs $O(|V|^2 \cdot |E|) = O(|V|^4)$ for dense graphs. Floyd-Warshall achieves $\Theta(|V|^3)$ with a beautifully simple DP.

The key idea: instead of bounding path length by number of edges, restrict which intermediate vertices the path may use. Number vertices $1, \ldots, |V|$.

SRT BOT for Floyd-Warshall:

1. Subproblems: $d(u, v, k)$ = minimum weight of any path from $u$ to $v$ using only intermediate vertices from $\{1, \ldots, k\}$ (endpoints $u, v$ may be any vertex).
2. Relate: does the optimal path use vertex $k$ as an intermediate? $$d(u,v,k) = \min\bigl(d(u,v,k-1),\;\; d(u,k,k-1) + d(k,v,k-1)\bigr)$$
3. Topo order: increasing $k$.
4. Base cases: $d(u,u,0) = 0$; $d(u,v,0) = w(u,v)$ if $(u,v) \in E$; else $+\infty$.
5. Original: $d(u,v,|V|)$ for all $u,v$.
6. Time: $O(|V|^3)$ subproblems $\times$ $O(1)$ work $= O(|V|^3)$.
Floyd-Warshall APSP · O(|V|³)
def floyd_warshall(W):
    """
    All-pairs shortest paths on |V|-vertex graph.
    W[u][v] = weight of edge (u,v), or float('inf') if no edge.
    W[u][u] = 0 for all u.
    Returns dist[u][v] = delta(u,v); raises ValueError on a negative cycle.
    O(|V|^3) time, O(|V|^2) space.
    """
    import copy
    V = len(W)
    d = copy.deepcopy(W)                    # d[u][v] = d(u,v,0)

    for k in range(V):                      # allow vertex k as intermediate
        for u in range(V):
            for v in range(V):
                # either don't use k, or go u->...->k->...->v
                via_k = d[u][k] + d[k][v]
                if via_k < d[u][v]:
                    d[u][v] = via_k

    # Detect negative cycles: if d[u][u] < 0, u is on a neg cycle
    for u in range(V):
        if d[u][u] < 0:
            raise ValueError(f"Negative-weight cycle detected through vertex {u}")
    return d

# Note: d[u][v] in-place update is correct because:
# d(u,v,k) = min(d(u,v,k-1), d(u,k,k-1) + d(k,v,k-1))
# After row/col k is fully settled, d[u][k] = d(u,k,k) = d(u,k,k-1)
# (vertex k cannot improve path to itself via itself if no neg cycle)
# so the in-place update computes the correct value.
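This claim can be checked mechanically: run the in-place 2D update and a literal table of $d(u,v,k)$ side by side and compare. A quick sketch (the 4-vertex test graph is made up for illustration):

```python
INF = float('inf')

def fw_inplace(W):
    """In-place 2D Floyd-Warshall, as above."""
    V = len(W)
    d = [row[:] for row in W]
    for k in range(V):
        for u in range(V):
            for v in range(V):
                if d[u][k] + d[k][v] < d[u][v]:
                    d[u][v] = d[u][k] + d[k][v]
    return d

def fw_3d(W):
    """Literal DP: keep the whole d(.,.,k-1) layer and build d(.,.,k) fresh."""
    V = len(W)
    prev = [row[:] for row in W]              # layer k = 0
    for k in range(V):
        prev = [[min(prev[u][v], prev[u][k] + prev[k][v]) for v in range(V)]
                for u in range(V)]
    return prev

W = [[0,   3,   INF, 7],
     [8,   0,   2,   INF],
     [5,   INF, 0,   1],
     [2,   INF, INF, 0]]
assert fw_inplace(W) == fw_3d(W)   # identical on every pair
print(fw_inplace(W)[0][3])         # → 6  (path 0 -> 1 -> 2 -> 3)
```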
#include <vector>
#include <stdexcept>
using namespace std;
const double INF = 1e18;

// Floyd-Warshall APSP — O(|V|^3)
// W[u][v] = edge weight, INF if no edge, 0 on diagonal
vector<vector<double>> floyd_warshall(vector<vector<double>> d) {
    int V = d.size();
    for (int k = 0; k < V; ++k)
        for (int u = 0; u < V; ++u)
            for (int v = 0; v < V; ++v)
                if (d[u][k] + d[k][v] < d[u][v])
                    d[u][v] = d[u][k] + d[k][v];

    for (int u = 0; u < V; ++u)
        if (d[u][u] < 0)
            throw runtime_error("Negative-weight cycle detected");
    return d;
}
Floyd-Warshall is three nested loops of $|V|$ iterations each. That is the entire implementation. The magic is in the subproblem design: by restricting intermediate vertices rather than path length, every subproblem has exactly two dependencies ($O(1)$ branching), giving $O(1)$ work per subproblem. Bellman-Ford has $O(|E|)$ work per subproblem (guessing last edge), giving $O(|V| \cdot |E|)$ total. Floyd-Warshall’s constant-branching recurrence is what achieves $O(|V|^3)$.

Problem 2 — Rod Cutting

A rod of length $L$ can be cut into pieces. A piece of length $\ell$ is worth $v[\ell]$. Maximize total value. This is a subproblem on integers (the remaining rod length), not on sequence positions.

SRT BOT for Rod Cutting:

1. Subproblems: $x(\ell)$ = maximum value from a rod of length $\ell$, for $0 \leq \ell \leq L$.
2. Relate: brute-force the length $p$ of the first piece cut: $$x(\ell) = \max\bigl\{v[p] + x(\ell - p) \;\big|\; p \in \{1, \ldots, \ell\}\bigr\}$$
3. Topo order: increasing $\ell$.
4. Base case: $x(0) = 0$.
5. Original: $x(L)$.
6. Time: $O(L)$ subproblems $\times$ $O(L)$ work each $= O(L^2)$. This is genuinely polynomial: the input already contains $L+1$ integers, so $O(L^2)$ is polynomial in the input size.
Rod Cutting · O(L²) with piece reconstruction
def cut_rod(L, v):
    """
    Max value from rod of length L with piece values v[0..L].
    v[0]=0, v[l]=value of piece of length l. O(L^2).
    Returns (max_value, list_of_pieces).
    """
    x      = [0] * (L + 1)          # x[l] = max value for rod of length l
    parent = [None] * (L + 1)       # parent[l] = l - (first piece chosen)

    for l in range(1, L + 1):       # topological order: increasing l
        for piece in range(1, l + 1):
            candidate = v[piece] + x[l - piece]
            if candidate > x[l]:
                x[l]      = candidate
                parent[l] = l - piece    # remainder after cutting piece

    # Reconstruct cuts from parent pointers
    pieces, l = [], L
    while l > 0:
        piece = l - parent[l]        # piece length chosen at length l
        pieces.append(piece)
        l = parent[l]
    return x[L], pieces

# Example: L=7, v=[0,1,10,13,18,20,31,32]
# x[7] = 33 via pieces [2, 2, 3] (10+10+13=33)
#include <vector>
#include <algorithm>
using namespace std;

pair<int, vector<int>> cut_rod(int L, const vector<int>& v) {
    vector<int> x(L+1, 0), parent(L+1, -1);
    for (int l = 1; l <= L; ++l) {
        for (int p = 1; p <= l; ++p) {
            int cand = v[p] + x[l - p];
            if (cand > x[l]) {
                x[l] = cand;
                parent[l] = l - p;   // remainder
            }
        }
    }
    // Reconstruct pieces
    vector<int> pieces;
    for (int l = L; l > 0; l = parent[l])
        pieces.push_back(l - parent[l]);
    return {x[L], pieces};
}

Problem 3 — Subset Sum (and the Pseudopolynomial Question)

Given $n$ positive integers and a target $T$: does any subset sum exactly to $T$? This is a decision problem (yes/no), not an optimisation problem.

SRT BOT for Subset Sum:

1. Subproblems: $x(i, t)$ = can any subset of $A[i:]$ sum to exactly $t$? For $0 \leq i \leq n$, $0 \leq t \leq T$.
2. Relate: is item $A[i]$ included? $$x(i,t) = x(i+1,\; t - A[i]) \text{ (if } t \geq A[i]\text{)} \quad\mathbf{OR}\quad x(i+1,\; t)$$
3. Topo order: decreasing $i$.
4. Base cases: $x(i, 0) = \text{True}$ (empty subset sums to 0); $x(n, t) = \text{False}$ for $t > 0$.
5. Original: $x(0, T)$.
6. Time: $O(nT)$ subproblems $\times$ $O(1)$ work $= O(nT)$.
Subset Sum · O(nT) pseudopolynomial
def subset_sum(A, T):
    """
    Does any subset of A sum to exactly T?
    O(n*T) time and space (pseudopolynomial).
    """
    n = len(A)
    # x[i][t] = can subset of A[i:] sum to t
    # Space optimisation: only need current and next row -> O(T) space
    # Use 1D DP: x[t] = reachable with items A[i:]
    # Process items in reverse, update t in reverse to avoid using same item twice

    x = [False] * (T + 1)
    x[0] = True                        # base: empty subset sums to 0

    for item in A:                     # add one item at a time
        # Traverse t in reverse to avoid using item more than once
        for t in range(T, item - 1, -1):
            if x[t - item]:
                x[t] = True            # can reach t by adding item

    return x[T]

def subset_sum_with_trace(A, T):
    """Returns the actual subset, or None if impossible."""
    n = len(A)
    # Full 2D table for traceback
    x = [[False] * (T + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        x[i][0] = True                 # base: t=0 always reachable

    for i in range(n - 1, -1, -1):    # decreasing i (topo order)
        for t in range(1, T + 1):
            x[i][t] = x[i + 1][t]     # don't include A[i]
            if t >= A[i]:
                x[i][t] = x[i][t] or x[i + 1][t - A[i]]

    if not x[0][T]: return None

    # Traceback: find which items were included
    subset, t = [], T
    for i in range(n):
        if t >= A[i] and x[i + 1][t - A[i]]:
            subset.append(A[i])
            t -= A[i]
    return subset

# Example: A=[1,3,4,12,19,21,22], T=47 -> True.
# One valid subset is [3,4,19,21]; the traceback may return a different valid subset.
#include <vector>
using namespace std;

// Subset sum — O(n*T) pseudopolynomial
bool subset_sum(const vector<int>& A, int T) {
    vector<bool> x(T + 1, false);
    x[0] = true;
    for (int item : A)
        for (int t = T; t >= item; --t)   // reverse to avoid reuse
            if (x[t - item]) x[t] = true;
    return x[T];
}

// With traceback — O(n*T) time and space
vector<int> subset_sum_trace(const vector<int>& A, int T) {
    int n = A.size();
    vector<vector<bool>> x(n+1, vector<bool>(T+1, false));
    for (int i = 0; i <= n; ++i) x[i][0] = true;
    for (int i = n-1; i >= 0; --i)
        for (int t = 1; t <= T; ++t) {
            x[i][t] = x[i+1][t];
            if (t >= A[i]) x[i][t] = x[i][t] || x[i+1][t-A[i]];
        }
    if (!x[0][T]) return {};
    vector<int> result;
    int t = T;
    for (int i = 0; i < n; ++i)
        if (t >= A[i] && x[i+1][t-A[i]]) { result.push_back(A[i]); t -= A[i]; }
    return result;
}

Polynomial vs Pseudopolynomial — A Critical Distinction

Subset Sum runs in $O(nT)$. Is this polynomial? It depends on how big $T$ can be relative to the input size.

Strongly polynomial: runtime bounded by a polynomial in the number of input values (words). Example: $O(n^2)$ for an $n$-element array — regardless of how large those values are.

Pseudopolynomial: runtime bounded by a polynomial in both the number of input values and the input values themselves. Example: $O(nT)$ for Subset Sum. If $T = 2^n$, this is exponential in $n$.

The key test: can the input integers be arbitrarily large relative to $n$? If yes, pseudopolynomial is not polynomial. If integers are polynomially bounded in $n$ (i.e., $T = O(n^c)$ for some constant $c$), then pseudopolynomial = polynomial for that input class.
| Algorithm      | Runtime       | Classification | Why                                                                     |
|----------------|---------------|----------------|-------------------------------------------------------------------------|
| Merge Sort     | $O(n\log n)$  | Strongly poly  | Depends only on $n$, not on values                                      |
| Rod Cutting    | $O(L^2)$      | Strongly poly  | Input has $L+1$ integers; $O(L^2)$ is poly in input size                |
| Subset Sum     | $O(nT)$       | Pseudopoly     | $T$ can be $2^n$; input has $n+1$ integers but runtime grows with $T$   |
| Counting Sort  | $O(n+u)$      | Pseudopoly     | $u$ (key range) can be exponential in $n$                               |
| Floyd-Warshall | $O(\|V\|^3)$  | Strongly poly  | Depends only on $\|V\|$, not on edge weights                            |
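A quick numeric illustration of the Subset Sum row (the numbers here are illustrative, not from the chapter): writing $T$ in binary takes only a handful of machine words, but the $O(nT)$ table grows exponentially in that bit-length.

```python
n = 100
for bits in (10, 20, 40, 80):
    T = 2 ** bits                 # T occupies only `bits` bits of input
    ops = n * T                   # but the DP table has n*T cells
    print(f"|T| = {bits:2d} bits  ->  n*T ~ {ops:.2e} operations")
```

Each extra bit in $T$ doubles the work: that is exactly the pseudopolynomial failure mode.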

A Tour of DP Subproblem Patterns

Looking back across all DP problems in Chs 11–12, every one uses one of a small number of subproblem patterns. Recognising the pattern is half the battle:

Common subproblem families:

Suffixes $A[i:]$ — Bowling, LIS, Piano Fingering, Subset Sum.
Prefixes $A[:i]$ — Rod Cutting (prefixes of possible lengths), Bellman-Ford (prefix of $k$ edges).
Substrings $A[i:j]$ — Alternating Coin Game, Arithmetic Parenthesization.
Two sequences $A[i:] \times B[j:]$ — LCS.
Integers $\{0,\ldots,T\}$ — Fibonacci, Rod Cutting, Subset Sum.
Vertices $\times$ integers — Bellman-Ford ($v \times k$), Floyd-Warshall ($u \times v \times k$).
With expanded state — LIS (must include $A[i]$), Coin Game (whose turn), Piano (start finger).

When stuck: try the simplest pattern first (suffix), then expand state until the recurrence has enough information to be correct.
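As a reference point for the two-sequence family above, here is a short memoized LCS sketch in the suffix style (the test strings are the classic CLRS example, used here purely as an illustration):

```python
from functools import lru_cache

def lcs(A, B):
    """Length of the longest common subsequence of A and B.
    Subproblems x(i, j) on the suffix pair A[i:], B[j:] — O(|A|*|B|)."""
    @lru_cache(maxsize=None)
    def x(i, j):
        if i == len(A) or j == len(B):        # base: one suffix is empty
            return 0
        if A[i] == B[j]:                      # match: take the character
            return 1 + x(i + 1, j + 1)
        return max(x(i + 1, j), x(i, j + 1))  # guess which character to skip
    return x(0, 0)

print(lcs("ABCBDAB", "BDCABA"))  # → 4  (e.g. "BCBA")
```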
Worked Example · Options Portfolio Budget Allocation 🇮🇳 TTT Strategy

You have a budget of $B$ rupees. There are $n$ option contracts available, each with cost $c_i$ and expected PnL $p_i$. You want to select a subset of contracts with total cost $\leq B$ to maximise total expected PnL. This is the 0/1 Knapsack problem — a direct generalisation of Subset Sum.

SRT BOT for 0/1 Knapsack

Subproblems: $x(i, b)$ = max PnL using only contracts $i, i+1, \ldots, n-1$ with budget $b$.
Relate: include contract $i$ or don’t: $$x(i,b) = \max\bigl(x(i+1, b),\;\; p_i + x(i+1, b - c_i)\bigr) \text{ if } b \geq c_i$$
Time: $O(nB)$ — pseudopolynomial.

Python implementation
def knapsack(costs, pnls, B):
    """0/1 Knapsack: max PnL with budget B. O(n*B) pseudopolynomial."""
    n = len(costs)
    # x[b] = max PnL with remaining budget b
    x = [0] * (B + 1)
    for i in range(n):
        for b in range(B, costs[i] - 1, -1):  # reverse to avoid reuse
            x[b] = max(x[b], pnls[i] + x[b - costs[i]])
    return x[B]

# Example: 4 contracts, budget = 10
costs = [3, 4, 5, 2];  pnls = [4, 5, 6, 3];  B = 10
print(knapsack(costs, pnls, B))  # → 13 (contracts 0, 2, 3: cost 3+5+2=10, pnl 4+6+3=13)
Budget $B = 10$: contracts 0 + 1 + 3 give cost $3+4+2=9$ and PnL $4+5+3=12$, while contracts 0 + 2 + 3 give cost $3+5+2=10$ and PnL $4+6+3=13$. The DP finds the optimum, 13. In practice, for options portfolios, $n$ is small (tens of contracts) and $B$ is bounded (budget in thousands, costs scaled to integers), making this genuinely practical.
The 0/1 Knapsack DP is the exact model for discretionary portfolio allocation under a budget constraint: each instrument has a cost (margin required) and an expected return. The $O(nB)$ DP is practical when costs are small integers. For TTT’s short-vol strategies, margin requirements per position are known from the broker API, and the number of available strategies is small — Knapsack in $O(n \cdot \text{margin\_budget})$ runs in milliseconds. For larger universes, the pseudopolynomial scaling becomes a bottleneck and approximation schemes (FPTAS) are used instead.

Practice Problems
4 questions · Chapter 12
prog / dsa / ch12 / q01 ★ Floyd-Warshall

Floyd-Warshall computes all-pairs shortest paths in $O(|V|^3)$. What does the subproblem $d(u,v,k)$ mean?

A
Shortest path from $u$ to $v$ using at most $k$ edges
B
Shortest path from $u$ to $v$ using only vertices $\{1, \ldots, k\}$ as intermediates
C
Shortest path from $u$ to $v$ using exactly $k$ intermediate vertices
D
Shortest path from vertex $k$ to $v$ starting from $u$
Answer: B.

$d(u,v,k)$ = minimum weight path from $u$ to $v$ where every intermediate vertex (all vertices except the endpoints $u$ and $v$ themselves) has label $\leq k$. This is the crucial design choice that gives Floyd-Warshall its efficiency: by expanding the allowed intermediate vertex set one vertex at a time, each subproblem has exactly two children ($O(1)$ branching), giving $O(1)$ work per subproblem and $O(|V|^3)$ total.

Option A describes Bellman-Ford’s subproblem ($\delta_k(s,v)$), which leads to $O(|V| \cdot |E|)$ per source vertex = $O(|V|^2 |E|)$ total.
prog / dsa / ch12 / q02 ★★ Subset Sum

$A = [3, 4, 3, 1]$, $T = 6$. Does a subset of $A$ sum to 6? What is it?

A
No — no subset of $\{3,4,3,1\}$ sums to 6
B
Yes — $\{3, 3\}$ (the two 3s sum to 6)
C
Yes — $\{1, 4, 1\}$ (but 1 only appears once)
D
Yes — $\{3, 4\}$ (but that sums to 7, not 6)
Answer: B — $\{3, 3\}$.

$A = [3, 4, 3, 1]$. The elements at indices 0 and 2 both have value 3, and $3 + 3 = 6 = T$. No other subset works: $3+4=7$, $4+1=5$, $3+1=4$, $3+3+1=7$, $3+4+1=8$, and so on — so $\{3, 3\}$ is the unique solution. (Options C and D propose sums that either reuse the single 1 or don’t equal 6.)

Running the DP on suffixes: $x(3, t)$ is True for $t \in \{0, 1\}$ (suffix $[1]$). $x(2, t)$: adding $A[2]=3$ makes $t \in \{0,1,3,4\}$ true. $x(1, t)$: adding $A[1]=4$ makes $t \in \{0,1,3,4,5\}$ true within range ($t = 7, 8$ exceed $T$). Finally $x(0, 6)$: including $A[0]=3$ reduces to $x(1, 3)$, which is True — so $x(0,6) = $ True.
prog / dsa / ch12 / q03 ★★ Pseudopolynomial

The Subset Sum DP runs in $O(nT)$. If $n = 100$ and $T = 2^{100}$, approximately how many operations does this require?

A
$O(100^2) = 10{,}000$ — it’s polynomial in $n$
B
$O(n \log T) = O(100 \times 100) = 10{,}000$ — exponentials shrink in log
C
$O(n \cdot 2^{100}) \approx 10^{32}$ — exponential in $n$, completely infeasible
D
$O(2^n) = 2^{100}$ — same as brute force
Answer: C — $O(n \cdot 2^{100}) \approx 10^{32}$ operations.

The DP is $O(nT)$. If $T = 2^{100}$, then $O(nT) = O(100 \times 2^{100}) \approx 10^{32}$. At $10^{18}$ operations/second (modern GPU cluster), this would take $10^{14}$ seconds — longer than the age of the universe.

This is why the $O(nT)$ DP does not contradict the fact that Subset Sum is NP-complete: no polynomial-time algorithm is known, and the DP is only polynomial when $T = O(\text{poly}(n))$. For $T = 2^{100}$, even the exponential brute force over all $2^{100}$ subsets doesn’t help — both are astronomical.
prog / dsa / ch12 / q04 ★★★ Subproblem Design

In the Subset Sum DP, why do we iterate $t$ in reverse (from $T$ down to $A[i]$) when using the space-optimised 1D array?

A
For cache efficiency — accessing memory in reverse order is faster
B
Because the topological order requires decreasing $t$
C
To prevent item $A[i]$ from being used more than once in the same update pass
D
Because $x[T]$ must be computed before $x[0]$ for the base case
Answer: C.

The 0/1 Knapsack / Subset Sum problem: each item can be used at most once. When we process item $A[i]$, we update $x[t] = x[t]$ OR $x[t - A[i]]$. If we iterate $t$ in increasing order, then when we set $x[t] = $ True using $x[t - A[i]]$, that $x[t - A[i]]$ might already have been updated in the same pass by item $A[i]$ (if $t - A[i] > A[i]$). This would allow the same item to be counted multiple times.

Iterating in decreasing order ensures that when we read $x[t - A[i]]$, it still reflects the state before item $A[i]$ was considered — i.e., it represents the result without this item. This is the 0/1 constraint enforced by iteration order, not by extra data structures.
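A tiny experiment makes the difference concrete (the test case is made up for illustration): with $A = [3]$ and $T = 6$, the reverse loop correctly answers False, while a forward loop silently reuses the single 3 — which is the recurrence for the unbounded variant, not 0/1.

```python
def subset_sum_reverse(A, T):
    """0/1: reverse t loop, each item used at most once."""
    x = [False] * (T + 1)
    x[0] = True
    for item in A:
        for t in range(T, item - 1, -1):
            if x[t - item]:
                x[t] = True
    return x[T]

def subset_sum_forward(A, T):
    """Buggy for 0/1: forward loop lets the same item be reused."""
    x = [False] * (T + 1)
    x[0] = True
    for item in A:
        for t in range(item, T + 1):
            if x[t - item]:
                x[t] = True
    return x[T]

print(subset_sum_reverse([3], 6))   # → False (one 3 cannot make 6)
print(subset_sum_forward([3], 6))   # → True  (3 reused: 3+3=6)
```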

Terminal Questions — Chapter 12
10 problems · No answers given

Work through independently. Q1–3 direct application. Q4–7 synthesis. Q8–10 careful argument.

1
Run Floyd-Warshall on the 5-vertex graph with edges: $1\to2$ (3), $1\to3$ (8), $1\to4$ (-4), $2\to4$ (1), $2\to5$ (7), $3\to2$ (4), $4\to3$ (2), $5\to1$ (6), $5\to4$ (7). Compute the full $5\times5$ distance matrix after each value of $k$. Easy
2
Run Rod Cutting on $L=7$, $v=[0,1,10,13,18,20,31,32]$. Compute $x[0]$ through $x[7]$ and identify the optimal cuts. Verify that your cuts sum to 33. Easy
3
Run Subset Sum on $A=[1,3,4,12,19,21,22]$, $T=47$. Use the 2D DP and trace back the actual subset. Verify it sums to 47. Easy
4
Implement 0/1 Knapsack with traceback in Python. Given contracts with costs $[3,4,5,2,6]$ and expected PnLs $[4,5,6,3,7]$ and budget $B=10$, find the optimal subset of contracts and the total PnL. Medium
5
Floyd-Warshall detects negative cycles by checking if $d[u][u] < 0$ at the end. Prove this is correct: show that $d[u][u] < 0$ after the algorithm if and only if $u$ lies on a negative-weight cycle. Medium
6
Modify the Rod Cutting DP to handle the case where cuts have a fixed cost $c$ (e.g., each cut costs $c$ rupees in blade wear). Write the updated recurrence and code. Does the time complexity change? Medium
7
The Coin Change problem: given coin denominations $d_1, \ldots, d_k$ (each usable unlimited times) and a target $T$, find the minimum number of coins summing to $T$. Write the SRT BOT specification and Python code. Is this pseudopolynomial? Compare to Subset Sum. Medium
8
Prove that the Floyd-Warshall in-place update is correct: show that after processing vertex $k$, $d[u][k] = d(u,k,k) = d(u,k,k-1)$ (vertex $k$ cannot appear as an intermediate on the shortest path from $u$ to $k$ if there is no negative cycle). Hard
9
The Arithmetic Parenthesization problem from the lecture: given expression $a_0 * a_1 * \ldots * a_{n-1}$ where each $*$ is $+$ or $\times$, place parentheses to maximise the value. Design the full SRT BOT DP, explain why we need both min and max subproblems, and write the Python code. Hard
10
Bellman-Ford can be viewed as a DP on subproblems $\delta_k(s,v)$. Floyd-Warshall is a DP on subproblems $d(u,v,k)$. Both involve a third parameter $k$. Explain the key difference in what $k$ means in each, and why Floyd-Warshall achieves $O(|V|^3)$ while Bellman-Ford per source achieves only $O(|V| \cdot |E|)$. Hard