Cryptography & Cache-Oblivious Algorithms
Two seemingly unrelated topics united by one theme: algorithms designed for a specific model of the world. Cryptography designs algorithms that are hard to reverse without a secret key, building security from mathematical hardness. Cache-oblivious algorithms design for memory hierarchy without knowing the cache size, achieving optimal memory transfers automatically on any machine.
Part I — Cryptography
Hash Functions
A cryptographic hash function $h: \{0,1\}^* \to \{0,1\}^d$ maps arbitrary-length inputs to fixed-length outputs. It is deterministic, public, and should behave like a random function (random oracle model). Common functions and output sizes: MD5 (128 bits, broken), SHA-1 (160 bits, weakened), SHA-256 (256 bits), SHA-512 (512 bits).
1. One-way (preimage resistance): given $y$, hard to find $x$ with $h(x) = y$. Cost: $O(2^d)$.
2. Strong collision resistance: hard to find any $x \neq x'$ with $h(x) = h(x')$. Cost via birthday paradox: $O(2^{d/2})$.
3. Weak collision resistance (2nd preimage): given $x$, hard to find $x' \neq x$ with $h(x) = h(x')$.
4. Pseudorandomness: output indistinguishable from uniform random.
5. Non-malleability: given $h(x)$, hard to compute $h(f(x))$ for any $f$.
Implications: $2 \Rightarrow 3$ (strong $\Rightarrow$ weak collision). $1 \not\Rightarrow 2$, $2 \not\Rightarrow 1$ (independent properties).
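The $O(2^{d/2})$ collision bound is easy to check empirically on a truncated hash. A minimal sketch (truncating SHA-256 to 24 bits is an illustrative assumption chosen for speed, not a real parameter):

```python
import hashlib

def truncated_hash(x: bytes, bits: int = 24) -> int:
    """SHA-256 truncated to `bits` bits — a toy d-bit hash."""
    digest = hashlib.sha256(x).digest()
    return int.from_bytes(digest, 'big') >> (256 - bits)

def birthday_attack(bits: int = 24):
    """Find a collision in ~2^(bits/2) evaluations by storing seen outputs."""
    seen = {}
    i = 0
    while True:
        x = i.to_bytes(8, 'big')
        h = truncated_hash(x, bits)
        if h in seen:
            return seen[h], x, i + 1   # colliding inputs, # of evaluations
        seen[h] = x
        i += 1

x1, x2, tries = birthday_attack(24)
assert x1 != x2 and truncated_hash(x1) == truncated_hash(x2)
print(f"Collision after {tries} evaluations (expected scale ~2^12 = 4096, not 2^24)")
```

The number of evaluations lands near $2^{d/2}$, not $2^d$ — the birthday paradox in action.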
Symmetric-Key Encryption
Alice and Bob share a secret key $k$. Encryption: $c = e_k(m)$. Decryption: $m = d_k(c)$. Examples: AES (Advanced Encryption Standard), DES, RC5. The core challenge: how do Alice and Bob agree on $k$ without ever meeting?
Diffie-Hellman Key Exchange
Alice picks secret $a$, sends $g^a \bmod p$. Bob picks secret $b$, sends $g^b \bmod p$.
Alice computes $(g^b)^a = g^{ab} \bmod p$. Bob computes $(g^a)^b = g^{ab} \bmod p$.
Shared key $k = g^{ab} \bmod p$. An eavesdropper sees $g^a$ and $g^b$ but cannot compute $g^{ab}$ (assuming the Discrete Logarithm Problem is hard).
Vulnerability: man-in-the-middle attack. Eve intercepts both $g^a$ and $g^b$, establishing separate keys with Alice and Bob. Authentication is needed (public-key certificates).
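The man-in-the-middle attack can be made concrete in a few lines. A toy sketch (the tiny prime $p = 23$ and generator $g = 5$ are illustrative assumptions): Eve substitutes her own $g^e$ for both transmitted values, ending up with one shared key with Alice and another with Bob.

```python
import random

def mitm_demo(p=23, g=5):
    """Eve intercepts both directions and runs two separate DH exchanges."""
    a = random.randint(2, p - 2)       # Alice's secret
    b = random.randint(2, p - 2)       # Bob's secret
    e = random.randint(2, p - 2)       # Eve's secret
    A, B, E = pow(g, a, p), pow(g, b, p), pow(g, e, p)
    # Eve forwards E to Bob instead of A, and E to Alice instead of B
    k_alice     = pow(E, a, p)         # key Alice believes she shares with Bob
    k_bob       = pow(E, b, p)         # key Bob believes he shares with Alice
    k_eve_alice = pow(A, e, p)         # Eve's key with Alice
    k_eve_bob   = pow(B, e, p)         # Eve's key with Bob
    assert k_alice == k_eve_alice and k_bob == k_eve_bob
    return k_alice, k_bob

ka, kb = mitm_demo()
print(f"Eve can transparently decrypt and re-encrypt with keys {ka} and {kb}")
```

Neither party detects anything because nothing in the exchange proves who sent $g^a$ or $g^b$ — hence the need for authenticated key exchange.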
RSA Public-Key Encryption
Key generation: choose primes $p, q$; set $N = pq$ and $\phi = (p-1)(q-1)$; pick $e$ coprime to $\phi$ and compute $d = e^{-1} \bmod \phi$. Public key: $(N, e)$. Private key: $d$. Encryption: $c = m^e \bmod N$. Decryption: $m = c^d \bmod N$.
Correctness: $c^d = (m^e)^d = m^{ed} = m^{1+k\phi} \equiv m \pmod N$ (by Fermat’s little theorem for each prime factor). This works because $ed \equiv 1 \pmod \phi$ means $ed = 1 + k(p-1)(q-1)$.
Security: relies on the hardness of factoring $N = pq$ (given $N$, find $p, q$) and the RSA problem (given $e$ and $c = m^e$, find $m$). Primality testing is in P (AKS 2004), but factoring remains believed hard.
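Key generation needs large random primes, and the standard tool is the Miller–Rabin probabilistic primality test. A minimal sketch (the witness count of 20 is an arbitrary assumption; each round a composite survives with probability at most $1/4$):

```python
import random

def is_probable_prime(n: int, rounds: int = 20) -> bool:
    """Miller-Rabin: composite n passes with probability <= 4^(-rounds)."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13):
        if n % p == 0:
            return n == p
    # write n - 1 = 2^r * d with d odd
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False       # a witnesses that n is composite
    return True

print(is_probable_prime(61), is_probable_prime(91))  # True False (91 = 7*13)
```

In practice one samples random odd numbers of the desired bit length until the test accepts; by the prime number theorem this takes $O(\text{bits})$ attempts on average.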
Digital Signatures and MACs
| | Confidentiality | Integrity |
|---|---|---|
| Symmetric | Private-key encryption (AES) | MAC (SHA3-keyed) |
| Asymmetric | Public-key encryption (RSA) | Digital signatures (RSA-sign) |
Digital signature: $\sigma = \text{Sign}(\text{sk}, m)$; anyone can verify with $\text{Verify}(\text{pk}, m, \sigma)$. RSA signature: $\sigma = m^d \bmod N$, verify $\sigma^e \equiv m \pmod N$. Naively broken by multiplicative homomorphism: $\sigma_1 \sigma_2 = (m_1 m_2)^d$ is a valid signature of $m_1 m_2$. Fix: hash-then-sign ($\sigma = h(m)^d \bmod N$).
MAC (Message Authentication Code): symmetric integrity. $\sigma = \text{MAC}(k, m)$; verify by recomputing. Standard: HMAC, the nested construction $h\big((k \oplus \text{opad}) \,\|\, h((k \oplus \text{ipad}) \,\|\, m)\big)$; the simple keyed hash $h(k \,\|\, m)$ is safe with SHA-3 (which resists length extension) but not with Merkle–Damgård hashes like SHA-2. Unlike a plain hash, forging a MAC requires knowledge of $k$.
Merkle Trees
Merkle tree: binary tree where each leaf stores $h(x_i)$, each internal node stores $h(\text{left child} \| \text{right child})$. Alice stores only the root hash $\sigma_\text{root}$ locally (1 hash = $O(1)$ trusted storage).
Verification of file $x_i$: Alice downloads $x_i$ and the $O(\log n)$ sibling hashes on the path to the root. She recomputes the path and checks against $\sigma_\text{root}$. If $x_i$ was modified, the leaf hash changes and propagates up — unless a hash collision exists.
Update: modify $x_i$, recompute the $O(\log n)$ hashes on the path to root. Update $\sigma_\text{root}$.
Used in: Bitcoin/blockchain (transaction integrity), git (commit object hashes), certificate transparency logs, and any distributed system needing integrity of large datasets.
NP-Completeness and Cryptography
NP-hardness is about worst-case complexity. Cryptography needs average-case hardness — the problem must be hard for typical instances, not just contrived worst cases. This is why most NP-complete problems are poor bases for cryptography: 3-colorability is NP-complete but easy on random graphs (average-case trivial). The Merkle-Hellman knapsack system was broken using lattice basis reduction (low-density knapsacks). Modern cryptography is based on number-theoretic assumptions (discrete log, factoring) that appear hard on average for properly chosen parameters.
import hashlib, math, random
# ── RSA KEY GENERATION (educational, use cryptography library in prod) ──
def extended_gcd(a, b):
"""Extended Euclidean: returns (gcd, x, y) with a*x + b*y = gcd."""
if a == 0: return b, 0, 1
g, x, y = extended_gcd(b % a, a)
return g, y - (b//a)*x, x
def mod_inverse(e, phi):
"""Compute d = e^(-1) mod phi via extended Euclidean."""
g, d, _ = extended_gcd(e % phi, phi)
if g != 1: raise ValueError("No inverse — gcd != 1")
return d % phi
def rsa_keygen(p, q, e=65537):
"""
RSA key generation from two primes p, q.
Public key: (N, e). Private key: (d, p, q).
Security relies on difficulty of factoring N = p*q.
"""
N = p * q
phi = (p - 1) * (q - 1)
assert math.gcd(e, phi) == 1, "e must be coprime to phi"
d = mod_inverse(e, phi)
return (N, e), (d, p, q) # (public_key, private_key)
def rsa_encrypt(m, public_key):
N, e = public_key
return pow(m, e, N) # c = m^e mod N
def rsa_decrypt(c, private_key):
d, p, q = private_key
N = p * q
return pow(c, d, N) # m = c^d mod N
# Example (use large primes in practice!)
pk, sk = rsa_keygen(61, 53)
m = 42
c = rsa_encrypt(m, pk)
m2 = rsa_decrypt(c, sk)
print(f"Original: {m}, Ciphertext: {c}, Decrypted: {m2}") # 42
# ── MERKLE TREE ──────────────────────────────────────────────────────
class MerkleTree:
"""
Merkle tree for integrity verification of n data blocks.
Trusted storage: O(1) — just the root hash.
Verification of block i: O(log n) — path to root.
Update of block i: O(log n) — recompute path.
"""
def __init__(self, blocks):
"""blocks: list of bytes objects."""
self.n = len(blocks)
self.leaves = [hashlib.sha256(b).hexdigest() for b in blocks]
self.tree = self._build(self.leaves)
def _hash_pair(self, left, right):
return hashlib.sha256((left+right).encode()).hexdigest()
def _build(self, layer):
if len(layer) == 1: return layer
if len(layer) % 2 == 1: layer = layer + [layer[-1]] # duplicate last
next_layer = [self._hash_pair(layer[i], layer[i+1])
for i in range(0, len(layer), 2)]
return self._build(next_layer) + layer
@property
def root(self):
return self.tree[0] # root hash (store locally — O(1))
def proof(self, i):
"""Generate Merkle proof: O(log n) sibling hashes."""
idx = len(self.tree) - self.n + i
path = []
while idx > 0:
            sibling = idx + 1 if idx % 2 == 1 else idx - 1  # odd idx = left child, sibling on the right
if sibling < len(self.tree):
path.append((sibling, self.tree[sibling]))
idx = (idx - 1) // 2
return path
def verify(self, block, i, proof, root):
"""Verify block i against stored root hash. O(log n)."""
current = hashlib.sha256(block).hexdigest()
idx = len(self.tree) - self.n + i
for sib_idx, sib_hash in proof:
if sib_idx == idx - 1: # sibling is left
current = self._hash_pair(sib_hash, current)
else:
current = self._hash_pair(current, sib_hash)
idx = (idx - 1) // 2
return current == root
# Example: 4 files
files = [b"file0", b"file1", b"file2", b"file3"]
tree = MerkleTree(files)
print(f"Root hash: {tree.root[:16]}...")
proof = tree.proof(1)
valid = tree.verify(b"file1", 1, proof, tree.root)
print(f"Proof valid: {valid}") # True
tampered = tree.verify(b"TAMPERED", 1, proof, tree.root)
print(f"Tampered detected: {not tampered}") # True
# ── DIFFIE-HELLMAN KEY EXCHANGE ───────────────────────────────────────
def diffie_hellman(p, g):
"""
Diffie-Hellman key exchange over Z_p^*.
p: large prime, g: generator.
Returns (alice_secret, bob_secret, shared_key).
Security: discrete logarithm problem in Z_p^*.
"""
a = random.randint(2, p-2) # Alice's secret
b = random.randint(2, p-2) # Bob's secret
A = pow(g, a, p) # Alice sends g^a mod p
B = pow(g, b, p) # Bob sends g^b mod p
alice_key = pow(B, a, p) # Alice computes (g^b)^a = g^ab mod p
bob_key = pow(A, b, p) # Bob computes (g^a)^b = g^ab mod p
assert alice_key == bob_key
return a, b, alice_key
p = 23 # toy prime (use 2048-bit primes in practice)
g = 5 # generator
_, _, k = diffie_hellman(p, g)
print(f"DH shared key: {k}")// RSA + Diffie-Hellman + Merkle tree (conceptual sketch)
// Production: use OpenSSL, libsodium, or Botan
#include <string>
#include <vector>
#include <cstdint>
using namespace std;
// Modular exponentiation: base^exp mod m — O(log exp)
uint64_t mod_pow(uint64_t base, uint64_t exp, uint64_t mod) {
uint64_t result = 1;
base %= mod;
while (exp > 0) {
if (exp & 1) result = (__uint128_t)result * base % mod;
base = (__uint128_t)base * base % mod;
exp >>= 1;
}
return result;
}
// Extended GCD: returns gcd(a,b), sets x,y s.t. ax+by=gcd
int64_t ext_gcd(int64_t a, int64_t b, int64_t& x, int64_t& y) {
if (a==0){x=0;y=1;return b;}
int64_t x1,y1;
int64_t g = ext_gcd(b%a, a, x1, y1);
x = y1 - (b/a)*x1; y = x1;
return g;
}
// Mod inverse: e^(-1) mod phi
int64_t mod_inv(int64_t e, int64_t phi) {
int64_t x, y;
ext_gcd(e, phi, x, y);
return (x % phi + phi) % phi;
}
struct RSAKey {
uint64_t N, e; // public
uint64_t d; // private
};
RSAKey rsa_keygen(uint64_t p, uint64_t q, uint64_t e=65537) {
uint64_t N = p * q;
uint64_t phi = (p-1)*(q-1);
return {N, e, (uint64_t)mod_inv(e, phi)};
}
uint64_t rsa_enc(uint64_t m, uint64_t N, uint64_t e) { return mod_pow(m,e,N); }
uint64_t rsa_dec(uint64_t c, uint64_t N, uint64_t d) { return mod_pow(c,d,N); }
// Merkle tree root (simplified — real impl uses SHA-256)
string merkle_root(vector<string> hashes) {
while (hashes.size() > 1) {
if (hashes.size()%2) hashes.push_back(hashes.back());
vector<string> next;
for (size_t i=0;i<hashes.size();i+=2)
next.push_back("H(" + hashes[i] + "|" + hashes[i+1] + ")");
hashes = next;
}
return hashes[0];
}

Part II — Cache-Oblivious Algorithms
Memory Hierarchy and the I/O Model
Modern computers have a hierarchy: CPU registers (bytes, ~1ns) → L1 cache (~10KB, ~1ns) → L2 (~100KB, ~10ns) → L3 (~MB, ~100ns) → RAM (~GB, ~100ns) → disk (~TB, ~10ms). Each level is larger and slower. Memory is transferred in blocks of $B$ words. The key cost metric: number of memory transfers (block reads/writes), not raw operations.
External-memory (EM) model: a cache of $M$ words, data transferred in blocks of $B$ words, and the cost measure $MT(N)$ is the number of block transfers; the algorithm knows $B$ and $M$ and controls block placement. Cache-oblivious (CO) model: same cost measure, but the algorithm does NOT know $B$ or $M$. Memory access to any word automatically fetches the containing block (LRU eviction if the cache is full). The algorithm must achieve good $MT(N)$ without knowing the cache parameters — and one that is optimal in the CO model is automatically optimal on every level of any machine's hierarchy.
LRU Is Near-Optimal
Why LRU? Sleator-Tarjan 1985: $\text{LRU}_M \leq 2 \cdot \text{OPT}_{M/2}$. Partition the block-access sequence into maximal phases of $M/B$ distinct blocks. LRU uses $\leq M/B$ transfers per phase. OPT with $M/2$ cache must use $\geq M/(2B)$ per phase (needs at least half the blocks fresh each phase). So LRU with $M$ is within factor 2 of OPT with $M/2$.
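The competitive bound can be checked on random traces. A minimal sketch (the trace parameters are arbitrary assumptions), comparing LRU with capacity $M$ against Belady's offline OPT — evict the block whose next use is farthest in the future — with capacity $M/2$:

```python
import random

def lru_misses(trace, capacity):
    """Count block misses for LRU with `capacity` cached blocks."""
    cache, stamp, misses = set(), {}, 0
    for t, b in enumerate(trace):
        if b not in cache:
            misses += 1
            if len(cache) >= capacity:
                cache.remove(min(cache, key=stamp.get))   # evict least recent
            cache.add(b)
        stamp[b] = t
    return misses

def opt_misses(trace, capacity):
    """Belady's OPT: evict the block whose next use is farthest away."""
    cache, misses = set(), 0
    for i, b in enumerate(trace):
        if b not in cache:
            misses += 1
            if len(cache) >= capacity:
                def next_use(blk):
                    for j in range(i + 1, len(trace)):
                        if trace[j] == blk:
                            return j
                    return float('inf')
                cache.remove(max(cache, key=next_use))
            cache.add(b)
    return misses

random.seed(0)
trace = [random.randrange(20) for _ in range(1000)]
M = 8
lru, opt = lru_misses(trace, M), opt_misses(trace, M // 2)
print(f"LRU(M)={lru}, OPT(M/2)={opt}, ratio={lru/opt:.2f}")
# Sleator-Tarjan: LRU(M) <= 2*OPT(M/2), up to an additive O(M) term
```

The ratio stays within the factor-2 bound regardless of the trace, which is what licenses analyzing cache-oblivious algorithms under ideal replacement and running them under LRU.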
Scanning: $O(N/B + 1)$
Sequential scan of $N$ elements: $\Theta(N/B + 1)$ transfers. The $+1$ covers the case $N < B$. Cannot do better (must read each element at least once). Cache-oblivious: can’t align with block boundary, so gets $\lceil N/B \rceil + 1 = O(N/B + 1)$ — same asymptotically.
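The $+1$ from misalignment is easy to verify by pure counting: $N$ consecutive words starting at any offset touch at most $\lceil N/B \rceil + 1$ distinct blocks. A small check:

```python
def blocks_touched(start: int, n: int, B: int) -> int:
    """Number of distinct B-word blocks covering words [start, start+n)."""
    if n == 0:
        return 0
    return (start + n - 1) // B - start // B + 1

B = 8
for start in range(B):                 # every possible misalignment
    for n in range(1, 100):
        assert blocks_touched(start, n, B) <= -(-n // B) + 1   # ceil(n/B) + 1
print("a scan touches at most ceil(N/B) + 1 blocks for any alignment")
```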
Cache-Oblivious Median Finding
Apply the Median of Medians algorithm (Ch1) and analyse its memory transfers. Let $MT(N)$ be the transfers needed.
Recurrence: $MT(N) = MT(N/5) + MT(7N/10) + O(N/B + 1)$.
Base case: $MT(O(B)) = O(1)$ (fits in $O(1)$ blocks).
Solution: $MT(N) = O(N/B + 1)$ — optimal (must read all elements).
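The claim that the recurrence solves to $O(N/B + 1)$ follows because $1/5 + 7/10 < 1$, so the $O(N/B)$ work at the root dominates a geometrically decreasing series. A quick numerical sketch (the constant in the $O(N/B + 1)$ term is an arbitrary assumption):

```python
from functools import lru_cache

B = 64

@lru_cache(maxsize=None)
def MT(n: int) -> float:
    """MT(N) = MT(N/5) + MT(7N/10) + (N/B + 1); base case O(1) below B."""
    if n <= B:
        return 1.0
    return MT(n // 5) + MT(7 * n // 10) + (n / B + 1)

for n in (10**4, 10**5, 10**6):
    print(f"N={n:>8}: MT(N)/(N/B) = {MT(n) / (n / B):.2f}")
# the ratio MT(N)/(N/B) stays bounded by a constant, confirming MT(N) = O(N/B)
```

Since the per-level cost shrinks by a factor of $9/10$, the total is at most $\approx 10$ times the root cost, and the ratio printed above plateaus accordingly.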
Cache-Oblivious Matrix Multiplication
Naive matrix multiplication ($N^3$ operations): the inner loop reads one row of $X$ and one column of $Y$ per output element. If $Y$ is stored in row-major order, accessing column $j$ jumps $N$ words per step — $O(N)$ misses per column traversal, re-incurred each time the column is reused: $O(N^2)$ misses per output row, $O(N^3)$ total. Very poor locality.
Cache-oblivious fix: split each matrix into four $N/2 \times N/2$ quadrants and recurse on the eight quadrant products: $MT(N) = 8\,MT(N/2) + O(N^2/B)$, with base case $MT(O(\sqrt{M})) = O(M/B)$ once three subproblem matrices fit in cache. Recursion tree: cost per level is geometrically increasing ($N^2/B$, then $8(N/2)^2/B = 2N^2/B$, …) — the leaves dominate. Number of leaves, reached when $N = O(\sqrt{M})$: $(N/\sqrt{M})^3 = O(N^3/M^{3/2})$. Cost per leaf: $O(M/B)$.
Total: $MT(N) = O\!\left(\frac{N^3}{B\sqrt{M}}\right)$.
This matches the Hong–Kung lower bound of $\Omega\!\left(N^3/(B\sqrt{M})\right)$ for standard (non-Strassen) matrix multiplication — optimal.
Cache-Oblivious Search: van Emde Boas Layout
Binary search on $N$ sorted elements: naive layout gives $O(\log N)$ transfers (each comparison likely a cache miss). B-tree gives $O(\log_B N)$ but requires knowing $B$.
van Emde Boas (vEB) layout: split the height-$h$ tree at the middle level into one top subtree of height $h/2$ and $\approx \sqrt{N}$ bottom subtrees of height $h/2$; store the top subtree first, then each bottom subtree, and recurse within every part. Search cost: at the recursion level where subtrees first have size $\leq B$, each subtree occupies $\leq 2$ blocks and has height $\geq \tfrac{1}{2}\log B$, so a root-to-leaf path of length $\log N$ crosses $O(\log N / \log B) = O(\log_B N)$ of them. Total: $O(\log_B N)$ memory transfers — matching B-tree performance without knowing $B$!
This extends to dynamic B-trees with $O(\log_B N)$ insert/delete (cache-obliviously).
Cache-Oblivious Sorting
Optimal sorting in the EM model: $\Theta\!\left(\frac{N}{B}\log_{M/B}\frac{N}{B}\right)$ transfers. Binary mergesort is cache-oblivious but not optimal:
$MT(N) = 2 \cdot MT(N/2) + O(N/B + 1)$, $MT(M) = O(M/B)$.
Solution: $MT(N) = O\!\left(\frac{N}{B}\log_2\frac{N}{M}\right)$ — better than naive, but off from optimal by a factor of roughly $\log_2\frac{M}{B}$, because each merge level is only 2-way instead of $(M/B)$-way.
$M/B$-way mergesort (optimal in EM): split into $M/B$ subarrays, recursively sort, merge $M/B$-way (keeping one block per run in cache).
$MT(N) = O\!\left(\frac{N}{B}\log_{M/B}\frac{N}{B}\right)$ — optimal.
Cache-oblivious sorting (Funnelsort): with the tall cache assumption $M = \Omega(B^{1+\varepsilon})$, an $N^\varepsilon$-way mergesort using recursive “funnel” merges achieves the optimal $O\!\left(\frac{N}{B}\log_{M/B}\frac{N}{B}\right)$ transfers without knowing $B$ or $M$.
# ── CACHE-OBLIVIOUS MATRIX MULTIPLICATION ────────────────────────────
# Track memory transfers (simulated with block access counting)
class CacheSimulator:
"""Simplified cache simulator to count block transfers."""
def __init__(self, B=8, M=64):
self.B, self.M = B, M
self.cache = {} # block_id -> last_access_time
self.time = 0
self.transfers = 0
def access(self, addr):
"""Access a word at address; fetch block if not in cache."""
block_id = addr // self.B
if block_id not in self.cache:
self.transfers += 1
if len(self.cache) >= self.M // self.B:
# LRU eviction
lru = min(self.cache, key=self.cache.get)
del self.cache[lru]
self.cache[block_id] = self.time
else:
self.cache[block_id] = self.time
self.time += 1
def co_matmul(X, Y, B=8, M=64):
"""
Cache-oblivious matrix multiplication via recursive subdivision.
No explicit blocking — the recursion naturally achieves cache-efficiency.
MT(N) = O(N^3 / (B * sqrt(M))) — optimal.
For demonstration: pure Python recursive; NumPy is much faster in practice.
"""
n = len(X)
if n <= 1:
return [[X[0][0] * Y[0][0]]]
# Divide matrices into four n/2 x n/2 quadrants
h = n // 2
A11, A12 = [r[:h] for r in X[:h]], [r[h:] for r in X[:h]]
A21, A22 = [r[:h] for r in X[h:]], [r[h:] for r in X[h:]]
B11, B12 = [r[:h] for r in Y[:h]], [r[h:] for r in Y[:h]]
B21, B22 = [r[:h] for r in Y[h:]], [r[h:] for r in Y[h:]]
def add(P, Q):
return [[P[i][j]+Q[i][j] for j in range(len(P[0]))]
for i in range(len(P))]
# 8 recursive multiplications (like standard divide-and-conquer)
C11 = add(co_matmul(A11,B11), co_matmul(A12,B21))
C12 = add(co_matmul(A11,B12), co_matmul(A12,B22))
C21 = add(co_matmul(A21,B11), co_matmul(A22,B21))
C22 = add(co_matmul(A21,B12), co_matmul(A22,B22))
# Combine quadrants
result = [[0]*n for _ in range(n)]
for i in range(h):
result[i][:h] = C11[i]
result[i][h:] = C12[i]
result[h+i][:h] = C21[i]
result[h+i][h:] = C22[i]
return result
# ── VAN EMDE BOAS TREE LAYOUT ─────────────────────────────────────────
def veb_layout(sorted_arr):
    """
    Lay out a perfect BST, given by its sorted key array (n = 2^h - 1),
    in van Emde Boas (vEB) order: the top subtree of height ~h/2 first,
    then each bottom subtree of height ~h/2, recursing within each part.
    Each recursive subtree of size <= B occupies O(1) cache blocks, so
    binary search costs O(log_B N) memory transfers without knowing B.
    """
    n = len(sorted_arr)
    if n <= 1:
        return list(sorted_arr)
    h = (n + 1).bit_length() - 1       # height of perfect BST with n = 2^h - 1
    h_top = h // 2
    h_bot = h - h_top
    top_size = (1 << h_top) - 1        # nodes in the top subtree
    bot_size = (1 << h_bot) - 1        # nodes in each bottom subtree
    # In sorted order the array decomposes as
    #   bot_0, top_0, bot_1, top_1, ..., top_{top_size-1}, bot_{top_size}
    top_keys = [sorted_arr[k * (bot_size + 1) + bot_size]
                for k in range(top_size)]
    result = veb_layout(top_keys)      # top subtree first
    for k in range(top_size + 1):      # then the bottom subtrees, left to right
        lo = k * (bot_size + 1)
        result.extend(veb_layout(sorted_arr[lo:lo + bot_size]))
    return result
# ── BINARY MERGESORT TRANSFER ANALYSIS ───────────────────────────────
def mergesort_transfers(N, B, M):
    """
    Estimate memory transfers for binary mergesort.
    MT(N) = O(N/B * log2(N/M)) — not optimal.
    Optimal: N/B * log_{M/B}(N/B).
    """
    if N <= M:
        return N / B  # fits in cache: O(N/B) to load
    # log2(N/M) binary merge levels above cache size, each costs N/B
    levels = math.log2(N / M)
    return N / B * levels
import math
N, B, M = 1_000_000, 64, 8192
print(f"Binary mergesort transfers: {mergesort_transfers(N,B,M):.0f}")
opt = N/B * math.log(N/B, M/B)
print(f"Optimal: {opt:.0f}")
print(f"Ratio: {mergesort_transfers(N,B,M)/opt:.2f}")// Cache-oblivious matrix multiply — O(N^3 / (B*sqrt(M)))
#include <vector>
#include <algorithm>
using namespace std;
using Matrix = vector<vector<double>>;
// In-place C += A * B (for submatrices specified by offsets)
void co_matmul(const Matrix& A, const Matrix& B, Matrix& C,
int ra, int ca, int rb, int cb, int rc, int cc, int n) {
if (n == 1) {
C[rc][cc] += A[ra][ca] * B[rb][cb];
return;
}
int h = n / 2;
// 8 recursive calls — recursion achieves cache efficiency
// When submatrices fit in cache, stops incurring misses
co_matmul(A,B,C, ra, ca, rb, cb, rc, cc, h); // C11 += A11*B11
co_matmul(A,B,C, ra, ca+h, rb+h, cb, rc, cc, h); // C11 += A12*B21
co_matmul(A,B,C, ra, ca, rb, cb+h, rc, cc+h, h); // C12 += A11*B12
co_matmul(A,B,C, ra, ca+h, rb+h, cb+h, rc, cc+h, h); // C12 += A12*B22
co_matmul(A,B,C, ra+h, ca, rb, cb, rc+h, cc, h); // C21 += A21*B11
co_matmul(A,B,C, ra+h, ca+h, rb+h, cb, rc+h, cc, h); // C21 += A22*B21
co_matmul(A,B,C, ra+h, ca, rb, cb+h, rc+h, cc+h, h); // C22 += A21*B12
co_matmul(A,B,C, ra+h, ca+h, rb+h, cb+h, rc+h, cc+h, h); // C22 += A22*B22
}
// MT(N) = 8*MT(N/2) + O(N^2/B)
// Base: MT(sqrt(M)) = O(M/B)
// Total: O(N^3 / (B*sqrt(M))) — optimal cache-oblivious bound

import hmac, hashlib, json, time
# ── HMAC FOR TRADE MESSAGE AUTHENTICATION ────────────────────────────
def sign_order(order_dict, secret_key: bytes) -> str:
"""
HMAC-SHA256 authentication of a trade order message.
Used in XTS API, FIX protocol, and most broker APIs.
Properties:
- Correctness: same key always produces same MAC
- Unforgeability: without secret_key, cannot produce valid MAC
- Unlike plain hash: adversary who knows MAC(k, m1) cannot
compute MAC(k, m2) for any new m2 (security from secret key)
"""
message = json.dumps(order_dict, sort_keys=True).encode()
mac = hmac.new(secret_key, message, hashlib.sha256).hexdigest()
return mac
def verify_order(order_dict, received_mac: str, secret_key: bytes) -> bool:
"""
Verify HMAC. Use hmac.compare_digest to prevent timing attacks.
Timing attack: if we return early on first mismatch, an attacker
can deduce the correct MAC one byte at a time by measuring response time.
"""
expected = sign_order(order_dict, secret_key)
return hmac.compare_digest(expected, received_mac) # constant-time compare!
# Example: XTS-style order authentication
secret_key = b'ttt-secret-2025-xts-api-key'
order = {
'order_id': 'TTT-20250327-001',
'instrument': 'NIFTY25MAR24000CE',
'quantity': 50,
'price': 245.50,
'side': 'BUY',
'timestamp': int(time.time()),
}
mac = sign_order(order, secret_key)
print(f"Order MAC (first 16 chars): {mac[:16]}...")
# Verify legitimate order
print(f"Legitimate order valid: {verify_order(order, mac, secret_key)}")
# Tampered order (e.g., price changed by attacker)
tampered_order = dict(order, price=1.0) # attacker changes price
print(f"Tampered order valid: {verify_order(tampered_order, mac, secret_key)}")
# False — HMAC detects tampering
# ── COMMITMENT SCHEME ─────────────────────────────────────────────────
import os
def commit(value: str) -> tuple:
"""
Cryptographic commitment: hide value v until reveal.
Used in: sealed-bid auctions, fair coin flips, zero-knowledge proofs.
commit(v) = (commitment=h(r||v), randomness=r) — HIDING: h(r||v) leaks nothing
reveal(v, r): check commitment = h(r||v) — BINDING: can't find v' with h(r||v')=h(r||v)
"""
r = os.urandom(32) # 256 bits of randomness (prevents brute force)
commitment = hashlib.sha256(r + value.encode()).hexdigest()
return commitment, r
def reveal(value: str, randomness: bytes, commitment: str) -> bool:
"""Verify that commitment was to value."""
expected = hashlib.sha256(randomness + value.encode()).hexdigest()
return hmac.compare_digest(expected, commitment)
# TTT use case: sealed-bid order submission
bid_price = "24500"
c, r = commit(bid_price)
print(f"Committed to bid. Commitment: {c[:16]}...")
# ... later, when all bids are submitted ...
print(f"Reveal valid: {reveal(bid_price, r, c)}") # True
print(f"Cheat attempt: {reveal('23000', r, c)}") # False// HMAC-SHA256 for order authentication (conceptual)
// Production: use OpenSSL HMAC_SHA256 or libsodium crypto_auth
#include <cstdio> // sprintf
#include <string>
#include <vector>
#include <openssl/hmac.h> // link with -lssl -lcrypto
using namespace std;
string hmac_sha256(const string& key, const string& msg) {
unsigned char result[32];
unsigned int len = 32;
HMAC(EVP_sha256(),
key.data(), key.size(),
(unsigned char*)msg.data(), msg.size(),
result, &len);
string hex;
char buf[3];
for (int i=0;i<32;i++) { sprintf(buf,"%02x",result[i]); hex+=buf; }
return hex;
}
// Constant-time comparison (prevent timing attacks)
bool secure_compare(const string& a, const string& b) {
if (a.size() != b.size()) return false;
unsigned char diff = 0;
for (size_t i=0;i<a.size();i++) diff |= a[i]^b[i];
return diff == 0;
}

Pashupati’s options analytics system computes implied vol surface factors daily: $V = A \cdot B$ where $A$ is a $512 \times 512$ matrix of factor loadings and $B$ is a $512 \times 512$ matrix of factor returns. Compare naïve vs cache-oblivious performance.
import math
N, B, M = 512, 64, 4096 # matrix size, block size (words), cache (words)
# Naïve row × column: a column of Y has N=512 elements, stride N apart.
# Each column access is a cache miss → O(N^2) transfers per output row,
# O(N^3) worst case. Charitably assume perfect spatial locality
# (e.g., Y stored transposed): N^3/B transfers as the naïve best case.
naive_transfers = N**3 / B
print(f"Naïve (best case): {naive_transfers:,.0f} transfers")
# Cache-oblivious recursive: MT(N) = O(N^3 / (B * sqrt(M)))
co_transfers = N**3 / (B * math.sqrt(M))
print(f"Cache-oblivious: {co_transfers:,.0f} transfers")
print(f"Speedup factor: {naive_transfers / co_transfers:.1f}×")
# sqrt(M) = sqrt(4096) = 64: speedup = 64x for this config
# For N=1024, M=16384, B=64: speedup = sqrt(16384) = 128x
# Memory transfer lower bound: Omega(N^3 / (B*sqrt(M)))
lb = N**3 / (B * math.sqrt(M))
print(f"Lower bound: {lb:,.0f} transfers")
print(f"CO achieves optimal: {abs(co_transfers - lb) < 1}") # True
Cache-oblivious algorithms in finance: large covariance matrix operations (factor models, risk attribution) are memory-bandwidth-bound. NumPy/BLAS uses blocked (cache-aware) matrix multiplication internally — the same divide-into-cache-sized-subproblems idea as the cache-oblivious algorithm above, but with explicit block sizes tuned for the target CPU. For custom operations (e.g., Heston calibration with $500\times500$ implied vol surfaces), cache-oblivious divide-and-conquer implementations can outperform naïve loops by $10\text{–}100\times$ on modern CPUs due to cache efficiency.
Why can a birthday attack find a collision in a $d$-bit hash function in $O(2^{d/2})$ operations, not $O(2^d)$?
Preimage resistance (find $x$ with $h(x) = y$ for a specific $y$) requires $O(2^d)$ random tries. Collision resistance (find any $x \neq x'$ with $h(x) = h(x')$) only requires $O(2^{d/2})$ random inputs. Why? For $k$ random inputs, there are $\binom{k}{2} \approx k^2/2$ pairs. Each pair collides with probability $1/2^d$. Expected collisions $\approx k^2/(2 \cdot 2^d)$. Setting this to 1: $k \approx 2^{d/2}$. This is the birthday paradox: with $2^{d/2}$ people, there’s a ~50% chance two share a birthday (with $2^d$ possible birthdays). SHA-256 targets $2^{128}$ collision resistance (not $2^{256}$) precisely because of this.
RSA decryption computes $c^d \bmod N$ where $c = m^e \bmod N$ and $ed \equiv 1 \pmod{(p-1)(q-1)}$. Why does $m^{ed} \equiv m \pmod N$?
Since $ed \equiv 1 \pmod{(p-1)(q-1)}$, write $ed = 1 + k(p-1)(q-1)$. By Fermat’s little theorem for prime $p$: $m^{p-1} \equiv 1 \pmod p$ (when $\gcd(m,p)=1$). So $m^{ed} = m^{1+k(p-1)(q-1)} = m \cdot (m^{p-1})^{k(q-1)} \equiv m \cdot 1 \equiv m \pmod p$. Similarly $m^{ed} \equiv m \pmod q$. When $\gcd(m,p)=p$ (i.e., $p \mid m$): both sides are $0 \bmod p$. By CRT (since $p,q$ coprime): $m^{ed} \equiv m \pmod{N=pq}$. This is why RSA uses two primes: to apply Fermat’s theorem to each prime factor of $N$.
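The CRT argument — including the edge case $p \mid m$ — can be verified exhaustively for toy primes (reusing $p = 61$, $q = 53$ from the keygen example earlier):

```python
p, q = 61, 53                      # toy primes
N, phi = p * q, (p - 1) * (q - 1)
e = 17                             # gcd(17, 3120) = 1
d = pow(e, -1, phi)                # modular inverse (Python 3.8+)
assert (e * d) % phi == 1
# m^(ed) ≡ m (mod N) for EVERY residue m — including multiples of p or q,
# which Fermat's theorem alone does not cover but the CRT argument does
for m in range(N):
    assert pow(m, e * d, N) == m
print(f"verified m^(ed) ≡ m (mod {N}) for all {N} residues, incl. p|m and q|m")
```

The loop covers all $N = 3233$ residues, so the multiples of $p$ and $q$ (where $\gcd(m, N) \neq 1$) are checked explicitly rather than assumed.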
The cache-oblivious matrix multiplication achieves $O(N^3 / (B\sqrt{M}))$ transfers. Why does the base case $MT(O(\sqrt{M})) = O(M/B)$ give this bound, and why is this optimal?
The recursion bottoms out when three $N\times N$ matrices fit in cache: $3N^2 \leq M \Rightarrow N = O(\sqrt{M/3}) = O(\sqrt{M})$. At that point, load all three matrices once: $3N^2/B = O(M/B)$ transfers. Then all $N^3$ multiplications are done from cache (0 additional transfers).
Number of leaf sub-problems: the recursion tree has depth $\log_2(N/\sqrt{M})$ and branching factor 8, giving $(N/\sqrt{M})^{\log_2 8} = (N/\sqrt{M})^3$ leaves. Total: $(N/\sqrt{M})^3 \cdot O(M/B) = O(N^3 M^{-3/2} M B^{-1}) = O(N^3 / (B\sqrt{M}))$.
Optimality: by an information-theoretic argument, computing $N^3$ multiply-adds requires reading $\Omega(N^3)$ memory words; with block size $B$ and reuse from a cache of size $M$, you need $\Omega(N^3/(B\sqrt{M}))$ transfers. This matches.
Why is NP-completeness an unreliable basis for cryptography, even though it guarantees hardness in some sense?
NP-completeness says: there exist instances of the problem that are hard (no polynomial algorithm solves all of them, assuming P $\neq$ NP). But cryptography encrypts with a random key $k$ drawn from some distribution. The adversary tries to break a randomly generated instance. For cryptography to be secure, we need the problem to be hard for random instances, not just worst-case instances.
Example: 3-colorability is NP-complete, but a random $G(n, p)$ with $p \gg \log n/n$ is almost certainly not 3-colorable — and is easy to verify (no 3-coloring exists). Using 3-colorability for encryption would produce instances the adversary can immediately reject.
The Merkle-Hellman knapsack failed for exactly this reason: the knapsack instances it generates have low density (a special structure), and Lagarias-Odlyzko showed low-density knapsacks are easy. Modern cryptography bases security on problems believed to be average-case hard: discrete log, factoring, lattice problems.
Work through independently. Q1–3 direct application. Q4–7 synthesis. Q8–10 careful argument.