Fast Fourier Transform
Multiplying two degree-$n$ polynomials naively costs $O(n^2)$. FFT does it in $O(n\log n)$ by transforming coefficient vectors into sample vectors, multiplying pointwise, then inverting. The trick: choose the sample points to be the $n$-th roots of unity, so the Vandermonde matrix gains a recursive structure that collapses at each level.
The Problem: Polynomial Multiplication
Given polynomials $A(x) = \sum_{k=0}^{n-1} a_k x^k$ and $B(x) = \sum_{k=0}^{n-1} b_k x^k$, compute $C(x) = A(x) \cdot B(x)$. The product has degree $2n-2$ with coefficients:

$$c_k = \sum_{j=0}^{k} a_j b_{k-j}, \qquad k = 0, 1, \ldots, 2n-2.$$
This is exactly a convolution of the coefficient vectors. Naively: $O(n^2)$. FFT: $O(n\log n)$.
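For concreteness, the convolution can be computed directly with the quadratic double loop that FFT will later beat (the helper name `naive_convolve` is ours, for illustration):

```python
def naive_convolve(a, b):
    """Direct O(n*m) convolution: c[j+k] accumulates a[j]*b[k]."""
    c = [0] * (len(a) + len(b) - 1)
    for j, aj in enumerate(a):
        for k, bk in enumerate(b):
            c[j + k] += aj * bk
    return c

# (1 + 2x + 3x^2) * (4 + 5x) = 4 + 13x + 22x^2 + 15x^3
print(naive_convolve([1, 2, 3], [4, 5]))  # [4, 13, 22, 15]
```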
Three Polynomial Representations
The key insight is that different representations suit different operations:
| Operation | Coefficients | Roots + scale | Samples $(x_i, y_i)$ |
|---|---|---|---|
| Evaluation | $O(n)$ Horner | $O(n)$ | $O(n^2)$ interpolation |
| Addition | $O(n)$ | $\infty$ (impossible) | $O(n)$ |
| Multiplication | $O(n^2)$ (bottleneck) | $O(n)$ concatenate | $O(n)$ pointwise |
The Vandermonde Matrix and the DFT
Evaluating a degree-$(n-1)$ polynomial at $n$ points $x_0, x_1, \ldots, x_{n-1}$ simultaneously is a matrix-vector product:

$$\begin{pmatrix} y_0 \\ y_1 \\ \vdots \\ y_{n-1} \end{pmatrix} = \begin{pmatrix} 1 & x_0 & x_0^2 & \cdots & x_0^{n-1} \\ 1 & x_1 & x_1^2 & \cdots & x_1^{n-1} \\ \vdots & & & & \vdots \\ 1 & x_{n-1} & x_{n-1}^2 & \cdots & x_{n-1}^{n-1} \end{pmatrix} \begin{pmatrix} a_0 \\ a_1 \\ \vdots \\ a_{n-1} \end{pmatrix}, \qquad \text{i.e. } y = Va \text{ with } V_{jk} = x_j^k.$$
General $V$ costs $O(n^2)$ to apply. The Discrete Fourier Transform (DFT) specialises to $x_k = e^{i\tau k/n}$ where $\tau = 2\pi$ (the $n$-th roots of unity), making $V$ computable in $O(n\log n)$.
Roots of Unity: The Collapsing Property
The $n$-th roots of unity are $\omega_n^k = e^{2\pi i k/n}$ for $k = 0, 1, \ldots, n-1$; they are evenly spaced on the unit circle. The critical property that makes FFT work: squaring collapses the set. Since $\omega_n^{n/2} = -1$, we have $(\omega_n^k)^2 = \omega_{n/2}^k$ and $(\omega_n^{k+n/2})^2 = (\omega_n^k)^2$, so the $n$ squares form exactly the $(n/2)$-th roots of unity, each hit twice.
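A quick numerical check of the collapsing property (a standalone sketch, not part of the FFT code below):

```python
import cmath

n = 8
roots = [cmath.exp(2j * cmath.pi * k / n) for k in range(n)]
# Square every 8th root and collect the distinct results
# (rounding absorbs floating-point noise)
squares = {(round((w * w).real, 9), round((w * w).imag, 9)) for w in roots}
print(len(squares))  # 4 -- the squares are exactly the 4th roots of unity
```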
The FFT Algorithm
Split the polynomial $A(x)$ into even- and odd-indexed coefficients:

$$A_{\text{even}}(t) = a_0 + a_2 t + a_4 t^2 + \cdots, \qquad A_{\text{odd}}(t) = a_1 + a_3 t + a_5 t^2 + \cdots$$
Then $A(x) = A_{\text{even}}(x^2) + x \cdot A_{\text{odd}}(x^2)$. If we evaluate at the $n$-th roots of unity, the $x^2$ terms range over only the $(n/2)$-th roots of unity: the same subproblem at half the size. Recurrence: $T(n) = 2T(n/2) + O(n) = O(n\log n)$.
```python
import cmath, math

def fft(a, invert=False):
    """
    Cooley-Tukey FFT. a: coefficient list; length must be a power of 2.
    invert=False: DFT  (coefficients -> samples at roots of unity)
    invert=True:  IDFT (samples -> coefficients)
    Returns a list of complex numbers. O(n log n) time.
    """
    n = len(a)
    if n == 1:
        return list(a)  # base case
    # Split even and odd coefficients
    a_even = fft(a[0::2], invert)  # a[0], a[2], a[4], ...
    a_odd = fft(a[1::2], invert)   # a[1], a[3], a[5], ...
    # Primitive n-th root of unity (conjugated for the inverse transform)
    angle = 2 * math.pi / n * (-1 if invert else 1)
    w = complex(1)
    wn = cmath.exp(complex(0, angle))  # e^(i*2pi/n) or its conjugate
    result = [0] * n
    for k in range(n // 2):
        # Butterfly operation: the core of FFT
        result[k] = a_even[k] + w * a_odd[k]
        result[k + n // 2] = a_even[k] - w * a_odd[k]
        w *= wn  # advance to the next root: w^(k+1)
    if invert:
        # Divide by 2 at each of the log2(n) recursion levels: total 1/n.
        # (Dividing by n here would over-divide, since this line runs at
        # every level of the recursion, not just the top one.)
        result = [x / 2 for x in result]
    return result
```
```python
def poly_multiply(a, b):
    """
    Multiply polynomials a and b using FFT. O(n log n).
    a, b: coefficient lists (constant term first).
    Returns the coefficient list of the product.
    """
    result_len = len(a) + len(b) - 1
    # Pad to the next power of 2
    n = 1
    while n < result_len:
        n <<= 1
    fa = fft(a + [0] * (n - len(a)))  # FFT of a (zero-padded)
    fb = fft(b + [0] * (n - len(b)))  # FFT of b (zero-padded)
    # Pointwise multiply in sample representation: O(n)
    fc = [fa[k] * fb[k] for k in range(n)]
    # IFFT to get the product coefficients
    c = fft(fc, invert=True)
    return [round(x.real) for x in c[:result_len]]

# Example: (1 + 2x + 3x^2) * (4 + 5x) = 4 + 13x + 22x^2 + 15x^3
print(poly_multiply([1, 2, 3], [4, 5]))  # [4, 13, 22, 15]
```

The same routines in C++, in the iterative in-place formulation:

```cpp
#include <vector>
#include <complex>
#include <cmath>
#include <algorithm>
using namespace std;
using cd = complex<double>;
const double PI = acos(-1);

// In-place Cooley-Tukey FFT, O(n log n)
// invert=false: DFT, invert=true: IDFT
void fft(vector<cd>& a, bool invert) {
    int n = a.size();
    // Bit-reversal permutation
    for (int i = 1, j = 0; i < n; i++) {
        int bit = n >> 1;
        for (; j & bit; bit >>= 1) j ^= bit;
        j ^= bit;
        if (i < j) swap(a[i], a[j]);
    }
    // Butterfly passes
    for (int len = 2; len <= n; len <<= 1) {
        double ang = 2 * PI / len * (invert ? -1 : 1);
        cd wlen(cos(ang), sin(ang));
        for (int i = 0; i < n; i += len) {
            cd w(1);
            for (int j = 0; j < len / 2; j++) {
                cd u = a[i+j], v = a[i+j+len/2] * w;
                a[i+j] = u + v;           // butterfly
                a[i+j+len/2] = u - v;
                w *= wlen;
            }
        }
    }
    if (invert)
        for (auto& x : a) x /= n;
}

// Polynomial multiplication, O(n log n)
vector<long long> poly_multiply(vector<long long> a, vector<long long> b) {
    vector<cd> fa(a.begin(), a.end()), fb(b.begin(), b.end());
    int result_len = a.size() + b.size() - 1;
    int n = 1;
    while (n < result_len) n <<= 1;
    fa.resize(n); fb.resize(n);
    fft(fa, false); fft(fb, false);
    for (int i = 0; i < n; i++) fa[i] *= fb[i];
    fft(fa, true);
    vector<long long> res(result_len);
    for (int i = 0; i < result_len; i++)
        res[i] = llround(fa[i].real());
    return res;
}
```

The Inverse DFT: Why $V^{-1} = \frac{1}{n}\bar{V}$
To go from samples back to coefficients we need $V^{-1}$. It turns out $V^{-1}$ has the same structure as $V$ with the roots replaced by their complex conjugates:

$$(V^{-1})_{jk} = \frac{1}{n}\,\overline{V_{jk}} = \frac{1}{n}\, e^{-2\pi i jk/n}.$$
Proof: compute $(V \cdot \bar{V})_{jk} = \sum_{m=0}^{n-1} e^{ij\tau m/n} e^{-ik\tau m/n} = \sum_{m=0}^{n-1} e^{i(j-k)\tau m/n}$. If $j=k$: each term is 1, sum $= n$. If $j \neq k$: geometric series with ratio $r = e^{i(j-k)\tau/n} \neq 1$, so sum $= \frac{r^n - 1}{r - 1} = \frac{e^{i(j-k)\tau}-1}{r-1} = \frac{1-1}{r-1} = 0$ (since $e^{i\tau}=1$). So $V\bar{V} = nI$, hence $V^{-1} = \frac{1}{n}\bar{V}$. $\square$
Consequence: IDFT is just FFT with conjugated roots, divided by $n$ at the end. Same algorithm, same $O(n\log n)$.
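The identity $V\bar{V} = nI$ can be sanity-checked numerically for a small $n$ (a throwaway verification, not part of the pipeline):

```python
import cmath

n = 4
# V[j][k] = omega_n^(jk); Vbar is its entrywise complex conjugate
V = [[cmath.exp(2j * cmath.pi * j * k / n) for k in range(n)] for j in range(n)]
Vbar = [[z.conjugate() for z in row] for row in V]
prod = [[sum(V[j][m] * Vbar[m][k] for m in range(n)) for k in range(n)]
        for j in range(n)]
# Every entry of V * Vbar should be n on the diagonal, 0 off it
ok = all(abs(prod[j][k] - (n if j == k else 0)) < 1e-9
         for j in range(n) for k in range(n))
print(ok)  # True
```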
Complete FFT-based Polynomial Multiplication Pipeline
```python
import cmath, math, time

def dft_naive(a):
    """Naive O(n^2) DFT for comparison."""
    n = len(a)
    result = []
    for k in range(n):
        val = sum(a[j] * cmath.exp(-2j * math.pi * k * j / n)
                  for j in range(n))
        result.append(val)
    return result

def next_pow2(n):
    p = 1
    while p < n:
        p <<= 1
    return p

def benchmark_fft_vs_naive(sizes=(32, 64, 128, 256, 512)):
    """
    Compare the O(n^2) DFT against the O(n log n) FFT empirically.
    The speedup grows rapidly with n.
    """
    import random
    results = []
    for n in sizes:
        a = [random.random() for _ in range(n)]
        t0 = time.perf_counter()
        dft_naive(a)
        t_naive = time.perf_counter() - t0
        t0 = time.perf_counter()
        fft(a + [0] * (next_pow2(n) - n))
        t_fft = time.perf_counter() - t0
        results.append((n, t_naive, t_fft, t_naive / t_fft))
        print(f"n={n:4d}: naive={t_naive*1000:.2f}ms "
              f"fft={t_fft*1000:.2f}ms "
              f"speedup={t_naive/t_fft:.1f}x")
    return results

# Key identity: for n-th roots of unity omega_n^k = e^(2*pi*i*k/n),
# FFT evaluates A at {omega_n^0, omega_n^1, ..., omega_n^(n-1)};
# IFFT recovers A from those samples: conjugate roots, divide by n.
def demo_round_trip():
    """Verify that FFT followed by IFFT recovers the original coefficients."""
    original = [1, 2, 3, 4, 5, 6, 7, 8]        # must be power-of-2 length
    samples = fft(original)                     # DFT: coefficients -> samples
    recovered = fft(samples, invert=True)       # IDFT: samples -> coefficients
    print([round(x.real) for x in recovered])   # [1, 2, 3, 4, 5, 6, 7, 8]
```

And in C++:

```cpp
#include <vector>
#include <complex>
#include <cmath>
using namespace std;
using cd = complex<double>;
const double PI = acos(-1.0);

// Naive O(n^2) DFT for comparison
vector<cd> dft_naive(const vector<double>& a) {
    int n = a.size();
    vector<cd> result(n);
    for (int k = 0; k < n; ++k)
        for (int j = 0; j < n; ++j)
            result[k] += a[j] * exp(cd(0, -2.0 * PI * k * j / n));
    return result;  // O(n^2)
}

// Key identity verification: FFT then IFFT recovers the original.
// fft(a, false) followed by fft(a, true) (which divides by n).
void verify_round_trip() {
    vector<cd> a = {1, 2, 3, 4, 5, 6, 7, 8};  // must be power of 2
    fft(a, false);  // DFT:  coefficients -> samples
    fft(a, true);   // IDFT: samples -> coefficients (divided by n)
    // a[i].real() should now be 1, 2, ..., 8 again
}

// Convolution as FFT: the key claim
// conv(a, b)[k] = sum_{j=0}^{k} a[j]*b[k-j]
//              = IFFT(FFT(a) * FFT(b))[k]   (pointwise product)
// This is why polynomial multiplication and convolution both run in
// O(n log n) via FFT.
```

Applications of FFT
The transform $A \mapsto A^* = \text{FFT}(A)$ produces complex numbers $a_k^*$ where $|a_k^*|$ is the amplitude of frequency $k$ and $\arg(a_k^*)$ is its phase. This frequency-domain view powers a vast range of applications:
```python
import numpy as np

# -- FAST CONVOLUTION (polynomial multiplication via NumPy FFT) --------
def fast_convolve(a, b):
    """
    Convolve sequences a and b using FFT. O(n log n).
    Equivalent to np.convolve(a, b) but faster for large inputs.
    """
    n = len(a) + len(b) - 1
    # Pad to the next power of 2 for efficiency
    N = 1 << (n - 1).bit_length()
    fa = np.fft.rfft(a, N)   # real FFT (about 2x faster for real input)
    fb = np.fft.rfft(b, N)
    fc = fa * fb             # pointwise multiply: O(N)
    c = np.fft.irfft(fc, N)  # inverse real FFT
    return np.round(c[:n]).astype(int)

# -- LOW-PASS FILTER (zero out high frequencies) -----------------------
def low_pass_filter(signal, cutoff_fraction=0.1):
    """
    Keep only the lowest cutoff_fraction of frequencies.
    Used in: MP3 compression, smoothing, noise reduction.
    """
    n = len(signal)
    freq = np.fft.rfft(signal)
    cutoff = int(len(freq) * cutoff_fraction)
    freq[cutoff:] = 0  # zero out high frequencies
    return np.fft.irfft(freq, n)

# -- LARGE INTEGER MULTIPLICATION --------------------------------------
def large_int_multiply(x, y):
    """
    Multiply large integers x, y by treating their digits as polynomial
    coefficients, convolving, then handling carries.
    O(n log n) for n-digit numbers; beats the naive O(n^2).
    """
    digits_x = [int(d) for d in str(x)][::-1]  # least significant digit first
    digits_y = [int(d) for d in str(y)][::-1]
    product_digits = fast_convolve(digits_x, digits_y)
    # Handle carries
    result, carry = [], 0
    for d in product_digits:
        total = d + carry
        result.append(total % 10)
        carry = total // 10
    while carry:
        result.append(carry % 10)
        carry //= 10
    return int(''.join(str(d) for d in result[::-1]))

# Example: 12345 * 67890
print(large_int_multiply(12345, 67890))  # 838102050
```

FFT-based applications in C++:

```cpp
#include <vector>
#include <complex>
#include <cmath>
#include <algorithm>
using namespace std;
using cd = complex<double>;

// Low-pass filter: zero out high-frequency components.
// signal: time-domain samples; cutoff_k: keep frequencies 0..cutoff_k-1
// (and their conjugate mirror, so the output stays real)
vector<double> low_pass(vector<double> signal, int cutoff_k) {
    int n = signal.size();
    vector<cd> freq(signal.begin(), signal.end());
    fft(freq, false);  // time -> frequency domain
    for (int k = cutoff_k; k < n - cutoff_k; ++k)
        freq[k] = 0;   // zero high frequencies
    fft(freq, true);   // frequency -> time domain
    vector<double> result(n);
    for (int i = 0; i < n; ++i) result[i] = freq[i].real();
    return result;
}

// Large integer multiply: digits as polynomial coefficients.
// For x = sum(d_i * 10^i), treat the digits as a polynomial, multiply via FFT.
vector<int> large_int_mul(vector<int> dx, vector<int> dy) {
    // dx[i] = i-th digit of x (least significant first)
    auto c = poly_multiply(
        vector<long long>(dx.begin(), dx.end()),
        vector<long long>(dy.begin(), dy.end()));
    // Propagate carries
    for (int i = 0; i < (int)c.size() - 1; ++i) {
        c[i+1] += c[i] / 10;
        c[i] %= 10;
    }
    while (c.back() >= 10) {
        c.push_back(c.back() / 10);
        c[c.size() - 2] %= 10;
    }
    return vector<int>(c.begin(), c.end());
}
```

Worked example: tick-by-tick implied-volatility estimates for the NIFTY term structure are noisy. A low-pass FFT filter smooths the term structure by removing high-frequency noise while preserving the overall shape.
```python
import numpy as np

# Raw IV estimates for 16 weekly expiries (noisy)
iv_raw = np.array([0.18, 0.21, 0.19, 0.23, 0.20, 0.22, 0.21, 0.24,
                   0.23, 0.25, 0.22, 0.26, 0.24, 0.27, 0.25, 0.28])
# FFT: time domain (expiry) -> frequency domain
freq = np.fft.rfft(iv_raw)
# Keep only the lowest 3 frequencies (smooth trend), zero the rest
cutoff = 3
freq_filtered = freq.copy()
freq_filtered[cutoff:] = 0
# IFFT: recover the smoothed IV term structure
iv_smooth = np.fft.irfft(freq_filtered, len(iv_raw))
print(np.round(iv_smooth, 4))
# [0.1977, 0.2080, 0.2133, 0.2134, 0.2086, 0.2099, 0.2180, 0.2340,
#  0.2465, 0.2490, 0.2417, 0.2320, 0.2310, 0.2415, 0.2567, 0.2654]
```
numpy.fft is the production FFT: np.fft.rfft exploits real-valued input for roughly a 2x speedup over the general complex FFT. For two $n$-point real signals, use rfft (the output has $n/2+1$ complex values), multiply pointwise, then irfft. NumPy's backend is pocketfft; the classic high-performance library FFTW ("Fastest Fourier Transform in the West", by Frigo and Johnson at MIT) powers MATLAB and much of the signal-processing world. For options analytics at TTT, FFT-based convolution underlies IV smoothing, autocorrelation of return series, spectral analysis of strategy PnL, and efficient density recovery from characteristic functions (Gil-Pelaez inversion, the same $O(n\log n)$ trick used in Carr-Madan option pricing).
You need to add two polynomials and then multiply the result by a third. Which sequence of representations minimises total work?
Addition is cheapest in coefficient form ($O(n)$: just add coefficient-by-coefficient). After forming $A+B$, switch to sample form (one FFT for the sum and one for $C$: $O(n\log n)$), multiply pointwise ($O(n)$), and IFFT back ($O(n\log n)$). Total: $O(n\log n)$.
Option C is slightly worse: converting all three polynomials to samples costs three forward FFTs instead of two, and addition in sample form is the same $O(n)$ as in coefficient form, so nothing is gained. Option D fails because addition in the root representation is impossible (the $\infty$ in the table). Option A costs $O(n^2)$ for the multiplication.
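A sketch of this winning sequence with NumPy (the function name `add_then_multiply` is ours): add in coefficient form, then exactly two forward FFTs and one inverse.

```python
import numpy as np

def add_then_multiply(a, b, c):
    """Compute (A + B) * C: O(n) add in coefficient form, then FFT multiply."""
    m = max(len(a), len(b))
    s = np.zeros(m)
    s[:len(a)] += a          # coefficient-wise addition: O(n)
    s[:len(b)] += b
    n = m + len(c) - 1       # length of the product's coefficient vector
    fs = np.fft.rfft(s, n)   # forward FFT #1 (of the sum A+B)
    fc = np.fft.rfft(c, n)   # forward FFT #2 (of C)
    return np.round(np.fft.irfft(fs * fc, n)).astype(int)

# ((1 + x) + (2 + x)) * (1 + x) = (3 + 2x)(1 + x) = 3 + 5x + 2x^2
print(add_then_multiply([1, 1], [2, 1], [1, 1]))  # [3 5 2]
```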
The 8th roots of unity are $\omega_8^k = e^{2\pi i k/8}$ for $k=0,\ldots,7$. When we square them, how many distinct values do we get, and what are they?
$(\omega_8^k)^2 = e^{2\pi i k/4} = \omega_4^k$. The pairs $(k, k+4)$ have the same square: $(\omega_8^0)^2 = (\omega_8^4)^2 = 1$; $(\omega_8^1)^2 = (\omega_8^5)^2 = i = \omega_4^1$; $(\omega_8^2)^2 = (\omega_8^6)^2 = -1 = \omega_4^2$; $(\omega_8^3)^2 = (\omega_8^7)^2 = -i = \omega_4^3$. So we get exactly four distinct values, $\{1, i, -1, -i\}$: the 4th roots of unity. This halving is the collapsing property that lets FFT recurse efficiently: $n$ roots become $n/2$ roots, $\log n$ times.
The IDFT is computed using the same FFT algorithm but with conjugated roots and a division by $n$ at the end. Why can we reuse the FFT algorithm rather than designing a separate IDFT algorithm?
We proved $V^{-1} = \frac{1}{n}\bar{V}$ where $\bar{V}_{jk} = e^{-2\pi ijk/n}$ (conjugate roots). The IDFT is then $A = \frac{1}{n}\bar{V} \cdot A^*$. The FFT algorithm evaluates a polynomial at the $n$-th roots of unity using even/odd splitting. Replacing $e^{2\pi i/n}$ with $e^{-2\pi i/n}$ (the conjugate) changes only the sign of the angle; the same butterfly structure applies. So IFFT = FFT with `angle = -2*pi/n` instead of `+2*pi/n`, then divide by $n$. One algorithm, two directions.
You want to compute the autocorrelation of a return series $r = [r_0, \ldots, r_{n-1}]$: $\text{corr}[k] = \sum_{j} r_j \cdot r_{j+k}$. How many FFTs are needed?
The Wiener-Khinchin theorem: the autocorrelation of $r$ equals the inverse FFT of $|R(\omega)|^2$ where $R = \text{FFT}(r)$. So: (1) compute $R = \text{FFT}(r)$, (2) compute $|R_k|^2 = R_k \cdot \overline{R_k}$ pointwise in $O(n)$, (3) compute $\text{IFFT}(|R|^2)$. Total: 2 FFTs plus $O(n)$ work, i.e. $O(n\log n)$ vs the naive $O(n^2)$.
This is the standard technique in quantitative finance for computing autocorrelation functions of return series, testing for serial dependence, and detecting market microstructure effects, all running in $O(n\log n)$ for thousands of ticks.
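The recipe above, sketched with NumPy's real FFT (the helper name `autocorr_fft` is ours; padding to at least $2n$ samples avoids circular wraparound):

```python
import numpy as np

def autocorr_fft(r):
    """corr[k] = sum_j r[j]*r[j+k] via Wiener-Khinchin: IFFT(|FFT(r)|^2)."""
    n = len(r)
    N = 1 << (2 * n - 1).bit_length()       # pad so circular conv == linear conv
    R = np.fft.rfft(r, N)                   # FFT #1
    corr = np.fft.irfft(R * np.conj(R), N)  # |R|^2 pointwise, then inverse FFT
    return corr[:n]                         # lags 0 .. n-1

rng = np.random.default_rng(0)
r = rng.standard_normal(1000)
ac = autocorr_fft(r)
# Agrees with the naive O(n^2) definition at lag 5
naive_lag5 = sum(r[j] * r[j + 5] for j in range(len(r) - 5))
print(abs(ac[5] - naive_lag5) < 1e-8)  # True
```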
Work through these independently. Q1–3 are direct application; Q4–7 require synthesis; Q8–10 demand careful argument.
Exercise (medium): compute the autocorrelation of a return series with np.fft.rfft. Which lags show significant autocorrelation?