Sequences
Before you can sort, search, or traverse a graph, you need a place to put things. This chapter is about that place — and why the choice of container matters more than most programmers realise.
Interface vs Implementation — The Most Important Distinction
Every data structure course conflates two things that must be kept separate. An interface is a specification — it says what operations are supported and what they must do. An implementation is a representation — it says how those operations are actually carried out in memory.
The interface is the problem. The implementation is the solution. Multiple implementations can satisfy the same interface with wildly different performance. Choosing the wrong implementation for your workload is one of the most common and most expensive mistakes in systems programming.
Sequence interface — maintains items in an extrinsic order. The order comes from outside: item $x_0$ is first because you put it first, not because of any property of $x_0$ itself. Operations are position-based: get the $i$-th item, insert at position $i$.
Set interface — maintains items in an intrinsic order. The order comes from the items themselves, via a key. Operations are key-based: find the item with key $k$, find the minimum key.
This chapter covers three implementations of the Sequence interface: static arrays, linked lists, and dynamic arrays. The Set interface begins in Chapter 3.
The Sequence Interface
A sequence stores $n$ items $(x_0, x_1, \ldots, x_{n-1})$ in zero-indexed order. The full interface has four categories of operations:
| Container | |
|---|---|
| build(X) | Build sequence from iterable X in $\Theta(n)$ |
| len() | Return number of items stored |

| Static | |
|---|---|
| iter_seq() | Return all items one-by-one in sequence order |
| get_at(i) | Return the $i$-th item |
| set_at(i, x) | Replace the $i$-th item with $x$ |

| Dynamic | |
|---|---|
| insert_at(i, x) | Add $x$ as the $i$-th item; shift everything after it right |
| delete_at(i) | Remove and return the $i$-th item; shift everything after it left |
| insert_first(x) | Add $x$ as the first item |
| delete_first() | Remove and return the first item |
| insert_last(x) | Add $x$ as the last item |
| delete_last() | Remove and return the last item |
A stack uses insert_last and delete_last — last-in, first-out. A queue uses insert_last and delete_first — first-in, first-out. Both are just restricted views of the full Sequence interface.
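The reduction is mechanical — a minimal sketch (the `Stack` and `Queue` class names here are illustrative, not from the chapter's implementations):

```python
from collections import deque

# Stack and queue expressed as restricted views of the Sequence interface.
class Stack:
    def __init__(self): self._items = []
    def insert_last(self, x): self._items.append(x)       # push
    def delete_last(self): return self._items.pop()       # pop — LIFO

class Queue:
    def __init__(self): self._items = deque()
    def insert_last(self, x): self._items.append(x)       # enqueue
    def delete_first(self): return self._items.popleft()  # dequeue — FIFO

s, q = Stack(), Queue()
for x in (1, 2, 3):
    s.insert_last(x)
    q.insert_last(x)
```

The stack now returns 3 first (last in), the queue returns 1 first (first in) — same interface, opposite restriction.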
Implementation 1 — Static Array
Computer memory is, at the hardware level, a large fixed-length array: a sequence of addressed words that the CPU can read or write in $\Theta(1)$ time given an address. A static array maps directly onto this — it claims a contiguous block of memory addresses, and item $i$ lives at address $\text{base} + i \cdot w$ where $w$ is the word size.
This gives get_at(i) and set_at(i, x) in $\Theta(1)$ — just compute the address and read or write. No search required. This is the random access property that the Word-RAM model guarantees.
The cost arrives with dynamic operations. The array occupies a fixed chunk of memory. Inserting at position $i$ requires shifting every item from $i$ to $n-1$ one slot to the right — $\Theta(n)$ writes. Deleting at position $i$ requires the reverse. And if you need more space than you initially allocated, you must allocate a new block, copy everything over — again $\Theta(n)$ — and free the old block.
class Array_Seq:
def __init__(self): # O(1)
self.A = []
self.size = 0
def __len__(self): # O(1)
return self.size
def __iter__(self): # O(n)
yield from self.A
def build(self, X): # O(n)
self.A = [a for a in X]
self.size = len(self.A)
def get_at(self, i): # O(1) ← random access
return self.A[i]
def set_at(self, i, x): # O(1) ← random access
self.A[i] = x
def insert_at(self, i, x): # O(n) ← must shift everything after i
n = len(self)
A = [None] * (n + 1)
# copy items before i unchanged
for k in range(i): A[k] = self.A[k]
A[i] = x # place new item
# copy items from i onward shifted right by 1
for k in range(i, n): A[k + 1] = self.A[k]
self.build(A)
def delete_at(self, i): # O(n) ← must shift everything after i
n = len(self)
A = [None] * (n - 1)
for k in range(i): A[k] = self.A[k]
x = self.A[i]
for k in range(i, n - 1): A[k] = self.A[k + 1]
self.build(A)
return x
# convenience wrappers — all O(n) via insert_at/delete_at
def insert_first(self, x): self.insert_at(0, x)
def delete_first(self): return self.delete_at(0)
def insert_last(self, x): self.insert_at(len(self), x)
def delete_last(self): return self.delete_at(len(self) - 1)
#include <vector>
#include <stdexcept>
using namespace std;
// Static array sequence — wraps std::vector but exposes
// the same interface with explicit complexity annotations
template<typename T>
class Array_Seq {
vector<T> A;
public:
Array_Seq() = default; // O(1)
size_t len() const { return A.size(); } // O(1)
void build(const vector<T>& X) { A = X; } // O(n)
T get_at(int i) const { return A.at(i); } // O(1) random access
void set_at(int i, T x) { A.at(i) = x; } // O(1) random access
void insert_at(int i, T x) { // O(n) — shifts suffix
A.insert(A.begin() + i, x);
}
T delete_at(int i) { // O(n) — shifts suffix
T x = A.at(i);
A.erase(A.begin() + i);
return x;
}
void insert_first(T x) { insert_at(0, x); }
T delete_first() { return delete_at(0); }
    void insert_last(T x) { A.push_back(x); } // amortised O(1) via vector doubling; O(n) for a true static array
T delete_last() { T x = A.back(); A.pop_back(); return x; }
};
Implementation 2 — Linked List
A linked list abandons contiguous memory entirely. Each item lives in a node — a small object with two fields: node.item (the actual value) and node.next (the memory address of the next node). The list itself just stores a pointer to the first node, called the head.
Inserting or deleting at the front is now $\Theta(1)$ — just relink a pointer. No copying, no shifting. But get_at(i) is $\Theta(i)$: you must walk the chain node by node from the head to reach position $i$. There is no shortcut.
class Node:
def __init__(self, x): # O(1)
self.item = x
self.next = None # address of next node, or None if last
def later_node(self, i): # O(i) — walk i steps forward
if i == 0: return self
return self.next.later_node(i - 1)
class Linked_List_Seq:
def __init__(self):
self.head = None # pointer to first node
self.size = 0
def __len__(self): return self.size
def __iter__(self): # O(n)
node = self.head
while node:
yield node.item
node = node.next
def build(self, X): # O(n)
for a in reversed(X):
self.insert_first(a)
def get_at(self, i): # O(i) — must walk from head
return self.head.later_node(i).item
def set_at(self, i, x): # O(i)
self.head.later_node(i).item = x
def insert_first(self, x): # O(1) ← just relink head
new_node = Node(x)
new_node.next = self.head
self.head = new_node
self.size += 1
def delete_first(self): # O(1) ← just relink head
x = self.head.item
self.head = self.head.next
self.size -= 1
return x
def insert_at(self, i, x): # O(i)
if i == 0:
self.insert_first(x)
return
new_node = Node(x)
node = self.head.later_node(i - 1)
new_node.next = node.next
node.next = new_node
self.size += 1
def delete_at(self, i): # O(i)
if i == 0: return self.delete_first()
node = self.head.later_node(i - 1)
x = node.next.item
node.next = node.next.next
self.size -= 1
return x
# O(n) — must walk to end
def insert_last(self, x): self.insert_at(len(self), x)
def delete_last(self): return self.delete_at(len(self) - 1)
#include <memory>
#include <stdexcept>
using namespace std;
template<typename T>
struct Node {
T item;
shared_ptr<Node<T>> next; // pointer to next node
Node(T x) : item(x), next(nullptr) {}
Node<T>* later_node(int i) { // O(i) — walk i steps
if (i == 0) return this;
if (!next) throw out_of_range("index out of range");
return next->later_node(i - 1);
}
};
template<typename T>
class Linked_List_Seq {
shared_ptr<Node<T>> head;
int size = 0;
public:
int len() const { return size; }
void insert_first(T x) { // O(1) ← just relink head
auto n = make_shared<Node<T>>(x);
n->next = head;
head = n;
size++;
}
T delete_first() { // O(1)
T x = head->item;
head = head->next;
size--;
return x;
}
T get_at(int i) { // O(i)
return head->later_node(i)->item;
}
void insert_at(int i, T x) { // O(i)
if (i == 0) { insert_first(x); return; }
auto n = make_shared<Node<T>>(x);
auto* prev = head->later_node(i - 1);
n->next = prev->next;
prev->next = n;
size++;
}
};
Python's list is not a linked list. It is a dynamic array. This confuses almost every programmer who comes from Java or C++, where LinkedList is a distinct class. In Python, if you need O(1) front insertion, use collections.deque — a doubly-linked list of fixed-size blocks under the hood. Using list.insert(0, x) is $\Theta(n)$ and will silently destroy the performance of any algorithm that relies on fast front operations.
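A quick demonstration that the two containers give identical front-insertion semantics at very different cost:

```python
from collections import deque

# Same result, different asymptotics:
# list.insert(0, x) shifts every existing element — Theta(n) per call.
# deque.appendleft(x) relinks a pointer — O(1) per call.
lst = []
dq = deque()
for x in range(5):
    lst.insert(0, x)
    dq.appendleft(x)
```

Both containers end up holding `[4, 3, 2, 1, 0]`; only the scaling behaviour differs as $n$ grows.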
Implementation 3 — Dynamic Array
The static array is fast at random access but slow at dynamic operations. The linked list is fast at front insertion but slow at random access. Can we get fast random access and fast append? Yes — with a trick.
A dynamic array is an array that over-allocates. When you insert the last item and the array is full, instead of allocating exactly one more slot, you allocate double the current capacity. Now the next $n$ insertions are free — no reallocation needed. Only when the array fills again do you pay the doubling cost, and at that point you double again.
This is the key insight of amortised analysis: a single operation can be expensive ($\Theta(n)$ for a resize), but if you look at any sequence of $k$ operations, the total cost is $\Theta(k)$. So the cost per operation, amortised over many operations, is $\Theta(1)$.
insert_last: a single insert costs $\Theta(n)$ when a resize occurs, but resizes happen only when the size reaches $1, 2, 4, 8, \ldots$ — the total copy work across all $n$ insertions is:
$$1 + 2 + 4 + \cdots + \frac{n}{2} = n - 1 = \Theta(n)$$
So $n$ insertions cost $\Theta(n)$ total — $\Theta(1)$ each, amortised.
class Dynamic_Array_Seq(Array_Seq):
def __init__(self): # O(1)
super().__init__()
self.size = 0
self.upper = 0 # resize when size hits upper
self.lower = 0 # resize down when size hits lower
self._resize(0)
def __len__(self): return self.size
def __iter__(self): # O(n)
for i in range(self.size): yield self.A[i]
def build(self, X): # O(n)
for a in X: self.insert_last(a)
def _compute_bounds(self): # O(1)
cap = len(self.A)
self.upper = cap
self.lower = cap // 4 # shrink when 1/4 full
def _resize(self, n): # O(1) or O(n) — amortised O(1)
cap = len(self.A)
if self.lower < n < self.upper:
return # still within bounds, no resize needed
new_cap = max(n, 1) * 2 # double the needed space
new_A = [None] * new_cap
for i in range(self.size): new_A[i] = self.A[i]
self.A = new_A
self._compute_bounds()
def insert_last(self, x): # O(1) amortised ← the key operation
self._resize(self.size + 1)
self.A[self.size] = x
self.size += 1
def delete_last(self): # O(1) amortised
x = self.A[self.size - 1]
self.A[self.size - 1] = None # avoid memory leak
self.size -= 1
self._resize(self.size)
return x
# arbitrary-index operations still O(n) — must shift
def insert_at(self, i, x):
self.insert_last(None)
for k in range(self.size - 1, i, -1):
self.A[k] = self.A[k - 1]
self.A[i] = x
def delete_at(self, i):
x = self.A[i]
for k in range(i, self.size - 1):
self.A[k] = self.A[k + 1]
self.delete_last()
return x
def insert_first(self, x): self.insert_at(0, x) # O(n)
def delete_first(self): return self.delete_at(0) # O(n)
#include <algorithm>
using namespace std;
// std::vector IS a dynamic array — this shows the internals explicitly
template<typename T>
class Dynamic_Array_Seq {
T* A = nullptr;
int cap = 0;
int sz = 0;
void resize(int n) { // O(1) or O(n), amortised O(1)
if (n < cap / 4 || n >= cap) {
int new_cap = max(n, 1) * 2;
T* B = new T[new_cap]();
for (int i = 0; i < sz; ++i) B[i] = A[i];
delete[] A;
A = B;
cap = new_cap;
}
}
public:
Dynamic_Array_Seq() { resize(0); }
~Dynamic_Array_Seq() { delete[] A; }
int len() const { return sz; }
T get_at(int i) const { return A[i]; } // O(1)
void set_at(int i, T x) { A[i] = x; } // O(1)
void insert_last(T x) { // O(1) amortised
resize(sz + 1);
A[sz++] = x;
}
T delete_last() { // O(1) amortised
T x = A[--sz];
resize(sz);
return x;
}
void insert_at(int i, T x) { // O(n)
insert_last(T{});
for (int k = sz - 1; k > i; --k) A[k] = A[k-1];
A[i] = x;
}
T delete_at(int i) { // O(n)
T x = A[i];
for (int k = i; k < sz - 1; ++k) A[k] = A[k+1];
delete_last();
return x;
}
};
The Comparison Table — Choosing Your Sequence
All three implementations satisfy the Sequence interface. Their performance differs dramatically depending on the operation. This is the table every engineer should have memorised:
| Data Structure | build | get_at / set_at | insert/delete first | insert/delete last | insert/delete at i |
|---|---|---|---|---|---|
| Array | $\Theta(n)$ | $\Theta(1)$ | $\Theta(n)$ | $\Theta(n)$ | $\Theta(n)$ |
| Linked List | $\Theta(n)$ | $\Theta(i)$ | $\Theta(1)$ | $\Theta(n)$ | $\Theta(n)$ |
| Dynamic Array | $\Theta(n)$ | $\Theta(1)$ | $\Theta(n)$ | $\Theta(1)^{(a)}$ | $\Theta(n)$ |
(a) amortised — individual operations can be $\Theta(n)$ during resize, but any sequence of $k$ appends costs $\Theta(k)$ total.
An NSE co-location system receives a WebSocket stream of order events. You need to maintain a sliding window of the last 1,000 events for a VWAP calculation — old events fall off the back, new events arrive at the front. Which sequence implementation is appropriate, and why?
The workload is: insert_first (new event arrives) + delete_last (oldest event falls off) + iteration over all 1,000 items for VWAP. No random access by index needed.
Dynamic array: insert_first is $\Theta(n)$ — must shift 1,000 items on every tick. At 10,000 ticks/second that's $10^7$ shifts/second just to maintain the buffer. Unusable.
Linked list: insert_first is $\Theta(1)$, delete_last is $\Theta(n)$ with a singly-linked list (must walk to the second-to-last node). But a doubly-linked list with a tail pointer makes both $\Theta(1)$. This is exactly what Python's collections.deque provides.
Using deque(maxlen=1000): appendleft is $\Theta(1)$, the deque automatically drops from the right when full, and iteration is $\Theta(n)$. The VWAP calculation costs $\Theta(n)$ per tick — unavoidable since you must touch all 1,000 prices — but the buffer maintenance is $\Theta(1)$.
The naive choice of a Python list with insert(0, event) would be $\Theta(n)$ per tick — 1,000 shifts every time a new order arrives. On a live NIFTY feed at peak, this compounds into real lag. The deque is the correct tool, and collections.deque(maxlen=1000) costs almost nothing extra to type.
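A minimal sketch of the buffer, assuming each event is a (price, quantity) pair — the event format and the recomputed-from-scratch VWAP are illustrative, not from any real feed API:

```python
from collections import deque

WINDOW = 1000                        # window length from the problem statement
events = deque(maxlen=WINDOW)        # oldest event is evicted automatically

def on_event(price, qty):
    events.appendleft((price, qty))  # O(1); drops from the right when full

def vwap():
    # Theta(n) over the window — a direct recomputation touches every event
    notional = sum(p * q for p, q in events)
    volume = sum(q for _, q in events)
    return notional / volume if volume else 0.0

# feed 1,500 ticks; only the most recent 1,000 survive in the window
for i in range(1, 1501):
    on_event(float(i), 1)
```

A production version would maintain running sums instead of recomputing, but the buffer-maintenance cost is the point here: every tick is $\Theta(1)$.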
Python's list — What It Actually Is
Python's built-in list is a dynamic array. It over-allocates using a growth formula from CPython's source: when a list of size $n$ needs to grow, the new allocation is roughly $n + (n \gg 3) + 6$ — about $n/8$ extra slots. The over-allocation is proportional to $n$, so growth is geometric (factor about $1.125$) and amortised $\Theta(1)$ append is guaranteed.
This has one critical practical implication: list.append(x) and list.pop() are $\Theta(1)$ amortised. But list.insert(0, x) and list.pop(0) are $\Theta(n)$ — they shift every element. Code that uses a Python list as a queue (appending at the back, removing from the front) is accidentally $\Theta(n^2)$ for $n$ operations.
A classic production failure is list.pop(0) on an event queue — every pop shifts the entire remaining list. On a busy day with 50,000 events queued, each pop(0) is 50,000 pointer moves. The fix is one import: from collections import deque. deque.popleft() is $\Theta(1)$. This is not a micro-optimisation — on a live feed it is the difference between a system that keeps up and one that falls behind by minutes.
A trading system maintains a list of open orders. Orders are always added to the end of the list and always cancelled from a random position by order ID. Which sequence implementation minimises the worst-case cost of a cancellation?
The catch is in what "cancellation" requires. You must first find the order by ID, then delete it. Finding an item by value (not by position) requires scanning the entire sequence — $\Theta(n)$ — for both arrays and linked lists, since neither has intrinsic key-based lookup.
Even if you were given the exact node pointer in a linked list, deleting from a singly-linked list still requires walking to the predecessor — $\Theta(n)$. A doubly-linked list gives $\Theta(1)$ deletion given the node pointer, but you still need $\Theta(n)$ to find it.
The right data structure for this workload is a hash map (Chapter 4) for $\Theta(1)$ lookup by order ID, combined with a doubly-linked list for $\Theta(1)$ deletion once found. Real order management systems use exactly this combination.
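A toy sketch of that combination — the class and method names are illustrative, not a production order book:

```python
# A dict maps order IDs to nodes of a doubly-linked list, so cancellation
# is Theta(1) end to end: hash lookup, then pointer unlink.
class OrderNode:
    __slots__ = ("order_id", "prev", "next")
    def __init__(self, order_id):
        self.order_id, self.prev, self.next = order_id, None, None

class OpenOrders:
    def __init__(self):
        self.index = {}              # order_id -> node, Theta(1) lookup
        self.head = self.tail = None

    def add(self, order_id):         # append at the tail, Theta(1)
        node = OrderNode(order_id)
        if self.tail:
            self.tail.next, node.prev = node, self.tail
        else:
            self.head = node
        self.tail = node
        self.index[order_id] = node

    def cancel(self, order_id):      # Theta(1): lookup + unlink
        node = self.index.pop(order_id)
        if node.prev: node.prev.next = node.next
        else: self.head = node.next
        if node.next: node.next.prev = node.prev
        else: self.tail = node.prev

    def open_ids(self):              # Theta(n) traversal, arrival order
        node, out = self.head, []
        while node:
            out.append(node.order_id)
            node = node.next
        return out

book = OpenOrders()
for oid in ("ORD-1", "ORD-2", "ORD-3"):
    book.add(oid)
book.cancel("ORD-2")                 # Theta(1), regardless of list length
```

The dict answers "where is this order?" and the doubly-linked list answers "remove it without shifting anything" — neither structure alone can do both.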
A dynamic array starts empty and you perform $n = 16$ consecutive insert_last operations using table-doubling (capacity doubles when full, starting at capacity 1). How many total element copies occur across all resize operations?
Resizes happen when the array is full. With table-doubling starting at capacity 1:
— Insert 2: array full at cap 1 → resize to cap 2, copy 1 element
— Insert 3: full at cap 2 → resize to cap 4, copy 2 elements
— Insert 5: full at cap 4 → resize to cap 8, copy 4 elements
— Insert 9: full at cap 8 → resize to cap 16, copy 8 elements
— Inserts 10–16: no resize needed (cap already 16)
Total copies: $1 + 2 + 4 + 8 = 15$. In general for $n$ insertions, total copies $= 1 + 2 + 4 + \cdots + n/2 = n - 1 = \Theta(n)$. Divide by $n$ insertions: $\Theta(1)$ amortised per insertion. This is the geometric series argument behind the amortised bound.
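The count can be checked mechanically — a small simulation of table-doubling (the function name is illustrative):

```python
# Simulate n appends with table-doubling and tally the element copies
# performed by resizes.
def total_copies(n, start_cap=1):
    cap, size, copies = start_cap, 0, 0
    for _ in range(n):
        if size == cap:      # full: copy everything into a doubled array
            copies += size
            cap *= 2
        size += 1
    return copies
```

For $n = 16$ this reports $1 + 2 + 4 + 8 = 15 = n - 1$, matching the trace above.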
Python code processes a live option chain by appending each instrument to a list, then repeatedly calling chain.pop(0) to process instruments one-by-one. The chain has 5,000 instruments. What is the total complexity of processing all instruments this way?
Python's list is a dynamic array. pop(0) removes the first element and shifts every remaining element one position left — $\Theta(n - k)$ work on the $k$-th pop. Total:
$$\sum_{k=0}^{n-1}(n-k) = n + (n-1) + \cdots + 1 = \frac{n(n+1)}{2} = \Theta(n^2)$$
For $n = 5{,}000$: about 12.5 million shift operations. The fix is trivial — from collections import deque and use popleft() instead, which is $\Theta(1)$, bringing total complexity to $\Theta(n)$.
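A quick arithmetic check of both claims (function names illustrative):

```python
from collections import deque

# The k-th pop(0) (k = 0 .. n-1) touches the n - k remaining elements,
# so the total work is n(n+1)/2 — Theta(n^2).
def pop0_total_work(n):
    return sum(n - k for k in range(n))

# deque.popleft does no shifting: draining n items is n O(1) operations.
def drain_with_popleft(n):
    dq = deque(range(n))
    ops = 0
    while dq:
        dq.popleft()
        ops += 1
    return ops
```

For $n = 5{,}000$ the first function reports 12,502,500 units of work; the deque drain is 5,000 constant-time operations.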
A dynamic array uses a shrink threshold of 1/4: when size drops below $\text{cap}/4$, it resizes to $\text{cap}/2$. Why is the threshold set at 1/4 rather than 1/2?
Suppose instead we shrank (halving capacity) as soon as size dropped to $\text{cap}/2$. Consider an array that has just doubled from capacity $n/2$ to $n$: it holds $n/2 + 1$ items. One deletion brings it to $n/2$ items — exactly at the shrink threshold — so we halve to capacity $n/2$, and the array is now completely full. One insertion forces a doubling back to $n$; one deletion forces a shrink again. Alternating inserts and deletes at this boundary trigger an $O(n)$ resize on every single operation, and the amortised bound collapses to $\Theta(n)$ per operation — catastrophically worse.
The 1/4 shrink threshold ensures there is always a "buffer zone" of at least $n/4$ operations between when we last resized and when we'll resize again. This gap is what makes the amortised analysis work.
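The thrashing is easy to reproduce in a small simulation — a sketch, assuming shrinking halves the capacity (the workload string and function name are illustrative):

```python
# Count resizes under a given shrink threshold for a workload of
# inserts ("i") and deletes ("d").
def count_resizes(ops, shrink_frac):
    cap, size, resizes = 1, 0, 0
    for op in ops:
        if op == "i":
            size += 1
            if size > cap:                  # full: double
                cap *= 2
                resizes += 1
        else:                               # "d": delete
            size -= 1
            if size <= cap * shrink_frac and cap > 1:
                cap //= 2                   # shrink by halving
                resizes += 1
    return resizes

# fill just past a doubling boundary, then alternate delete/insert there
ops = "i" * 9 + "di" * 100
half = count_resizes(ops, 0.5)      # shrink-at-1/2: resizes on every op
quarter = count_resizes(ops, 0.25)  # shrink-at-1/4: no thrashing
```

With the 1/2 threshold every one of the 200 alternating operations triggers a resize (204 total, including the 4 during the fill); with the 1/4 threshold only the 4 fill-phase doublings occur.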
Work through these independently. Q1–3 are direct applications. Q4–7 require synthesis. Q8–10 require careful argument.
State the worst-case running time of each of the following: (a) get_at(n//2) on a linked list of size $n$; (b) insert_last on a static array; (c) delete_first on a dynamic array; (d) insert_first on a doubly-linked list with a head pointer.
Easy
Start from an empty dynamic array and call append exactly 17 times. Using table-doubling (starting capacity 1, doubling when full), trace through exactly which appends trigger a resize, what the new capacity is after each resize, and how many elements are copied. What is the final capacity?
Easy
Explain why a Python list used as a stack (append/pop from the end) is efficient, but the same list used as a queue (append to end, pop from front) is inefficient. What is the asymptotic cost of $n$ enqueue+dequeue operations in each case?
Easy
Implement a Deque class (double-ended queue) supporting $\Theta(1)$ amortised insert_first, insert_last, delete_first, and delete_last using a dynamic array. Hint: maintain a circular buffer with head and tail indices.
Medium
A singly-linked list may be corrupted so that the next pointer of the last node points back to some earlier node. Given only a pointer to the head (and no knowledge of the list's size), design a $\Theta(n)$ time, $\Theta(1)$ space algorithm to detect whether a cycle exists and return the length of the cycle if one is found. Prove correctness.
Medium
Extend the linked list with key-based find(k), insert(x), and delete(k) methods. What is the worst-case complexity of each operation? Why is this not an efficient Set implementation?
Medium
Prove that any sequence of $n$ insert_last operations on a dynamic array with table-doubling costs $O(n)$ total. Use the potential function method: define $\Phi = 2 \cdot \text{size} - \text{capacity}$ and show that the amortised cost of each insert is $O(1)$.
Hard
Generalise table-doubling to an arbitrary growth factor $r > 1$ and prove the amortised $O(1)$ bound for insert_last. Then argue that smaller $r$ (e.g. $r = 1.1$) wastes less space but causes more frequent resizes — quantify the tradeoff in terms of wasted space and total copy work for $n$ insertions.
Hard
Design a sequence data structure supporting get_at(i) in $O(\log n)$, insert_at(i, x) in $O(\log n)$, and delete_at(i) in $O(\log n)$. You may use any data structure you know. (Hint: this is achievable — what structure naturally gives $O(\log n)$ for all positional operations? You will build it in Chapter 7.)
Hard