A garden stub. The polished version lives at LLM inference optimization.
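Decode is memory-bandwidth-bound: each generated token re-reads the model weights and the KV cache, which grows with context length and batch size. Prefill processes the whole prompt in parallel and is compute-bound. Levers: paged attention (less wasted KV-cache memory), speculative decoding (several tokens per pass over the weights), quantization (fewer bytes moved per token).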
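A rough back-of-envelope sketch of why decode ends up bandwidth-bound. The model shape (32 layers, 32 KV heads, head dimension 128, fp16), the 7B-parameter weight size, and the 1 TB/s bandwidth figure are all hypothetical numbers chosen for illustration, not measurements of any particular model or GPU. The function names are made up for this note.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_elem=2):
    """Bytes held by the KV cache.

    One K vector and one V vector (hence the factor 2) are stored
    per layer, per KV head, per token, per sequence in the batch.
    """
    return (2 * n_layers * n_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)


def decode_step_lower_bound_s(weight_bytes, kv_bytes, bandwidth_bytes_per_s):
    """Bandwidth-only lower bound on one decode step.

    Assumes every weight byte and every cached K/V byte is read once
    per generated token and that compute is free. Real steps are slower.
    """
    return (weight_bytes + kv_bytes) / bandwidth_bytes_per_s


# Hypothetical model shape (assumption, not taken from the note).
cfg = dict(n_layers=32, n_kv_heads=32, head_dim=128)

per_token = kv_cache_bytes(seq_len=1, **cfg)      # bytes of cache per token
full_ctx = kv_cache_bytes(seq_len=4096, **cfg)    # bytes at a 4096-token context

# Hypothetical hardware and weights: 7e9 fp16 parameters, 1e12 bytes/s.
weight_bytes = 7_000_000_000 * 2
step_s = decode_step_lower_bound_s(weight_bytes, full_ctx, 1e12)

print(f"KV cache per token: {per_token / 2**20:.2f} MiB")
print(f"KV cache at 4096 tokens: {full_ctx / 2**30:.2f} GiB")
print(f"Decode step lower bound: {step_s * 1e3:.2f} ms")
```

Under these assumptions the cache costs half a MiB per token, so it competes with the weights for bandwidth as context and batch size grow. That is the pressure the three levers address.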
Related: simd-integer-arithmetic, performance-reading-list.