A running list of things worth reading for performance work. The structured version drives the charts on the data page.
Integer / SIMD
- Hacker’s Delight — bit tricks and division-by-constant.
- “Division by Invariant Integers using Multiplication” (Granlund & Montgomery).
- Intel Optimization Reference Manual — SIMD sections.
LLM inference
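- PagedAttention / vLLM paper: “Efficient Memory Management for Large Language Model Serving with PagedAttention” (Kwon et al., SOSP 2023).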
- Speculative decoding, e.g. “Fast Inference from Transformers via Speculative Decoding” (Leviathan et al.).
GPU
- CUDA C++ Programming Guide.
- PyTorch internals (ezyang’s blog).
See also: simd-integer-arithmetic, llm-inference.