A running list of things worth reading for performance work. The structured version drives the charts on the data page.

Integer / SIMD

  • Hacker’s Delight — bit tricks and division-by-constant.
  • “Division by Invariant Integers using Multiplication” (Granlund & Montgomery, PLDI 1994).
  • Intel Optimization Reference Manual — SIMD sections.
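
For a taste of what the first two entries cover, here is a minimal sketch of unsigned 32-bit division by a constant using a multiply and a shift. It is an illustration only, not code from either source: the names magic_u32 and div_by_const are made up for this note, and Python's arbitrary-precision integers hide the overflow handling that a fixed-width implementation needs.

```python
def magic_u32(d):
    """Return (magic, shift) so that n // d == (n * magic) >> shift
    for every 0 <= n < 2**32. Hypothetical helper for illustration."""
    assert 1 <= d < 2**32
    # ceil(log2(d)); equals 0 when d == 1
    log2_ceil = (d - 1).bit_length()
    shift = 32 + log2_ceil
    # ceil(2**shift / d), written with integer arithmetic only
    magic = -(-(1 << shift) // d)
    return magic, shift


def div_by_const(n, magic, shift):
    """Divide n by the constant that produced (magic, shift)."""
    return (n * magic) >> shift


magic, shift = magic_u32(10)
assert div_by_const(12345, magic, shift) == 12345 // 10
```

The magic number can be up to 33 bits wide, so a real 32-bit implementation needs the extra add-and-shift fixup that both references describe; the sketch sidesteps that by relying on big integers.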

LLM inference

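  • PagedAttention / vLLM paper: “Efficient Memory Management for Large Language Model Serving with PagedAttention” (Kwon et al., SOSP 2023).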
  • Speculative decoding, e.g. “Fast Inference from Transformers via Speculative Decoding” (Leviathan et al., ICML 2023).

GPU

  • CUDA C++ Programming Guide.
  • PyTorch internals (ezyang’s blog).

See also: simd-integer-arithmetic, llm-inference.