Kyushick's Notes

LLM Inference

Jun 29, 2026 · 1 min read

  • llm
  • inference

A garden stub. The polished version lives at LLM inference optimization.

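Decode is memory-bandwidth-bound: every generated token re-reads the model weights and the KV cache, and the cache comes to dominate memory traffic as context length and batch size grow. Prefill processes the whole prompt in one parallel pass and is compute-bound. Levers: paged attention, speculative decoding, quantization.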
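A back-of-the-envelope sketch of why the two phases differ. Everything here is an assumption for illustration: the model shape (32 layers, 32 KV heads, head dimension 128, 7e9 parameters, fp16) is a hypothetical 7B-class configuration, the accelerator numbers are made up, and the cost model counts only weight matmuls (about 2 FLOPs per parameter per token) while ignoring attention FLOPs and activation traffic.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   bytes_per_elem=2, batch=1):
    """Size of the KV cache; the leading 2 covers keys and values."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * seq_len * batch


def arithmetic_intensity(n_params, tokens_per_pass,
                         bytes_per_param=2, kv_bytes=0):
    """FLOPs per byte moved for one forward pass (simplified model).

    Weights are read once per pass regardless of how many tokens
    share that pass, which is what separates prefill from decode.
    """
    flops = 2 * n_params * tokens_per_pass
    bytes_moved = n_params * bytes_per_param + kv_bytes
    return flops / bytes_moved


# Hypothetical 7B-class model, fp16, 4096-token context.
N_PARAMS = 7e9
SEQ_LEN = 4096
kv = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128, seq_len=SEQ_LEN)

# Hypothetical accelerator: 300 TFLOP/s compute, 2 TB/s memory bandwidth.
# Below this ridge point a kernel is bandwidth-bound, above it compute-bound.
ridge = 300e12 / 2e12

decode = arithmetic_intensity(N_PARAMS, tokens_per_pass=1, kv_bytes=kv)
prefill = arithmetic_intensity(N_PARAMS, tokens_per_pass=SEQ_LEN)
decode_int4 = arithmetic_intensity(N_PARAMS, tokens_per_pass=1,
                                   bytes_per_param=0.5)

print(f"KV cache at {SEQ_LEN} tokens: {kv / 2**30:.1f} GiB")
print(f"ridge point:       {ridge:.0f} FLOP/byte")
print(f"decode intensity:  {decode:.2f} FLOP/byte")
print(f"prefill intensity: {prefill:.0f} FLOP/byte")
print(f"decode, 4-bit weights: {decode_int4:.1f} FLOP/byte")
```

Under these assumptions, single-stream decode lands far below the ridge point and prefill far above it. The last line hints at why quantization helps decode: fewer bytes per weight means less traffic for the same FLOPs.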

Related: simd-integer-arithmetic, performance-reading-list.



