LLM Inference

A garden stub. The polished version lives at LLM inference optimization.

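Decode generates one token per step, so each step re-reads the model weights and the whole KV cache to do very little arithmetic. It is memory-bandwidth-bound, and KV cache traffic comes to dominate as context length and batch size grow. Prefill processes the full prompt in parallel and is compute-bound. Levers: paged attention (less KV cache fragmentation), speculative decoding (several tokens verified per pass over the weights), quantization (fewer bytes per weight or cached value).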
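A back-of-the-envelope sketch of why decode leans on memory bandwidth. Everything here is an assumption for illustration: the `ModelShape` fields, the 7B-parameter shape, and the cost model itself, which counts roughly 2 FLOPs per parameter per token and ignores attention FLOPs and activation traffic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelShape:
    """Hypothetical decoder-only transformer shape (not a specific model)."""
    n_layers: int
    n_kv_heads: int
    head_dim: int
    n_params: int


def kv_cache_bytes(shape, seq_len, batch_size=1, bytes_per_elem=2):
    """Size of the KV cache: one K and one V vector per layer, head, and token."""
    per_token = 2 * shape.n_layers * shape.n_kv_heads * shape.head_dim
    return per_token * seq_len * batch_size * bytes_per_elem


def decode_flops_per_byte(shape, seq_len, batch_size=1, bytes_per_elem=2):
    """Rough arithmetic intensity of one decode step.

    Assumes ~2 FLOPs per parameter per generated token, weights read once
    per step (shared across the batch), and the full KV cache read once.
    """
    flops = 2 * shape.n_params * batch_size
    weight_bytes = shape.n_params * bytes_per_elem
    cache_bytes = kv_cache_bytes(shape, seq_len, batch_size, bytes_per_elem)
    return flops / (weight_bytes + cache_bytes)


# Assumed shape: 32 layers, 32 KV heads of dimension 128, 7e9 parameters.
shape = ModelShape(n_layers=32, n_kv_heads=32, head_dim=128, n_params=7_000_000_000)

cache_fp16 = kv_cache_bytes(shape, seq_len=4096)                    # 2 GiB at 2 bytes/elem
cache_int8 = kv_cache_bytes(shape, seq_len=4096, bytes_per_elem=1)  # half of that
intensity = decode_flops_per_byte(shape, seq_len=4096)              # below 1 FLOP per byte
```

Under these assumptions a single-sequence decode step does less than one FLOP per byte moved, far below what accelerators need to stay compute-bound, while a 1-byte cache format halves the KV cache footprint.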

Related: simd-integer-arithmetic, performance-reading-list.

Kyushick's Notes
