LLM Inference Optimization

The two regimes

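LLM inference splits into two very different phases: prefill, which processes the whole prompt in parallel and is typically compute-bound, and decode, which generates one token at a time and is typically memory-bandwidth-bound.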

Levers

Why it connects to the rest of this site

Decode performance is fundamentally a memory-bandwidth and integer/low-precision arithmetic problem, the same themes that run through SIMD integer arithmetic.
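
A rough back-of-the-envelope sketch of why precision and bandwidth dominate decode: if every weight must be read from memory once per generated token, the time per token cannot fall below the weight bytes divided by the memory bandwidth. The function name and the numbers below (a 7-billion-parameter model, 1000 GB/s of bandwidth) are illustrative assumptions, not measurements, and the estimate deliberately ignores KV-cache reads, activations, batching, and compute time.

```python
def decode_latency_floor_ms(n_params: float, bits_per_weight: int,
                            bandwidth_gb_s: float) -> float:
    """Lower bound on per-token decode time, in milliseconds.

    Assumes (hypothetically) batch size 1 and that every weight is read
    from memory exactly once per generated token; nothing else is counted.
    """
    weight_bytes = n_params * bits_per_weight / 8
    seconds = weight_bytes / (bandwidth_gb_s * 1e9)
    return seconds * 1e3


if __name__ == "__main__":
    # Illustrative, assumed values; substitute your own model and hardware.
    N_PARAMS = 7e9
    BANDWIDTH_GB_S = 1000.0

    for bits in (16, 8, 4):
        ms = decode_latency_floor_ms(N_PARAMS, bits, BANDWIDTH_GB_S)
        print(f"{bits:>2}-bit weights: at least {ms:.1f} ms/token, "
              f"at most {1000 / ms:.0f} tokens/s")
```

Under these assumptions, halving the bits per weight halves the bytes moved per token and so halves the latency floor, which is why low-precision integer formats matter so much for decode.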

To explore

LLM inference, Systems, Performance