Barrett Reduction
Idea
To compute x mod n without a divide, precompute m = floor(2^k / n). Then
x mod n can be recovered with a multiply by m, a shift by k, and a small
correction. The expensive division is amortized into a one-time reciprocal
computation.
Why it’s fast
Multiplication and shifts are cheap and vectorizable, unlike division. For a fixed modulus reused across many values, Barrett (or Montgomery) reduction turns a latency-bound divide into throughput-bound multiplies.
Trade-offs
- Needs a wide intermediate product (e.g. 128-bit for 64-bit operands).
- Only pays off when the divisor/modulus is constant or reused many times.
Used as the “constant divisor” path in SIMD integer arithmetic.
Integer arithmeticModular arithmetic