A garden stub. The polished version lives at SIMD integer arithmetic.
Vectorizing integer math is hard mainly because x86 SIMD, from SSE through AVX-512, has no packed integer-division instruction. The workarounds (reciprocal multiply, limb-based long division, and Newton–Raphson iteration) are explored in the research note.
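A minimal C sketch of the reciprocal-multiply workaround. It assumes unsigned 32-bit lanes that all share one divisor known before the loop runs, and it uses the round-up "division by invariant integers" method of Granlund and Montgomery. The names `udiv_magic`, `udiv_magic_for`, `udiv_apply` and `divide_lanes` are hypothetical. The loop is plain scalar C rather than intrinsics. The point is that each lane needs only a 32×32→64 multiply, a subtract and shifts, and x86 SIMD does provide all of those.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-divisor constants for the round-up reciprocal method. */
typedef struct {
    uint32_t m;   /* reciprocal multiplier: floor(2^32 * (2^l - d) / d) + 1 */
    uint32_t sh1; /* min(l, 1) */
    uint32_t sh2; /* max(l - 1, 0) */
} udiv_magic;

/* Precompute once per divisor; this is the only real division in the sketch. */
static udiv_magic udiv_magic_for(uint32_t d) {
    assert(d >= 1);
    uint32_t l = 0; /* l = ceil(log2(d)) */
    while (((uint64_t)1 << l) < d) l++;
    udiv_magic mg;
    /* (2^l - d) < 2^31, so the 64-bit product stays below 2^63. */
    mg.m = (uint32_t)((((uint64_t)1 << 32) * (((uint64_t)1 << l) - d)) / d + 1);
    mg.sh1 = l < 1 ? l : 1;
    mg.sh2 = l > 1 ? l - 1 : 0;
    return mg;
}

/* One lane: n / d using a high multiply, a subtract, an add and two shifts. */
static inline uint32_t udiv_apply(uint32_t n, udiv_magic mg) {
    uint32_t t = (uint32_t)(((uint64_t)mg.m * n) >> 32); /* high half of m * n; t <= n */
    return (t + ((n - t) >> mg.sh1)) >> mg.sh2;           /* no overflow: sum <= n */
}

/* Every lane shares d, so the loop body contains no division instruction. */
static void divide_lanes(const uint32_t *in, uint32_t *out, size_t len, uint32_t d) {
    udiv_magic mg = udiv_magic_for(d);
    for (size_t i = 0; i < len; i++) {
        out[i] = udiv_apply(in[i], mg);
    }
}
```

The division happens once, inside `udiv_magic_for`, and the per-lane work is division-free. In an SSE2 or AVX2 port, the high-half multiply would typically come from `_mm_mul_epu32` / `_mm256_mul_epu32`. Those multiply only the even 32-bit lanes, so the odd lanes need a shuffle and a second multiply. The sketch does not cover a different divisor in each lane.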
Related: performance-reading-list, cuda-debugging, llm-inference.