Research & Systems Engineer
Hi, I'm Kyushick Lee.
I work on systems, performance, compilers, AI infrastructure, and low-level optimization.
Currently exploring
- SIMD / vectorized integer arithmetic
- LLM inference optimization
- PyTorch / CUDA / GPU programming
- Systems performance
- Developer tooling
Selected projects
All projects →
Vectorizing 64-bit Integer Division
Emulating 64-bit integer division using 32-bit SIMD lanes for a measurable speedup.
active · SIMD, Integer arithmetic, Performance
Building PyTorch from Source
A reproducible workflow for building PyTorch from source with CUDA on WSL2.
maintained · PyTorch, CUDA, Build systems
Selected writing
All writing →
- Why I Built This Site: A research engineer's notebook that combines a portfolio, writing, and an interconnected digital garden.
Research notes
All research →
- LLM Inference Optimization: Where the time and memory actually go when serving large language models.
- Barrett Reduction: Replacing division by a constant modulus with a multiply and a shift.
- Knuth Algorithm D: Schoolbook long division on machine-word limbs, done carefully.
- SIMD Integer Arithmetic: Why integer division resists vectorization and how to work around it.
Contact
Interested in systems performance, AI infrastructure, or low-level optimization? Reach out via email or GitHub.