About
About me
I'm a software engineer and computer architect with 10+ years of experience in memory and computing system architecture, parallel programming, and system resilience. My recent focus is LLM inference and training.
Today
I'm a Senior Software Engineer at Microsoft (Azure Hardware Architecture, AI Frameworks), where I build kernels, runtime libraries, and the LLM serving stack for the Maia ASIC accelerators. I designed Maia's host/device programming model, delivered core SDK components, integrated Maia into PyTorch and ONNX Runtime, and partnered with OpenAI to ship the Maia-powered GitHub Copilot demo at Ignite 2023.
Background
I earned my PhD in Electrical and Computer Engineering from the University of Texas at Austin (advised by Mattan Erez), where I built the Containment Domains resilience runtime for high-performance and GPU-dense computing. Along the way I interned at NVIDIA Research, Intel's Open Source Technology Center, and Lawrence Livermore National Laboratory.
Areas of interest
- LLM inference and training on AI accelerators
- Kernel authoring, MoE kernels, and vector kernel frameworks
- Runtime systems, programming models, and device/stream control
- PyTorch / ONNX Runtime / Triton integration
- System resilience and checkpointing
- Systems performance and benchmarking
Professional focus
I care about correctness-first performance work: measurable wins, honest benchmarks, and clear write-ups that others can learn from. This site is both a portfolio and a working notebook of that practice.
Contact
The best ways to reach me: