Microsoft Research | AI Infrastructure
Research Fellow | Advisors: Dr. Nipun Kwatra, Dr. Ramachandran Ramjee
Bangalore, India | Jul 2023 - Present
Samanvaya: Compute-Communication Overlap for Efficient Inference in Mixture-of-Experts (MoE) Models
- Proposed a fine-grained overlap method that effectively hides communication costs in MoE models
- Implemented Expert Parallelism in vLLM and highlighted its benefits over Tensor Parallelism for MoE
- Developed a lightweight signaling mechanism to initiate partial GPU-to-GPU communication via Direct Memory Access (DMA), leaving all SMs free for the compute kernel and enabling effective overlap
- Demonstrated up to a 20% reduction in MoE MLP time for Mixtral 22B in microbenchmarks on 8 H100s
- Working to resolve expert load-balancing issues that hinder our gains in end-to-end performance
Compute-Communication Overlap for Efficient Inference in Dense Large Language Models (LLMs)
- Developed a method that decomposes computation and hides communication in Tensor Parallelism for LLMs, reducing communication overhead by 15% on GPT-3 microbenchmarks on A100 GPUs with NVLink
- Evaluated prior overlap solutions to identify issues that arise when applying them to new models and GPUs
IIT Bombay | SynerG Lab, Dept. of Computer Science and Engineering
Undergraduate Researcher | Advisor: Prof. Purushottam Kulkarni
Mumbai, India | Aug 2022 - Jun 2023
emucxl: Emulation Framework and Access Library for CXL-Based Disaggregated Memory Systems
- Developed a user-space library coupled with a NUMA-based CXL emulation backend for standardized CXL memory access, enabling rapid prototyping of disaggregated memory solutions
- Conducted a literature survey of CXL standards and demonstrated emucxl's capabilities through practical use cases
R&D Project: Persistent Memory (PMem) Applications [PDF, code]
- Designed and implemented a fault-tolerant reader-writer program on Non-Volatile Memory, using array- and pointer-based data layouts for efficient data access
- Explored the Persistent Memory Development Kit (PMDK) libraries to understand PMem capabilities and analyzed performance differences between traditional and PMem-based Redis using real-world benchmarks