Research

Research Fellow | Advisors: Dr. Nipun Kwatra, Dr. Ramachandran Ramjee
Bangalore, India | Jul 2023 - Present

Samanvaya: Compute-Communication Overlap for Efficient Inference in Mixtures of Experts (MoE) Models

Proposed a fine-grained overlap method that effectively hides communication costs in MoE models
Implemented Expert Parallelism in vLLM and highlighted its benefits over Tensor Parallelism for MoE
Developed a lightweight signaling mechanism to initiate Direct Memory Access (DMA)-based partial GPU-GPU
communication, freeing all SMs for the compute kernel and enabling effective overlap
Demonstrated up to a 20% reduction in MoE MLP time for Mixtral 22B in microbenchmarks on 8 H100 GPUs
Working to resolve expert load-balancing issues that currently limit end-to-end performance gains

Compute-Communication Overlap for Efficient Inference in Dense Large Language Models (LLMs)

Developed a method that decomposes computation and hides communication in Tensor Parallelism for LLMs, reducing
communication overhead by 15% on GPT-3 microbenchmarks on A100 GPUs with NVLink
Evaluated prior overlap solutions to identify the issues that arise when applying them to new models and GPUs

Research Collaborator | Advisor: Prof. Huaicheng Li
Remote | Jul 2024 - Sep 2024

Damon-CXL: Two-tier memory management for Compute Express Link (CXL) memory

Integrated DAMON-based memory management patches into the Linux kernel and reviewed the source code
Analyzed Redis performance on emulated CXL memory using YCSB benchmarks and
compared results with vanilla Linux memory management configurations to identify improvements and bottlenecks

Undergraduate Researcher | Advisor: Prof. Purushottam Kulkarni
Mumbai, India | Aug 2022 - Jun 2023

emucxl: Emulation Framework and Access Library for CXL-Based Disaggregated Memory Systems

Developed a user-space library coupled with a NUMA-based CXL emulation backend for
standardized CXL memory access that enables rapid prototyping of disaggregated memory solutions
Conducted a literature survey on CXL standards and demonstrated emucxl's capabilities through practical use cases

R&D Project: Persistent Memory (PMem) Applications [PDF, code]

Designed and implemented a robust reader-writer program on Non-Volatile Memory using
advanced array and pointer techniques, which provides fault tolerance and efficient data access
Explored Persistent Memory Development Kit libraries to understand PMem capabilities and
analyzed performance differences between traditional and PMem-based Redis using real-world benchmarks