Research

Microsoft Research | AI Infrastructure

Research Fellow | Advisors: Dr. Nipun Kwatra, Dr. Ramachandran Ramjee
Bangalore, India | Jul 2023 - Present


Samanvaya: Compute-Communication Overlap for Efficient Inference in Mixtures of Experts (MoE) Models

  • Proposed a fine-grained overlap method that effectively hides communication costs in MoE models
  • Implemented Expert Parallelism in vLLM and highlighted its benefits over Tensor Parallelism for MoE
  • Developed a lightweight signaling mechanism to initiate Direct Memory Access (DMA)-based partial GPU-GPU
    communication, which frees all SMs for the compute kernel and allows effective overlap
  • Demonstrated up to a 20% reduction in MoE MLP time for Mixtral 8x22B in microbenchmarks on 8 H100 GPUs
  • Working to resolve expert load-balancing issues that hinder our gains in end-to-end performance

Compute-Communication Overlap for Efficient Inference in Dense Large Language Models (LLMs)

  • Developed a method that decomposes computation and hides communication in Tensor Parallelism for LLMs, reducing
    communication overhead by 15% on GPT-3 microbenchmarks on A100 GPUs with NVLink
  • Explored prior overlap solutions to identify issues that arise when applying them to new models and GPUs

Virginia Tech | Department of Computer Science

Research Collaborator | Advisor: Prof. Huaicheng Li
Remote | Jul 2024 - Sep 2024


Damon-CXL: Two-tier memory management for Compute Express Link (CXL) memory

  • Integrated DAMON-based memory management patches into the Linux kernel and reviewed the source code
  • Analyzed Redis performance on emulated CXL memory using YCSB benchmarks and
    compared results with vanilla Linux memory management configurations to identify improvements and bottlenecks

IIT Bombay | SynerG Lab, Dept. of Computer Science and Engineering

Undergraduate Researcher | Advisor: Prof. Purushottam Kulkarni
Mumbai, India | Aug 2022 - Jun 2023


emucxl: Emulation Framework and Access Library for CXL-Based Disaggregated Memory Systems

  • Developed a user-space library coupled with a NUMA-based CXL emulation backend for
    standardized CXL memory access that enables rapid prototyping of disaggregated memory solutions
  • Conducted a literature survey on CXL standards and demonstrated emucxl's capabilities through practical use cases

R&D Project: Persistent Memory (PMem) Applications [PDF, code]

  • Designed and implemented a robust reader-writer program on Non-Volatile Memory using
    advanced array and pointer techniques, providing fault tolerance and efficient data access
  • Explored Persistent Memory Development Kit libraries to understand PMem capabilities and
    analyzed performance differences between traditional and PMem-based Redis using real-world benchmarks