Hi! I'm Raja...
I'm a Research Fellow (Pre-doctoral Researcher) in the AI-infrastructure team at Microsoft Research India (MSR-I). In 2023,
I graduated from the Indian Institute of Technology Bombay with a B.Tech (with Honors) in Computer Science. My primary interests are Systems for ML, Compute Express Link (CXL), Networking, and Systems in general.
At MSR-I, I am working with Dr. Nipun Kwatra and Dr. Ramachandran Ramjee on improving
GPU utilization for LLM inference. Specifically, we focus on the communication in multi-GPU LLM inference, which currently lies on the critical path and hurts both latency and efficiency. We are looking at ways to mitigate this overhead by overlapping communication with existing computation, particularly for Mixture-of-Experts (MoE) models, since most large high-accuracy models are MoEs.
During my undergrad, I worked with Prof. Purushottam (Puru) Kulkarni on CXL and persistent memory.