Large Language Models (LLMs) have recently demonstrated remarkable generalization across a wide range of tasks. Model scale is a key driver of this performance, yet increasing model size sharply raises the cost of both pre-training and fine-tuning while slowing inference. This tension has spurred a search for new model-scaling techniques. Among them, the sparse Mixture-of-Experts (MoE) architecture has attracted considerable attention: because only a subset of experts is activated per input, it speeds up pre-training and inference compared with dense models of equivalent parameter count.
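To make the efficiency argument concrete, below is a minimal sketch (in PyTorch) of a top-k gated sparse MoE layer. Names such as `SparseMoE`, `num_experts`, and `top_k` are illustrative rather than from any specific library; the point is that per-token compute scales with the number of *active* experts, not the total parameter count.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    def __init__(self, d_model: int, d_ff: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Each expert is an independent feed-forward block.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )
        # The router scores every token against every expert.
        self.router = nn.Linear(d_model, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        logits = self.router(x)                             # (tokens, experts)
        weights, indices = logits.topk(self.top_k, dim=-1)  # top-k experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        # Only the selected experts run for each token: with top_k=2 of 8
        # experts, per-token FFN compute is roughly 1/4 that of a dense model
        # whose FFN parameters equal the sum over all experts.
        for e, expert in enumerate(self.experts):
            token_idx, slot = (indices == e).nonzero(as_tuple=True)
            if token_idx.numel() == 0:
                continue
            out[token_idx] += weights[token_idx, slot].unsqueeze(-1) * expert(x[token_idx])
        return out

# Usage: 16 tokens of width 64 through an 8-expert layer activating 2 experts each.
moe = SparseMoE(d_model=64, d_ff=256)
print(moe(torch.randn(16, 64)).shape)  # torch.Size([16, 64])
```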
This tutorial offers a comprehensive overview of MoE in the context of LLMs. It begins by revisiting existing MoE research and highlighting the key open challenges in the area. It then examines the interplay between MoE and LLMs, covering both sparse scaling of models during pre-training and the conversion of existing dense models into sparse MoE counterparts. Finally, the tutorial discusses the broader advantages MoE confers beyond efficiency alone.
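The dense-to-MoE conversion mentioned above is commonly done by "upcycling": every expert starts as a copy of the trained dense FFN, and a freshly initialized router learns to specialize the experts during continued training. The sketch below is a minimal illustration of that general recipe, not a definitive implementation; the helper name `upcycle_ffn` is hypothetical, and it reuses the `SparseMoE` class from the previous sketch.

```python
import torch.nn as nn

def upcycle_ffn(dense_ffn: nn.Sequential, d_model: int, d_ff: int,
                num_experts: int = 8, top_k: int = 2) -> "SparseMoE":
    # Hypothetical helper: build an MoE layer whose experts all start from the
    # trained dense FFN's weights. The router stays randomly initialized, and
    # routing is learned during continued training after the conversion.
    moe = SparseMoE(d_model, d_ff, num_experts=num_experts, top_k=top_k)
    for expert in moe.experts:
        expert.load_state_dict(dense_ffn.state_dict())  # copy dense weights into each expert
    return moe

# Usage: upcycle a trained dense 64 -> 256 -> 64 FFN into an 8-expert MoE layer.
dense = nn.Sequential(nn.Linear(64, 256), nn.GELU(), nn.Linear(256, 64))
moe_layer = upcycle_ffn(dense, d_model=64, d_ff=256)
```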
Overall, the tutorial traces the evolution of MoE within the LLM landscape and underscores its pivotal role in the era of large models.
| Session | Title | Speakers |
| --- | --- | --- |
| 1 | Overview & Key Challenges in MoEs and their Crucial Roles in LLMs [Slides] | Tianlong Chen |
| 2 | MoE Architecture Variance, Building MoE from Dense LLMs, and MoE Beyond Efficiency [Slides] | Yu Cheng |
| 3 | How to Train a Superior MoE from a System View? [Slides] | Minjia Zhang |
| 4 | Key Extension - Multi-Modal MoE; Multi-Agent Communications [Slides] | Mohit Bansal, Tianlong Chen |
| 5 | Panel - MoE Designs, Multi-Modal Multi-Task MoE, Multi-Agent MoE | Tianlong Chen (Moderator), Yu Cheng, Beidi Chen, Minjia Zhang, Mohit Bansal |
Contact the Organizing Committee: tianlong@cs.unc.edu, pingzhi@cs.unc.edu