Mixture-of-Experts in the Era of LLMs: A New Odyssey

Mon 22 Jul, 1:00–3:00 p.m. CEST

Location: Hall A1

ICML 2024 tutorial

Large Language Models (LLMs) have recently demonstrated remarkable generalization across a wide range of tasks. Scale is a key driver of this performance, yet growing model size sharply increases the cost of both pre-training and fine-tuning and slows inference. This has spurred a surge of work on more efficient ways to scale models. Among these, the sparse Mixture-of-Experts (MoE) architecture has attracted particular attention because it can speed up pre-training and inference compared with dense models of an equivalent parameter count.
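
For readers new to sparse MoE, the minimal PyTorch sketch below (not part of the tutorial materials) illustrates the core idea of top-k routing: a learned router selects a few experts per token, so per-token compute grows with the number of active experts rather than with the total parameter count. All layer sizes and names here are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    """Illustrative sparse MoE feed-forward layer with top-k token routing."""

    def __init__(self, d_model=512, d_ff=2048, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Each expert is an independent feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )
        # The router produces one score per expert for every token.
        self.router = nn.Linear(d_model, num_experts)

    def forward(self, x):                          # x: (num_tokens, d_model)
        scores = self.router(x)                    # (num_tokens, num_experts)
        weights, expert_idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)       # renormalize over the selected experts
        out = torch.zeros_like(x)
        # Only the selected experts run for each token, so compute scales with
        # top_k rather than with the total number of experts.
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = expert_idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out


# Example usage: route 16 tokens through the layer.
layer = SparseMoE()
tokens = torch.randn(16, 512)
print(layer(tokens).shape)  # torch.Size([16, 512])
```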

This tutorial offers a comprehensive overview of MoE in the context of LLMs. It begins by revisiting existing research on MoE and the key challenges in this area, then examines the relationship between MoE and LLMs, covering sparse scaling of pre-trained models and the conversion of existing dense models into sparse MoE counterparts. Finally, it discusses the broader advantages MoE offers beyond efficiency.
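
One common recipe for converting a dense model into a sparse MoE counterpart is "sparse upcycling": every expert is initialized from the trained dense feed-forward weights, and the router is trained from scratch. The hedged sketch below reuses the illustrative SparseMoE class above; the dense layer structure and function names are assumptions, not the tutorial's actual method.

```python
import copy
import torch.nn as nn

def upcycle_dense_ffn(dense_ffn: nn.Sequential, d_model=512, d_ff=2048,
                      num_experts=8, top_k=2) -> SparseMoE:
    """Initialize an MoE layer by copying a trained dense FFN into every expert.

    Assumes dense_ffn has the same Linear-GELU-Linear structure as SparseMoE's
    experts; the freshly initialized router is then trained alongside the
    copied experts during continued pre-training or fine-tuning.
    """
    moe = SparseMoE(d_model=d_model, d_ff=d_ff, num_experts=num_experts, top_k=top_k)
    for expert in moe.experts:
        expert.load_state_dict(copy.deepcopy(dense_ffn.state_dict()))
    return moe


# Example: upcycle a toy dense FFN into an 8-expert MoE layer.
dense = nn.Sequential(nn.Linear(512, 2048), nn.GELU(), nn.Linear(2048, 512))
moe_layer = upcycle_dense_ffn(dense)
```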

Overall, the tutorial traces the evolutionary trajectory of MoE within the LLM landscape and underscores its pivotal role in the era of LLMs.

Program (times in CEST)

Time | Title | Speakers
1:00–1:25 | Overview & Key Challenges in MoEs and their Crucial Roles in LLMs [Slides] | Tianlong Chen
1:25–1:55 | MoE Architecture Variants, Building MoE from Dense LLMs, and MoE Beyond Efficiency [Slides] | Yu Cheng
1:55–2:10 | How to Train a Superior MoE from a System View? [Slides] | Minjia Zhang
2:10–2:25 | Key Extensions: Multi-Modal MoE; Multi-Agent Communications [Slides] | Mohit Bansal, Tianlong Chen
2:25–3:00 | Panel: MoE Designs, Multi-Modal Multi-Task MoE, Multi-Agent MoE | Tianlong Chen (Moderator), Yu Cheng, Beidi Chen, Minjia Zhang, Mohit Bansal

Organizers

Tianlong Chen

The University of North Carolina at Chapel Hill

Yu Cheng

The Chinese University of Hong Kong

Beidi Chen

Carnegie Mellon University

Minjia Zhang

University of Illinois Urbana-Champaign

Mohit Bansal

The University of North Carolina at Chapel Hill

Student Contributors

Pingzhi Li

The University of North Carolina at Chapel Hill

Xinyu Zhao

The University of North Carolina at Chapel Hill

Mufan Qiu

The University of North Carolina at Chapel Hill

Contacts

Contact the Organizing Committee: tianlong@cs.unc.edu, pingzhi@cs.unc.edu