DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Optimizing inference proxy for LLMs
Decentralized deep learning in PyTorch. Built to train models on thousands of volunteers across the world.
Run Mixtral-8x7B models in Colab or consumer desktops
【TMM 2025🔥】 Mixture-of-Experts for Large Vision-Language Models
PyTorch re-implementation of "The Sparsely-Gated Mixture-of-Experts Layer" by Noam Shazeer et al. (https://arxiv.org/abs/1701.06538); a minimal sketch of this gating mechanism appears after this list.
Codebase for Aria - an Open Multimodal Native MoE
⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training (EMNLP 2024)
Tutel MoE: Optimized Mixture-of-Experts library, supporting GptOss/DeepSeek/Kimi-K2/Qwen3 using FP8/NVFP4/MXFP4
Surrogate Modeling Toolbox
From-scratch implementation of a sparse mixture-of-experts language model, inspired by Andrej Karpathy's makemore :)
A TensorFlow Keras implementation of "Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts" (KDD 2018)
Chinese Mixtral mixture-of-experts large language models (Chinese Mixtral MoE LLMs)
A library for easily merging multiple LLM experts and efficiently training the merged LLM.
Speed Always Wins: A Survey on Efficient Architectures for Large Language Models
MoH: Multi-Head Attention as Mixture-of-Head Attention
PyTorch library for cost-effective, fast and easy serving of MoE models.
GMoE could be the next backbone model for many kinds of generalization tasks.
[ICLR 2025] MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts
[ICLR'25] Fast Inference of MoE Models with CPU-GPU Orchestration
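Several of the repositories above build on the sparsely-gated layer from Shazeer et al. (arXiv:1701.06538). As a rough illustration of the idea, and not code taken from any listed project, the following is a minimal PyTorch sketch of top-k expert routing; all names, sizes, and the simple per-expert loop are assumptions chosen for clarity rather than efficiency.

```python
# Minimal sparsely-gated MoE layer with top-k routing (illustrative sketch only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    def __init__(self, d_model: int, d_hidden: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Router: produces one logit per expert for each token.
        self.gate = nn.Linear(d_model, num_experts, bias=False)
        # Experts: independent feed-forward networks.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, d_model). Each token is processed by only its top-k experts.
        logits = self.gate(x)                                    # (batch, num_experts)
        topk_logits, topk_idx = logits.topk(self.top_k, dim=-1)  # keep the k largest logits
        weights = F.softmax(topk_logits, dim=-1)                 # renormalize over selected experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            idx = topk_idx[:, slot]                              # chosen expert id per token
            w = weights[:, slot].unsqueeze(-1)                   # gating weight per token
            for e, expert in enumerate(self.experts):
                mask = idx == e
                if mask.any():
                    out[mask] += w[mask] * expert(x[mask])
        return out

# Example usage: route a small batch of token embeddings through the layer.
layer = SparseMoE(d_model=64, d_hidden=256)
y = layer(torch.randn(4, 64))  # -> shape (4, 64)
```

Production libraries in this list (e.g. Tutel, DeepSpeed) replace the per-expert Python loop with batched dispatch/combine kernels and add load-balancing losses, but the routing principle is the same.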