This change is part of the following stack: Change managed by git-spice.
@abatilo Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/22359629088
Force-pushed from 6bbebab to c469ae4
@abatilo Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/22393112086
@abatilo Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/22401124059
@abatilo Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/22393112087
@abatilo Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/22403841820
@abatilo Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/22406520501
Force-pushed from d8259f9 to 2642fae
Force-pushed from 9b08cb3 to 79ebda6
@abatilo Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/22416790591
@abatilo Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/22416790628
@abatilo Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/22417654701
@abatilo Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/22417446269
Force-pushed from 67b6432 to 8403328
@abatilo Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/22495614431
@abatilo Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/22495614471
Force-pushed from 8403328 to e1231a4
Force-pushed from f074022 to df08c06
Force-pushed from e1231a4 to 0bef9f4
Force-pushed from df08c06 to fd74de2
Force-pushed from 0bef9f4 to 029b183
@abatilo Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/22508310210
Force-pushed from fd74de2 to 38f8a15
Force-pushed from 029b183 to 006aa6e
@abatilo Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/22512778429
Force-pushed from 006aa6e to b34dc7a
@abatilo Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/22647629481
@abatilo Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/22678705175
Force-pushed from dffa047 to 8891b14
@abatilo Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/22700394547
Force-pushed from 10fe935 to 5faef68
@abatilo Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/22701505825
Force-pushed from 597ffe5 to e7cf547
@abatilo Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/22729124486
SLIME's post-training pipeline (Megatron + SGLang) requires SGLang v0.5.9. This change upgrades the image from v0.4.x and rebases it onto a plain torch base, dropping the torch-extras layer (DeepSpeed, Apex, xFormers) that neither SGLang nor SLIME actually uses. FlashInfer moves from JIT compilation to v0.6.3 with AOT compilation via TVM. sgl-kernel is now built with scikit-build-core and enables SM100A (Blackwell) and FP4 support. vLLM and Triton are removed from this image because the dedicated vllm-tensorizer image already provides them.
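The package changes described above could be sanity-checked from inside a built image with a short script like the following. This is only a sketch: the distribution names (`sglang`, `flashinfer-python`, `vllm`, `deepspeed`, `xformers`) and the prefix-based version match are assumptions for illustration, not something this PR defines, and should be adjusted to whatever the image actually installs.

```python
from importlib import metadata

# Assumed distribution names and versions; adjust to match the actual image.
EXPECTED = {"sglang": "0.5.9", "flashinfer-python": "0.6.3"}
# Packages the description says are no longer part of this image (names assumed).
ABSENT = ("vllm", "deepspeed", "xformers")


def installed_version(dist_name, lookup=metadata.version):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return lookup(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_image(lookup=metadata.version):
    """Return a list of problems found; an empty list means the package set matches."""
    problems = []
    for name, want in EXPECTED.items():
        got = installed_version(name, lookup)
        if got is None:
            problems.append(f"{name} is not installed (expected {want})")
        elif not got.startswith(want):
            # Prefix match so local build tags such as "0.6.3+cu128" still pass.
            problems.append(f"{name} is {got}, expected {want}")
    for name in ABSENT:
        got = installed_version(name, lookup)
        if got is not None:
            problems.append(f"{name} {got} is installed but was expected to be removed")
    return problems


if __name__ == "__main__":
    for line in check_image() or ["image matches the expected package set"]:
        print(line)
```

The `lookup` parameter exists only so the check can be exercised without the real packages installed; in the image it defaults to `importlib.metadata.version`.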
SLIME combines Megatron-LM and SGLang for reinforcement-learning-based post-training of large language models. This image builds TransformerEngine 2.10, Apex, and a patched SGLang wheel on top of our sglang base, then installs Megatron-LM with SLIME's patches for routing replay and memory management. Patches are versioned under slime/patches/v0.5.7/ and documented in README.md, which lists their upstream origins in THUDM/SLIME.
Force-pushed from d246ce3 to 34f06f0
WIP: Add slime RL post-training image
Adds Docker build infrastructure for THUDM/slime, an RL post-training framework that coordinates Megatron-LM (training) + SGLang (rollout inference) via Ray.
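As a rough illustration of the coordination described above, the loop below alternates rollout generation and training, then pushes updated weights back to the rollout engine. It is a plain-Python sketch with no Ray, Megatron-LM, or SGLang calls: `RolloutEngine`, `Trainer`, and `run_post_training` are hypothetical stand-ins, and the strictly synchronous generate, train, sync order is an assumption rather than a description of how slime actually schedules work.

```python
from dataclasses import dataclass


@dataclass
class RolloutEngine:
    """Hypothetical stand-in for the SGLang rollout (inference) side."""
    weights_version: int = 0

    def update_weights(self, version: int) -> None:
        # In a real system this would load new parameters; here we only track the version.
        self.weights_version = version

    def generate(self, prompts):
        # Tag each sample with the weights version that produced it.
        return [{"prompt": p, "weights_version": self.weights_version} for p in prompts]


@dataclass
class Trainer:
    """Hypothetical stand-in for the Megatron-LM training side."""
    weights_version: int = 0
    samples_seen: int = 0

    def train_step(self, samples) -> int:
        # Pretend to consume the batch and produce a new set of weights.
        self.samples_seen += len(samples)
        self.weights_version += 1
        return self.weights_version


def run_post_training(prompts, num_iterations):
    """Alternate rollout and training, syncing weights after each training step."""
    engine, trainer = RolloutEngine(), Trainer()
    versions_used = []
    for _ in range(num_iterations):
        versions_used.append(engine.weights_version)
        samples = engine.generate(prompts)
        new_version = trainer.train_step(samples)
        engine.update_weights(new_version)
    return versions_used, trainer


if __name__ == "__main__":
    versions, trainer = run_post_training(["prompt-a", "prompt-b"], num_iterations=3)
    print(versions, trainer.samples_seen)
```

In the real stack the two sides would run as separate processes on different GPUs, which is where an orchestrator such as Ray comes in; the sketch only shows the data flow between them.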
What this does
Layers the slime training stack on top of the existing sglang image:
Architecture
Follows the established ml-containers two-stage build pattern:
Inherits Blackwell sm_100a support and multi-arch (amd64+arm64) from the base sglang image.
TODO
docker build against the latest sglang image