Pull requests: awslabs/awsome-distributed-training
Pull requests list
Add veRL GRPO training recipe for gpt-oss-20b on g5.12xlarge
#1054
opened Apr 4, 2026 by
nkumaraws
feat: add OpenRLHF GRPO training recipe for gpt-oss-20b on HyperPod EKS (g5.12xlarge)
#1053
opened Apr 4, 2026 by
nkumaraws
feat: add veRL GRPO training recipe for gpt-oss-20b on HyperPod EKS (g5.12xlarge)
#1052
opened Apr 4, 2026 by
nkumaraws
Enhance instance validation and visualization script for network topo…
#1051
opened Apr 3, 2026 by
dushyant8858
5 of 7 tasks
Bump requests from 2.32.3 to 2.33.0 in /3.test_cases/pytorch/nvrx
dependencies
python
#1036
opened Mar 25, 2026 by
dependabot
bot
Add V-JEPA 2 (Meta FAIR) distributed training test case
#1035
opened Mar 23, 2026 by
paragao
feat: Support multiple SSH key types in easy-ssh.sh with auto-detection
#1031
opened Mar 20, 2026 by
aravneelaws
5 of 7 tasks
feat: Add observability IAM permissions for RIG cluster execution role
#1030
opened Mar 20, 2026 by
Madhubalasri-B
Add DeepSpeed CI regression tests for QLoRA and GPT-103B
#1029
opened Mar 20, 2026 by
paragao
Bump transformers from 4.48.0 to 4.53.0 in /3.test_cases/pytorch/nvrx
dependencies
python
#1026
opened Mar 18, 2026 by
dependabot
bot
Add NeMo RL GRPO training on P5en with EFA RDMA
#1025
opened Mar 17, 2026 by
dmvevents
5 of 7 tasks
Updating hyperpod-elastic-agent (HPEA) to v1.1.2 to support torch v2.6+
#1022
opened Mar 13, 2026 by
aravneelaws
7 tasks done
docs: add Instance Compatibility Guide with per-test-case configuration tables
#1017
opened Mar 11, 2026 by
nkumaraws
Add NeMo RL GRPO training with fault tolerance (NVRx) on EKS
#1010
opened Mar 9, 2026 by
dmvevents
6 tasks
Add optional Training Plan support for HyperPod instance groups
#1004
opened Feb 26, 2026 by
newabdosheham
Syntax improvements and code quality enhancements for EFA node exporter
#966
opened Feb 17, 2026 by
KeitaW