[CI] add mi300 pipeline (clean)#682

Merged
chaoos merged 9 commits into master from feature/cicd-mi300-clean on Apr 10, 2026

@chaoos
Contributor

@chaoos chaoos commented Apr 7, 2026

This PR adds requirements on the side of tmLQCD for CI/CD testing on the CSCS test system "beverin". This system hosts AMD MI300A GPUs. This is the clean version of #669, without the merge mess (I'm sorry!).

The pipeline was changed as follows:

Additionally, one can comment on any PR with

cscs-ci run beverin

to run the pipeline on MI300 nodes at CSCS. The test is equivalent to the GH200 pipeline test.

Both pipelines (GH200 and MI300) were changed and now have 3 stages:

  • prepare: builds the base image with all dependencies in it
  • build: builds tmLQCD for the PR, and QUDA from its newest head commit (this ALWAYS rebuilds QUDA and tmLQCD on every invocation of the pipeline, bypassing all build caches)
  • test: run the HMC test as before

Furthermore, both comments can be supplemented by variables, which propagate to the pipeline jobs. For instance:

cscs-ci run beverin;VARIABLE=value

Available variables to set are:

  • QUDA_GIT_REPO: the git repository URL to use as source for the QUDA spack build in the build stage (defaults to https://github.com/lattice/quda.git)
  • QUDA_GIT_BRANCH: the git branch (defaults to develop)
  • QUDA_GIT_COMMIT: the git commit (defaults to the current head commit of QUDA_GIT_BRANCH)
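To make the semantics of the variables and their defaults concrete, here is a minimal Python sketch of how such a trigger comment could be split into a pipeline name and a set of variables. It is an illustration only, not the parser used by the CSCS CI service: the function name `parse_trigger` is hypothetical, and the assumption that several `VARIABLE=value` pairs can be chained with `;` is mine (the examples in this PR only show a single variable).

```python
# Defaults as listed above. None stands for "current head commit of QUDA_GIT_BRANCH".
DEFAULTS = {
    "QUDA_GIT_REPO": "https://github.com/lattice/quda.git",
    "QUDA_GIT_BRANCH": "develop",
    "QUDA_GIT_COMMIT": None,
}


def parse_trigger(comment):
    """Split 'cscs-ci run <pipeline>;VAR=value' into (pipeline, variables).

    Hypothetical helper: assumes ';' separates the pipeline name from the
    assignments and the assignments from each other.
    """
    prefix = "cscs-ci run "
    text = comment.strip()
    if not text.startswith(prefix):
        raise ValueError("not a cscs-ci trigger comment")

    pipeline, *assignments = text[len(prefix):].split(";")

    variables = dict(DEFAULTS)  # start from the defaults, then override
    for item in assignments:
        name, sep, value = item.partition("=")
        name = name.strip()
        if not sep or name not in DEFAULTS:
            raise ValueError(f"unsupported assignment: {item!r}")
        variables[name] = value.strip()

    return pipeline.strip(), variables


if __name__ == "__main__":
    print(parse_trigger("cscs-ci run beverin;QUDA_GIT_BRANCH=feature/prefetch2"))
```

Variables that are not given in the comment keep their defaults, so a plain `cscs-ci run beverin` corresponds to the upstream repository, the develop branch, and its current head commit.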

This functionality is there to be able to test the whole pipeline against a certain QUDA branch that is active in development (possibly on a fork). For instance:

cscs-ci run beverin;QUDA_GIT_BRANCH=feature/prefetch2

will pull the most recent commit of the feature/prefetch2 branch of QUDA instead and compile and run against that one, or

cscs-ci run beverin;QUDA_GIT_COMMIT=c9308c9a20cd7a68f8f45f20c7141b83dbc7f44a

will check out a specific commit hash instead of the most recent one.

This works on the GH200 as well as on the MI300 pipeline, but not on the GitHub Actions pipelines.
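The precedence between the branch and commit variables can be sketched as follows. The pipeline itself builds QUDA through spack, so this plain-git equivalent is only an analogy under that assumption; the function name `quda_checkout_commands` and the `dest` directory are hypothetical.

```python
def quda_checkout_commands(
    repo="https://github.com/lattice/quda.git",
    branch="develop",
    commit=None,
    dest="quda",  # hypothetical checkout directory
):
    """Return the git commands that would fetch the requested QUDA source.

    Without a commit, the clone ends up on the head of `branch`.
    With a commit, that commit is checked out after cloning.
    """
    commands = [["git", "clone", "--branch", branch, repo, dest]]
    if commit is not None:
        commands.append(["git", "-C", dest, "checkout", commit])
    return commands


if __name__ == "__main__":
    # Branch only: head of feature/prefetch2
    for cmd in quda_checkout_commands(branch="feature/prefetch2"):
        print(" ".join(cmd))
    # Explicit commit: pinned hash
    for cmd in quda_checkout_commands(
        commit="c9308c9a20cd7a68f8f45f20c7141b83dbc7f44a"
    ):
        print(" ".join(cmd))
```

The function only returns the command lists rather than running them, so it can be inspected without network access.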

TODO:

  • minimal dependency list in environment.yaml.
  • make quda@develop work in the spack spec even though develop is an evolving target.
  • merge CMake support #664 into feature/cicd-mi300
  • merge CMake support #664 into master
  • adapt to the CMake build
  • Remove echo "VARIABLE = $VARIABLE" and other noise in all yaml files
  • Adjust GH200 pipeline to match MI300A pipeline
  • Fix solver precision, see Adjust output precision for CSCS pipeline #681

@chaoos
Contributor Author

chaoos commented Apr 9, 2026

cscs-ci run beverin

@chaoos
Contributor Author

chaoos commented Apr 9, 2026

cscs-ci run default

@chaoos
Contributor Author

chaoos commented Apr 10, 2026

@kostrzewa @mtaillefumier I'm merging this, since all pipelines (finally!) run properly.

@chaoos chaoos merged commit 0436f15 into master Apr 10, 2026
5 checks passed
@kostrzewa
Member

@chaoos thanks a lot for this! I'm a bit surprised about the lax precision that you had to settle for in the checks after setting such strict solver precisions but I guess it is what it is.
