Add hardware-enabled copilot runner by JoseSantosAMD · Pull Request #399 · ROCm/iris

JoseSantosAMD · 2026-02-26T16:17:08Z

Submission Checklist

Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

- Introduced multiple scripts for managing GitHub Actions runners within an Apptainer container, including build, run, and cleanup functionalities.
- Added comprehensive documentation (README.md, AGENTS.md, skills.md) detailing setup, usage, and environment configuration.
- Included example environment files for both runner and container configurations to guide users in setting up their environments securely and effectively.

This update establishes a robust framework for running GitHub Actions in HPC environments, enhancing usability and maintainability.

- Introduced a new workflow file to automate the setup of a Python virtual environment for Copilot.
- The workflow includes steps for checking out the repository, creating and activating a virtual environment, installing dependencies, and verifying ROCm and GPU visibility.
- This addition enhances the CI/CD process by streamlining the environment setup for Copilot integration.
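As a rough illustration of the "verifying ROCm and GPU visibility" step listed above, the check might look like the sketch below. The helper names `have_tool` and `verify_rocm` are hypothetical and do not come from the workflow file. `rocminfo` is the tool start.sh already probes for; also checking `rocm-smi` is an assumption.

```shell
#!/usr/bin/env bash
# Sketch only: report which ROCm command-line tools are visible on PATH.

# Return 0 if the named tool can be found on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Print one line per tool and return non-zero if any tool is missing.
verify_rocm() {
    status=0
    for tool in rocminfo rocm-smi; do
        if have_tool "$tool"; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            status=1
        fi
    done
    return "$status"
}

# Do not abort the setup step here; just make the result visible in the log.
verify_rocm || echo "ROCm tooling is not fully visible on this runner"
```

Reporting a missing tool instead of failing keeps the setup step usable on hosts where only part of the ROCm tooling is installed. A stricter workflow could exit on a non-zero status instead.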

JoseSantosAMD · 2026-02-26T16:20:17Z

@copilot run example 07_gemm_all_scatter and report back the teraflops.

Copilot · 2026-02-26T16:20:28Z

@JoseSantosAMD I've opened a new pull request, #400, to work on those changes. Once the pull request is ready, I'll request review from you.

Copilot

Pull request overview

Adds infrastructure to run Iris workflows on an AMD/ROCm-capable self-hosted “copilot” GitHub Actions runner, with Apptainer + SLURM scripts to build/run/manage the runner container.

Changes:

Introduces a copilot-setup-steps workflow to bootstrap a Python venv and sanity-check ROCm visibility on the self-hosted runner.
Adds Apptainer/SLURM runner lifecycle scripts (build, run, start) plus configuration templates.
Adds documentation for setup/usage/troubleshooting of the coding-agent runner environment.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 6 comments.

Show a summary per file

| File | Description |
| --- | --- |
| .github/workflows/copilot-setup-steps.yml | Manual workflow to prepare venv and verify ROCm tooling on the self-hosted runner. |
| .github/scripts/github-runner-files/start.sh | Container entrypoint that installs/configures the Actions runner and starts the listener. |
| .github/scripts/github-runner-files/run-github-coding-agent-runner.sh | SLURM/standalone launcher to execute the runner inside an Apptainer container. |
| .github/scripts/github-runner-files/build-github-coding-agent-runner.sh | SLURM build script to produce the runner container SIF image. |
| .github/scripts/github-runner-files/cleanup-old-runners.sh | Utility to remove stale runner configs/overlay artifacts. |
| .github/scripts/github-runner-files/runner-container.env.example | Template for container environment overrides (paths/options). |
| .github/scripts/github-runner-files/env.example | Template for exporting runner-related env vars (token, repo, labels). |
| .github/scripts/github-runner-files/README.md | End-to-end documentation for building and running the self-hosted runner. |
| .github/scripts/github-runner-files/skills.md | Additional build/run notes (SLURM-focused). |
| .github/scripts/github-runner-files/AGENTS.md | Local conventions for editing these scripts (no secrets, no host-specific paths). |

Copilot · 2026-02-26T16:25:42Z

.github/scripts/github-runner-files/start.sh

+# Point git config to a writable location (can be overridden by env file)
+export GIT_CONFIG_GLOBAL="${GIT_CONFIG_GLOBAL:-${RUNNER_WORKDIR}/.gitconfig}"
+mkdir -p "$(dirname "$GIT_CONFIG_GLOBAL")"
+git config --global --add safe.directory '*'


git config --global --add safe.directory '*' disables Git's ownership safety checks for all directories. On a self-hosted runner (especially if it can run arbitrary PR code), this broad trust is a security risk. Prefer adding only the specific workspace directory (or runner work subdirectory) that needs to be treated as safe instead of a wildcard.

Suggested change

-git config --global --add safe.directory '*'
+if [ -n "${RUNNER_WORKDIR:-}" ]; then
+    git config --global --add safe.directory "${RUNNER_WORKDIR}"
+fi

Copilot · 2026-02-26T16:25:42Z

.github/scripts/github-runner-files/cleanup-old-runners.sh

+
+set -e
+
+WORK_DIR="${WORK:-/work1/amd/josantos}"


This hardcodes a host-specific absolute path (/work1/amd/josantos) as the default WORK_DIR, which conflicts with the repo’s stated convention to avoid machine-specific paths (see .github/scripts/github-runner-files/AGENTS.md:17-21). Please remove the hardcoded default and rely on env (e.g. $WORK, $HOME) or require the caller to pass a base directory.

Suggested change

-WORK_DIR="${WORK:-/work1/amd/josantos}"
+WORK_DIR="${WORK:-$HOME}"
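The comment also mentions requiring the caller to pass a base directory instead of falling back to a default. A minimal sketch of that alternative follows. The function name `resolve_work_dir` and the error message are hypothetical, and the path in the demonstration is a placeholder.

```shell
#!/usr/bin/env bash
# Sketch only: resolve the base directory from an argument or $WORK,
# and fail loudly instead of falling back to a machine-specific path.

resolve_work_dir() {
    # Prefer the explicit argument, then the WORK environment variable.
    local dir="${1:-${WORK:-}}"
    if [ -z "$dir" ]; then
        echo "error: pass a base directory or set WORK" >&2
        return 1
    fi
    printf '%s\n' "$dir"
}

# Demonstration with a placeholder path.
resolve_work_dir "/path/to/base"

# In cleanup-old-runners.sh this could replace the hardcoded default:
#   WORK_DIR="$(resolve_work_dir "${1:-}")" || exit 1
```

Failing when nothing is provided avoids silently cleaning a directory the caller did not intend, at the cost of being less convenient than the `$HOME` fallback in the suggested change.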

Copilot · 2026-02-26T16:25:43Z

.github/scripts/github-runner-files/run-github-coding-agent-runner.sh

+        --bind "${RUNNER_HOME_HOST}:/runner-home:rw" \
+        --bind "${RUNNER_WORKDIR}:${RUNNER_WORKDIR}" \
+        ${RUNNER_TMP_BIND:+"$RUNNER_TMP_BIND"} \
+        --env "RUNNER_HOME=/runner-home" \


The optional /tmp bind is passed as a single argument because RUNNER_TMP_BIND contains spaces and is expanded with quotes (${RUNNER_TMP_BIND:+"$RUNNER_TMP_BIND"}). Apptainer expects --bind and its value as separate argv tokens (or --bind=...), so this likely won't bind /tmp as intended. Consider using an argv array (or conditionally appending --bind "${RUNNER_TMP}:/tmp:rw") instead of building a spaced string.
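One way to follow this advice is sketched below with a bash array, so that `--bind` and its value always reach Apptainer as two separate argv tokens. The helper `build_bind_args` and the array name `BIND_ARGS` are hypothetical, the variable `RUNNER_TMP` is taken from the comment above, and all paths are placeholders.

```shell
#!/usr/bin/env bash
# Sketch only: collect bind options in an array instead of a spaced string.

build_bind_args() {
    # One array element per argv token; quoting keeps paths with spaces intact.
    BIND_ARGS=(
        --bind "${RUNNER_HOME_HOST}:/runner-home:rw"
        --bind "${RUNNER_WORKDIR}:${RUNNER_WORKDIR}"
    )
    # Append the optional /tmp bind only when RUNNER_TMP is set and non-empty.
    if [ -n "${RUNNER_TMP:-}" ]; then
        BIND_ARGS+=(--bind "${RUNNER_TMP}:/tmp:rw")
    fi
}

# Placeholder values for illustration.
RUNNER_HOME_HOST="/path/to/runner-home"
RUNNER_WORKDIR="/path/to/work dir"
RUNNER_TMP="/path/to/tmp"

build_bind_args
# Print one token per line to show how the arguments are split.
printf '%s\n' "${BIND_ARGS[@]}"

# The launcher would then expand the array in place of the individual flags:
#   apptainer exec --no-home "${BIND_ARGS[@]}" ...
```

Expanding `"${BIND_ARGS[@]}"` yields nothing at all when an optional entry was never appended, which removes the need for the `${VAR:+...}` construct.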

Copilot · 2026-02-26T16:25:43Z

.github/scripts/github-runner-files/run-github-coding-agent-runner.sh

+if [ -n "${USE_OVERLAY}" ] && [ "${USE_OVERLAY}" != "0" ] && [ -d "${OVERLAY_DIR}" ]; then
+    apptainer exec \
+        --no-home \
+        --overlay "${OVERLAY_DIR}" \
+        --bind "${SCRIPT_DIR}:/runner-scripts:ro" \


--overlay "${OVERLAY_DIR}" is being given a directory created via mkdir -p, but elsewhere in the repo overlays are created as an image file with apptainer overlay create (e.g. .github/scripts/container_exec.sh:71-79 uses iris_overlay_...img). Passing an empty directory here is unlikely to be a valid overlay and can cause the runner container startup to fail. Consider creating an overlay image file under RUNNER_BASE (per job) with apptainer overlay create ... overlay.img, pass that file to --overlay, and clean it up when the runner stops.
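A minimal sketch of the per-job overlay image described above is shown here. The function name `create_overlay_image`, the `APPTAINER_BIN` override, the 1024 MiB default size, and the file naming are all assumptions for illustration; `SLURM_JOB_ID` is used only when SLURM provides it.

```shell
#!/usr/bin/env bash
# Sketch only: create an overlay image file rather than passing a directory.

# Hypothetical override so the binary can be substituted (for example in tests).
APPTAINER_BIN="${APPTAINER_BIN:-apptainer}"

# Create an overlay image under the given base directory and print its path.
create_overlay_image() {
    local base_dir="$1"
    local size_mib="${2:-1024}"   # assumed default size in MiB
    local img="${base_dir}/overlay_${SLURM_JOB_ID:-$$}.img"
    mkdir -p "$base_dir"
    # Tool output goes to stderr so stdout carries only the image path.
    "$APPTAINER_BIN" overlay create --size "$size_mib" "$img" >&2
    printf '%s\n' "$img"
}

# Possible use in the launcher (not executed here):
#   OVERLAY_IMG="$(create_overlay_image "${RUNNER_BASE}")" || exit 1
#   trap 'rm -f "${OVERLAY_IMG}"' EXIT
#   apptainer exec --no-home --overlay "${OVERLAY_IMG}" ...
```

The `trap` in the usage note covers the cleanup the comment asks for when the runner stops. Whether the image should be kept between jobs to preserve installed packages is a separate design choice.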

Copilot · 2026-02-26T16:25:43Z

.github/scripts/github-runner-files/start.sh

+    echo "Found existing runner configuration, removing..."
+    ./config.sh remove --token "$REGISTRATION_TOKEN" || true


When an existing .runner config is present, config.sh remove is invoked with the registration token. The script later fetches a dedicated remove token (.../actions/runners/remove-token) for cleanup, which suggests the registration token may be the wrong credential for removal. If the remove step fails (currently ignored), the local .runner file can remain and config.sh may refuse to configure. Fetch and use a remove token for this removal (or ensure local config files are removed on failure) before calling config.sh --replace.

Suggested change

-    echo "Found existing runner configuration, removing..."
-    ./config.sh remove --token "$REGISTRATION_TOKEN" || true
+    echo "Found existing runner configuration, requesting remove token..."
+    PRE_REMOVE_RESPONSE=$(curl -s -X POST \
+        -H "Authorization: token $GITHUB_TOKEN" \
+        -H "Accept: application/vnd.github.v3+json" \
+        "https://api.github.com/repos/$GITHUB_REPOSITORY/actions/runners/remove-token")
+    if command -v jq >/dev/null 2>&1; then
+        PRE_REMOVE_TOKEN=$(echo "$PRE_REMOVE_RESPONSE" | jq -r .token)
+    else
+        PRE_REMOVE_TOKEN=$(echo "$PRE_REMOVE_RESPONSE" | grep -o '"token":"[^"]*"' | head -1 | cut -d'"' -f4)
+    fi
+    if [ "$PRE_REMOVE_TOKEN" != "null" ] && [ -n "$PRE_REMOVE_TOKEN" ]; then
+        echo "Removing existing runner with remove token..."
+        if ! ./config.sh remove --token "$PRE_REMOVE_TOKEN"; then
+            echo "Warning: Failed to remove existing runner with remove token; deleting local .runner config."
+            rm -f ".runner"
+        fi
+    else
+        echo "Warning: Could not obtain remove token for existing runner; deleting local .runner config."
+        rm -f ".runner"
+    fi

Copilot · 2026-02-26T16:25:43Z

.github/scripts/github-runner-files/start.sh

+echo "Press Ctrl+C to stop"
+echo "=========================================="
+command -v rocminfo >/dev/null 2>&1 && rocminfo || true
+./run.sh


GITHUB_TOKEN (a long-lived PAT with repo scope) is passed into the container and remains in the environment when ./run.sh starts, so every workflow job executed on this runner can read and exfiltrate it. An attacker who can trigger a workflow on this self-hosted runner (for example via a pull request workflow) could simply print $GITHUB_TOKEN in a job step and gain full API access to the repository outside of Actions. To avoid leaking this secret, keep using it only within start.sh/cleanup to obtain registration/remove tokens but ensure it is not present in the runner job environment (for example by removing or masking GITHUB_TOKEN from the environment before invoking ./run.sh or otherwise isolating the PAT from workflow steps).
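One possible way to keep the PAT out of the job environment is sketched below: the listener is started from a subshell in which `GITHUB_TOKEN` has been unset, while the calling script keeps the variable for its own cleanup logic. The helper name `run_without_token` is hypothetical and the token value is a placeholder.

```shell
#!/usr/bin/env bash
# Sketch only: start a command without GITHUB_TOKEN in its environment.

run_without_token() {
    # The subshell keeps the unset local, so the calling script can still
    # use GITHUB_TOKEN afterwards (for example to request a remove token).
    ( unset GITHUB_TOKEN; "$@" )
}

# Demonstration with a placeholder value, not a real token.
export GITHUB_TOKEN="placeholder-not-a-real-token"
run_without_token sh -c 'echo "child sees: ${GITHUB_TOKEN:-<unset>}"'
echo "parent still has token: ${GITHUB_TOKEN:+yes}"

# In start.sh the final line could then become:
#   run_without_token ./run.sh
```

This only removes the variable from the child's environment. It does not by itself prevent a job running as the same user from finding the token elsewhere, such as in an env file on disk, so it is best treated as one layer alongside a narrowly scoped or short-lived credential.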

mawad-amd · 2026-02-26T16:32:05Z

@copilot can you run example 7 for 10 different GEMM shapes and report back the results?

Copilot · 2026-02-26T16:32:14Z

@mawad-amd I've opened a new pull request, #401, to work on those changes. Once the pull request is ready, I'll request review from you.

mawad-amd

Not sure whether we need to merge first before testing, so feel free to merge this.

JoseSantosAMD · 2026-02-26T17:27:45Z

@copilot can you run example 7 for 10 different GEMM shapes and report back the results? @copilot

Copilot · 2026-02-26T17:27:53Z

@JoseSantosAMD I've opened a new pull request, #402, to work on those changes. Once the pull request is ready, I'll request review from you.

JoseSantosAMD · 2026-02-26T17:46:03Z

@copilot please run setup and can you run example 7 for 10 different GEMM shapes and report back the results? @copilot

Copilot · 2026-02-26T17:46:11Z

@JoseSantosAMD I've opened a new pull request, #404, to work on those changes. Once the pull request is ready, I'll request review from you.

JoseSantosAMD and others added 4 commits February 25, 2026 14:16

Apply Ruff auto-fixes

aa5a3a1

Merge branch 'main' into copilot_enabled_runner

5620a1f

Copilot AI review requested due to automatic review settings February 26, 2026 16:17

JoseSantosAMD requested review from BKP, mawad-amd and neoblizz as code owners February 26, 2026 16:17

github-actions bot added labels in-progress (We are working on it) and iris (Iris project issue) Feb 26, 2026

Copilot started reviewing on behalf of JoseSantosAMD February 26, 2026 16:18

Copilot AI mentioned this pull request Feb 26, 2026

Report 07_gemm_all_scatter benchmark results from CI #400

Closed

Copilot AI reviewed Feb 26, 2026

mawad-amd mentioned this pull request Feb 26, 2026

Copilot enabled runner #398

Closed

1 task

mawad-amd changed the title ~~Jose santos amd/copilot enabled runner~~ Add hardware-enabled copilot runner Feb 26, 2026

Copilot AI mentioned this pull request Feb 26, 2026

Reply to benchmark request: explain GPU hardware requirement for example 07 #401

Draft

mawad-amd approved these changes Feb 26, 2026

ci: trigger copilot setup workflow on PR @copilot comments

418a23f

- add issue_comment trigger with created, edited
- gate job to PR comments containing @copilot
- keep manual workflow_dispatch
- target self-hosted runner labels: copilot, apptainer, iris

Copilot AI mentioned this pull request Feb 26, 2026

Reply to benchmark run request for example 07_gemm_all_scatter #402

Draft

Merge branch 'main' into JoseSantosAMD/copilot_enabled_runner

d3d3156

Copilot AI mentioned this pull request Feb 26, 2026

Run example 07 GEMM all-scatter benchmark across 10 shapes on MI300X #404

Draft
