93 commits
11a4960
chore: add `.clang-format`
voltjia Feb 4, 2026
25de6c8
feat: add `DataType`
voltjia Feb 4, 2026
3e1bb6f
test: add an example for `DataType`
voltjia Feb 5, 2026
96127b3
feat: add `Tensor`
voltjia Feb 4, 2026
9a5077b
test: add an example for `Tensor`
voltjia Feb 5, 2026
91daa44
feat: add `Device`
voltjia Feb 5, 2026
fb016f2
feat: add `Handle`
voltjia Feb 5, 2026
128e739
feat: add `Operator`
voltjia Feb 5, 2026
92a02b0
feat: add `Gemm`
voltjia Feb 6, 2026
75b4d99
feat: add `Operator<Gemm, Device::kNvidia>`
voltjia Feb 6, 2026
08fc189
test: add an example for `Operator<Gemm, Device::kNvidia>`
voltjia Feb 6, 2026
c590c2a
feat: add `DataType::FromString` for string-to-dtype conversion
voltjia Feb 10, 2026
fb434a1
refactor: make `Device` a `class` and integrate it into `Tensor`
voltjia Feb 10, 2026
6453be9
feat: add `Device::TypeFromString` for string-to-device-type conversion
voltjia Feb 11, 2026
66f1d3c
feat: add a script to generate pybind11 bindings
voltjia Feb 11, 2026
cb5ccbc
fix: add `virtual ~Operator() = default;`
voltjia Feb 11, 2026
d5c1067
refactor: Simplify `operator()` dispatching
voltjia Feb 11, 2026
45c3e62
feat: add naive support for single-stage interfaces
voltjia Feb 11, 2026
dbe3f4c
feat: add stream handling
voltjia Feb 11, 2026
ba533cd
feat: extend Device enum and refactor GEMM support
Ziminli Feb 10, 2026
8da1d18
feat: add generic dispatcher, compile-time traits/constructs and CPU …
Ziminli Feb 11, 2026
b029358
fix: fix dispatcher default to kCpu issue, various naming issues and …
Ziminli Feb 12, 2026
3ab24bd
refactor: further simplify `blasGemmEx()`, unify comment formatting a…
Ziminli Feb 12, 2026
64ce184
fix: fix the typo for `cudaMemset()` in `runtime_api.h`
Ziminli Feb 12, 2026
3df4832
feat: add `Device::ToString`
voltjia Feb 12, 2026
ecf030e
feat: use `Device::ToString` in `Tensor::ToString`
voltjia Feb 12, 2026
1f871cb
feat: use lowercase words in `Device::kDeviceToDesc` and `kDescToDevice`
voltjia Feb 12, 2026
9de33b3
fix: update `scripts/generate_wrappers.py` to adapt to the latest cha…
voltjia Feb 12, 2026
41af1dc
feat: add support for legacy c code generation
voltjia Feb 12, 2026
632dea2
fix: remove unintended white space in `DeviceTypeFromString`
voltjia Feb 12, 2026
94a9cf2
fix: use `op_name.lower()` in `_generate_call`
voltjia Feb 12, 2026
186071a
fix: add a constructor to `Operator<Gemm, Device::Type::kCpu>` to sup…
voltjia Feb 12, 2026
d6b725b
feat: add operator searching to `scripts/generate_wrappers.py`
voltjia Feb 12, 2026
87de397
build: add CMake build system and README (#2)
Ziminli Feb 13, 2026
2a5ab4f
fix: remove the `*` after `n`
voltjia Feb 13, 2026
b5b6136
build: rename `USE_` options to `WITH_` for backend selection
voltjia Feb 25, 2026
e5b3aea
build: add pybind11 support to generate python bindings
voltjia Feb 25, 2026
eea1bdb
build: add `GENERATE_PYTHON_BINDINGS` option to `CMakeLists.txt`
voltjia Feb 25, 2026
ee7999a
docs: document `GENERATE_PYTHON_BINDINGS` in `README.md`
voltjia Feb 25, 2026
1d07c8d
feat: unify runtime API for CPU backend in `examples/runtime_api.h`
voltjia Feb 25, 2026
8651326
test: add an example for Python binding generation
voltjia Feb 25, 2026
e371867
fix: return `strides_[index]` instead of `shape_[index]` in `stride`
voltjia Feb 25, 2026
61cfe66
refactor: improve GEMM stride handling
voltjia Feb 25, 2026
783373c
fix: update `Blas` to use `trans_a` and `trans_b` parameters
voltjia Feb 26, 2026
ec8a99a
feat: add negative indexing support to `Tensor`
voltjia Feb 26, 2026
f936f71
feat: add batched GEMM support
voltjia Feb 26, 2026
d73a0b7
fix: rename `swapped_a_and_b_` to `swap_a_and_b_`
voltjia Feb 26, 2026
ef07165
test: add basic testing infrastructure
voltjia Feb 26, 2026
f42ca61
test: add test cases for `ops.gemm`
voltjia Feb 26, 2026
5367b7a
test: add `tests/__init__.py`
voltjia Feb 26, 2026
761912e
build: configure Python packaging
voltjia Feb 26, 2026
03c372b
build: move dependencies to `pyproject.toml`
voltjia Feb 26, 2026
1af292c
build: add support for automatically detecting available devices
voltjia Feb 26, 2026
c96ad60
feat: auto-detect system include paths in `scripts/generate_wrappers.py`
voltjia Feb 27, 2026
60b47f1
feat: add the implementation of `Add` operator on CPU, NVIDIA, and Me…
Ziminli Feb 27, 2026
5fcd645
feat(gemm-iluvatar): add Iluvatar GEMM backend support (#3)
zhangyue207 Feb 28, 2026
4cc0f00
test: centralize Act/Assert logic
voltjia Feb 28, 2026
fd70894
test: use `pytest.mark.auto_act_and_assert` in `tests/test_add.py` an…
voltjia Feb 28, 2026
fd800cb
test: centralize `dtype` and `device` parametrization
voltjia Mar 2, 2026
54458fb
test: reorder `pytest.mark.parametrize` decorators in `tests/test_gem…
voltjia Mar 2, 2026
5448259
test: rename operands to `input`, `other`, and `out` in `tests/test_a…
voltjia Mar 2, 2026
92bbe8a
test: add benchmarking support
voltjia Mar 2, 2026
a3f6101
perf: cache `Operator` instances in `Operator::call`
voltjia Mar 2, 2026
2e5cbc4
fix: fix invalid string literal assertion in `Operator::make`
voltjia Mar 2, 2026
a7f447c
test: use `clone_strided` in `_clone` to preserve tensor layout
voltjia Mar 2, 2026
45d8b9d
refactor: move pybind11 utilities from `scripts/generate_wrappers.py`…
voltjia Mar 2, 2026
23950e1
refactor: move tensor conversion logic from `scripts/generate_wrapper…
voltjia Mar 2, 2026
7cf4f85
fix: use `g++` instead of `clang++` in `_get_system_include_flags`
voltjia Mar 3, 2026
049de3a
feat(ops): add `RmsNorm` with Iluvatar, NVIDIA, CPU backends and fp16…
zhangyue207 Mar 4, 2026
59031f7
feat(ops): add Iluvatar GPU backend for `Add` (#8)
zhangyue207 Mar 4, 2026
a6d915b
refactor: adapt dispatcher for full C++17 compatibility and support `…
Ziminli Mar 5, 2026
0256d48
refactor: introduce handle and workspace (#13)
voltjia Mar 5, 2026
24cc11a
feat: add the implementation of `Gemm` operator on Cambricon (#7)
bitzyz Mar 5, 2026
ea78d15
fix: include `"tensor.h"` instead of `"data_type.h"` and `"device.h"`…
voltjia Mar 5, 2026
a671e3a
feat(ops): implement `CausalSoftmax` operator with CPU and CUDA backe…
zhangyue207 Mar 6, 2026
8442eff
feat: support casting and CPU bfloat16 and float16 (#11)
Ziminli Mar 6, 2026
42f1e20
feat: add `swiglu` op with NVIDIA and CPU backends (#10)
bitzyz Mar 6, 2026
6e93d39
feat: reorganize casting utilities and enhance CPU support (#16)
zhangyue207 Mar 11, 2026
71fc388
fix: add equality operators and `CacheKey` `struct` (#18)
zhangyue207 Mar 12, 2026
d094e10
feat(gemm-moore): add Moore GEMM backend support (#14)
gongchensu Mar 17, 2026
2650cd9
feat: optimize `BLOCK_SIZE` for CUDA kernels and support Iluvatar `Sw…
zhangyue207 Mar 18, 2026
9de2fd2
fix: filter out unsupported integer data types in `tests/test_add.py`…
zhangyue207 Mar 19, 2026
dc9f440
feat(moore): add Moore backend for `Add` (#26)
gongchensu Mar 19, 2026
f0fccb1
feat(moore): add Moore SwiGLU (#24)
gongchensu Mar 19, 2026
1b0b5ac
feat(ops): add MetaX `causal_softmax` (#27)
gongchensu Mar 20, 2026
f44be6f
feat(ops): add MetaX backend for `RmsNorm` (#25)
gongchensu Mar 20, 2026
61fcdf7
feat(ops): add MetaX backend for `Swiglu` (#28)
gongchensu Mar 20, 2026
3557dda
feat: develop CI infrastructure (#21)
zhangyue207 Mar 25, 2026
d6b5fd5
chore: ignore ci-results/ directory
zhangyue207 Mar 25, 2026
56f3330
Revert "chore: ignore ci-results/ directory"
zhangyue207 Mar 26, 2026
8c92b2e
feat: add Cambricon `RMSNorm` (#19)
bitzyz Mar 26, 2026
f17e37c
feat: add high-level `DispatchFunc()` interface for multi-type and mi…
Ziminli Mar 26, 2026
2816b58
refactor: make data type mappings and shared CUDA headers device-awar…
voltjia Apr 1, 2026
386 changes: 386 additions & 0 deletions .ci/README.md
@@ -0,0 +1,386 @@
# .ci — CI Images and Pipeline

```
.ci/
├── config.yaml          # Unified config (images, jobs, agent definitions)
├── utils.py             # Shared utilities (load_config, normalize_config, get_git_commit)
├── agent.py             # Runner Agent (scheduler, webhooks, remote dispatch)
├── build.py             # Image builder
├── run.py               # CI pipeline runner (Docker layer)
├── ci_resource.py       # GPU/memory detection and allocation
├── github_status.py     # GitHub Commit Status reporting
├── images/
│   ├── nvidia/Dockerfile
│   ├── iluvatar/Dockerfile
│   ├── metax/Dockerfile
│   ├── moore/Dockerfile
│   ├── cambricon/Dockerfile
│   └── ascend/Dockerfile
└── tests/               # Unit tests
    ├── conftest.py
    ├── test_agent.py
    ├── test_build.py
    ├── test_run.py
    ├── test_resource.py
    ├── test_github_status.py
    └── test_utils.py
```

**Prerequisites**: Docker, Python 3.10+, `pip install pyyaml`

---

## Configuration `config.yaml`

Config uses a **platform-centric** top-level structure. Each platform defines its image, platform-level defaults, and job list.
At load time, jobs are flattened to `{platform}_{job}` format (e.g., `nvidia_gpu`).

```yaml
repo:
  url: https://github.com/InfiniTensor/InfiniOps.git
  branch: master

github:
  status_context_prefix: "ci/infiniops"

agents:                    # Remote agent URLs (used by CLI for cross-machine dispatch)
  nvidia:
    url: http://nvidia-host:8080
  iluvatar:
    url: http://iluvatar-host:8080

platforms:
  nvidia:
    image:                 # Image definition
      dockerfile: .ci/images/nvidia/
      build_args:
        BASE_IMAGE: nvcr.io/nvidia/pytorch:24.10-py3
    setup: pip install .[dev] --no-build-isolation
    jobs:
      gpu:                 # Flattened as nvidia_gpu
        resources:
          ngpus: 1         # Scheduler auto-picks this many free GPUs
          memory: 32GB
          shm_size: 16g
          timeout: 3600
        stages:
          - name: test
            run: pytest tests/ -n 8 -v --tb=short --junitxml=/workspace/results/test-results.xml

  iluvatar:
    image:
      dockerfile: .ci/images/iluvatar/
      build_args:
        BASE_IMAGE: corex:qs_pj20250825
        APT_MIRROR: http://archive.ubuntu.com/ubuntu
        PIP_INDEX_URL: https://pypi.org/simple
    docker_args:           # Platform-level docker args, inherited by all jobs
      - "--privileged"
      - "--cap-add=ALL"
      - "--pid=host"
      - "--ipc=host"
    volumes:
      - /dev:/dev
      - /lib/firmware:/lib/firmware
      - /usr/src:/usr/src
      - /lib/modules:/lib/modules
    setup: pip install .[dev] --no-build-isolation
    jobs:
      gpu:                 # Flattened as iluvatar_gpu
        resources:
          gpu_ids: "0"
          gpu_style: none  # CoreX: passthrough via --privileged + /dev mount
          memory: 32GB
          shm_size: 16g
          timeout: 3600
        stages:
          - name: test
            run: pytest tests/ -n 8 -v --tb=short --junitxml=/workspace/results/test-results.xml
```
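The flattening and inheritance described above can be sketched in a few lines of Python. This is an illustrative sketch only; the real logic lives in `.ci/utils.py` (`normalize_config`), and the function name `flatten_jobs` here is an assumption, not the module's actual API.

```python
# Illustrative sketch of load-time job flattening: each job becomes a
# "{platform}_{job}" entry that inherits platform-level defaults, with
# job-level fields taking precedence.

def flatten_jobs(platforms):
    """Flatten a platform-centric config dict into {platform}_{job} entries."""
    flat = {}
    for platform, pconf in platforms.items():
        # Everything except the job list acts as a platform-level default.
        defaults = {k: v for k, v in pconf.items() if k != "jobs"}
        for job, jconf in pconf.get("jobs", {}).items():
            # Job fields override platform defaults (shallow merge).
            flat[f"{platform}_{job}"] = {**defaults, **jconf}
    return flat
```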

### Config hierarchy

| Level | Field | Description |
|---|---|---|
| **Platform** | `image` | Image definition (dockerfile, build_args) |
| | `image_tag` | Default image tag (defaults to `latest`) |
| | `docker_args` | Extra `docker run` args (e.g., `--privileged`) |
| | `volumes` | Extra volume mounts |
| | `setup` | In-container setup command |
| | `env` | Injected container env vars |
| **Job** | `resources.ngpus` | Number of GPUs — scheduler auto-picks free ones (NVIDIA only) |
| | `resources.gpu_ids` | Static GPU device IDs (e.g., `"0"`, `"0,2"`) |
| | `resources.gpu_style` | GPU passthrough: `nvidia` (default), `none`, or `mlu` |
| | `resources.memory` | Container memory limit |
| | `resources.shm_size` | Shared memory size |
| | `resources.timeout` | Max run time in seconds |
| | `stages` | Execution stage list |
| | Any platform field | Jobs can override any platform-level default |

---

## Image builder `build.py`

| Flag | Description |
|---|---|
| `--platform nvidia\|iluvatar\|metax\|moore\|ascend\|all` | Target platform (default: `all`) |
| `--commit` | Use specific commit ref as image tag (default: HEAD) |
| `--force` | Skip Dockerfile change detection |
| `--dry-run` | Print commands without executing |

```bash
# Build with change detection (skips if no Dockerfile changes)
python .ci/build.py --platform nvidia

# Build Iluvatar image
python .ci/build.py --platform iluvatar --force

# Force build all platforms
python .ci/build.py --force
```

Build artifacts are stored as local Docker image tags: `infiniops-ci/<platform>:<commit-hash>` and `:latest`.
Proxy and `no_proxy` env vars are forwarded from the host to `docker build` automatically.

> `--push` is reserved for future use; requires a `registry` section in `config.yaml`.
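Forwarding proxy settings to `docker build` amounts to turning each proxy variable present in the host environment into a `--build-arg`. A minimal sketch, assuming a helper like the following (the actual code in `.ci/build.py` may differ):

```python
# Hedged sketch: translate host proxy env vars into docker build args.

PROXY_VARS = ("http_proxy", "https_proxy", "no_proxy",
              "HTTP_PROXY", "HTTPS_PROXY", "NO_PROXY")

def proxy_build_args(env):
    """Return --build-arg flags for any proxy vars present in `env`."""
    args = []
    for var in PROXY_VARS:
        if var in env:
            args += ["--build-arg", f"{var}={env[var]}"]
    return args
```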

---

## Pipeline runner `run.py`

Platform is auto-detected (via `nvidia-smi`/`ixsmi`/`mx-smi`/`mthreads-gmi`/`cnmon` on PATH); no manual specification is needed.
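The detection order above can be sketched by probing for each platform's management tool on PATH; the real check in `run.py` may differ in order or detail:

```python
# Sketch of platform auto-detection: first management tool found on PATH wins.
import shutil

DETECTORS = [
    ("nvidia", "nvidia-smi"),
    ("iluvatar", "ixsmi"),
    ("metax", "mx-smi"),
    ("moore", "mthreads-gmi"),
    ("cambricon", "cnmon"),
]

def detect_platform():
    """Return the first platform whose tool is on PATH, or None."""
    for platform, tool in DETECTORS:
        if shutil.which(tool):
            return platform
    return None
```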

| Flag | Description |
|---|---|
| `--config` | Config file path (default: `.ci/config.yaml`) |
| `--job` | Job name: short (`gpu`) or full (`nvidia_gpu`). Defaults to all jobs for the current platform |
| `--branch` | Override clone branch (default: config `repo.branch`) |
| `--stage` | Run only the specified stage |
| `--image-tag` | Override image tag |
| `--gpu-id` | Override GPU device IDs (nvidia via `--gpus`, others via `CUDA_VISIBLE_DEVICES`) |
| `--test` | Override pytest test path (e.g., `tests/test_gemm.py::test_gemm`) |
| `--results-dir` | Host directory mounted to `/workspace/results` inside the container |
| `--local` | Mount current directory (read-only) instead of cloning from git |
| `--dry-run` | Print docker command without executing |

```bash
# Simplest usage: auto-detect platform, run all jobs, use config default branch
python .ci/run.py

# Specify short job name
python .ci/run.py --job gpu

# Full job name (backward compatible)
python .ci/run.py --job nvidia_gpu

# Run only the test stage, preview mode
python .ci/run.py --job gpu --stage test --dry-run

# Test local uncommitted changes without pushing
python .ci/run.py --local
```

Container execution flow: `git clone` → `checkout` → `setup` → stages.
With `--local`, the current directory is mounted read-only at `/workspace/repo` and copied to a writable temp directory inside the container before setup runs — host files are never modified.
Proxy vars are forwarded from the host. Test results are written to `--results-dir`. Each run uses a clean environment (no host pip cache mounted).

---

## Platform differences

| Platform | GPU passthrough | `gpu_style` | Base image | Detection tool |
|---|---|---|---|---|
| NVIDIA | `--gpus` (NVIDIA Container Toolkit) | `nvidia` (default) | `nvcr.io/nvidia/pytorch:24.10-py3` | `nvidia-smi` |
| Iluvatar | `--privileged` + `/dev` mount | `none` | `corex:qs_pj20250825` | `ixsmi` |
| MetaX | `--privileged` | `none` | `maca-pytorch:3.2.1.4-...` | `mx-smi` |
| Moore | `--privileged` | `none` | `vllm_musa:20251112_hygon` | `mthreads-gmi` |
| Cambricon | `--privileged` | `mlu` | `cambricon/pytorch:v1.25.3` | `cnmon` |
| Ascend | TODO | — | `ascend-pytorch:24.0.0` | — |

`gpu_style` controls the Docker device injection mechanism: `nvidia` uses `--gpus`, `none` uses `CUDA_VISIBLE_DEVICES` (or skips injection for Moore), `mlu` uses `MLU_VISIBLE_DEVICES`.
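The mapping can be sketched as follows. Names and the exact Moore handling are assumptions; `.ci/run.py` is authoritative:

```python
# Illustrative mapping from gpu_style to docker injection:
#   "nvidia" -> --gpus device list, "mlu" -> MLU_VISIBLE_DEVICES env,
#   "none"   -> rely on --privileged + /dev mounts, constrain visibility
#               via CUDA_VISIBLE_DEVICES where applicable.

def gpu_injection(style, gpu_ids):
    """Return (extra docker args, extra container env) for a gpu_style."""
    if style == "nvidia":
        return ["--gpus", f'"device={gpu_ids}"'], {}
    if style == "mlu":
        return [], {"MLU_VISIBLE_DEVICES": gpu_ids}
    # "none": device passthrough comes from --privileged and /dev mounts
    return [], {"CUDA_VISIBLE_DEVICES": gpu_ids}
```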

---

## Runner Agent `agent.py`

The Runner Agent supports CLI manual dispatch, GitHub webhook triggers, resource-aware dynamic scheduling, and cross-machine remote dispatch.

### CLI manual execution

```bash
# Run all jobs (dispatched to remote agents, using config default branch)
python .ci/agent.py run

# Specify branch
python .ci/agent.py run --branch feat/xxx

# Run a specific job
python .ci/agent.py run --job nvidia_gpu

# Filter by platform
python .ci/agent.py run --platform nvidia

# Preview mode
python .ci/agent.py run --dry-run
```

| Flag | Description |
|---|---|
| `--branch` | Test branch (default: config `repo.branch`) |
| `--job` | Specific job name |
| `--platform` | Filter jobs by platform |
| `--commit` | Override commit SHA used for GitHub status reporting |
| `--image-tag` | Override image tag |
| `--dry-run` | Preview mode |

### Webhook server

Deploy one Agent instance per platform machine (platform is auto-detected). On each machine:

```bash
python .ci/agent.py serve --port 8080
```

Additional `serve` flags:

| Flag | Description |
|---|---|
| `--port` | Listen port (default: 8080) |
| `--host` | Listen address (default: `0.0.0.0`) |
| `--webhook-secret` | GitHub webhook signing secret (or `WEBHOOK_SECRET` env var) |
| `--api-token` | `/api/run` Bearer auth token (or `AGENT_API_TOKEN` env var) |
| `--results-dir` | Results directory (default: `ci-results`) |
| `--utilization-threshold` | GPU idle threshold percentage (default: 10) |

| Endpoint | Method | Description |
|---|---|---|
| `/webhook` | POST | GitHub webhook (push/pull_request) |
| `/api/run` | POST | Remote job trigger |
| `/api/job/{id}` | GET | Query job status |
| `/health` | GET | Health check |
| `/status` | GET | Queue + resource status |

Webhook supports `X-Hub-Signature-256` signature verification via `--webhook-secret` or `WEBHOOK_SECRET` env var.
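`X-Hub-Signature-256` verification is an HMAC-SHA256 of the raw request body, compared in constant time, as GitHub documents it. A minimal sketch (not the exact `agent.py` code):

```python
# Verify a GitHub webhook signature: the header carries
# "sha256=<hexdigest>" where the digest is HMAC-SHA256(secret, raw body).
import hashlib
import hmac

def verify_signature(secret, body, signature_header):
    """Return True if signature_header matches HMAC-SHA256 of body."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest avoids timing side channels
    return hmac.compare_digest(expected, signature_header)
```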

### Remote agent configuration

Configure agent URLs in `config.yaml`; the CLI automatically dispatches remote jobs to the corresponding agents:

```yaml
agents:
  nvidia:
    url: http://<nvidia-ip>:8080
  iluvatar:
    url: http://<iluvatar-ip>:8080
  metax:
    url: http://<metax-ip>:8080
  moore:
    url: http://<moore-ip>:8080
```

### Resource scheduling

The Agent auto-detects GPU utilization and system memory to dynamically determine parallelism:
- A GPU counts as available when its utilization is below the threshold (default 10%) and it has not been allocated by the Agent
- When resources are insufficient, jobs are queued automatically; a completed job releases its resources and triggers scheduling of queued tasks
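The availability rule above reduces to a simple predicate. A sketch with assumed field names (the real logic is in `ci_resource.py`):

```python
# Sketch of the availability check: a GPU is free when its utilization is
# under the threshold and it has not already been handed to a job.

def available_gpus(gpus, allocated, threshold=10.0):
    """Return IDs of GPUs that are idle and unallocated.

    `gpus` is a list of {"id": ..., "utilization": ...} dicts (assumed
    shape); `allocated` is the set of IDs the Agent has already assigned.
    """
    return [g["id"] for g in gpus
            if g["utilization"] < threshold and g["id"] not in allocated]
```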

### GitHub Status

Set the `GITHUB_TOKEN` env var and the Agent will automatically report commit status:
- `pending` — job started
- `success` / `failure` — job completed

Status context format: `ci/infiniops/{job_name}`
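Status reporting uses the public GitHub REST API endpoint `POST /repos/{owner}/{repo}/statuses/{sha}`. A sketch of building such a request; the function name and parameters are assumptions, not `github_status.py`'s actual API:

```python
# Build (but do not send) a GitHub commit-status request with a
# "ci/infiniops/{job_name}" context, matching the format above.
import json
import urllib.request

def build_status_request(token, repo, sha, state, job_name):
    """state is one of "pending", "success", or "failure"."""
    payload = {"state": state, "context": f"ci/infiniops/{job_name}"}
    return urllib.request.Request(
        f"https://api.github.com/repos/{repo}/statuses/{sha}",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
        method="POST",
    )

# Send with: urllib.request.urlopen(build_status_request(...))
```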

---

## Multi-machine deployment guide

### Per-platform setup

Each machine needs Docker installed, the platform runtime, and the base CI image built.

| Platform | Runtime check | Base image | Build command |
|---|---|---|---|
| NVIDIA | `nvidia-smi` (+ [Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html)) | `nvcr.io/nvidia/pytorch:24.10-py3` (public) | `python .ci/build.py --platform nvidia` |
| Iluvatar | `ixsmi` | `corex:qs_pj20250825` (import in advance) | `python .ci/build.py --platform iluvatar` |
| MetaX | `mx-smi` | `maca-pytorch:3.2.1.4-...` (import in advance) | `python .ci/build.py --platform metax` |
| Moore | `mthreads-gmi` | `vllm_musa:20251112_hygon` (import in advance) | `python .ci/build.py --platform moore` |

### Start Agent services

On each machine (platform is auto-detected):

```bash
python .ci/agent.py serve --port 8080
```

### Configure remote agent URLs

On the trigger machine, add the `agents` section to `config.yaml` (see [Remote agent configuration](#remote-agent-configuration) above for the format).

### Trigger cross-platform tests

```bash
# Run all platform jobs at once (using config default branch)
python .ci/agent.py run

# Preview mode (no actual execution)
python .ci/agent.py run --dry-run

# Run only a specific platform
python .ci/agent.py run --platform nvidia
```

### Optional configuration

#### GitHub Status reporting

Set the env var on all machines so each reports its own platform's test status:

```bash
export GITHUB_TOKEN=ghp_xxxxxxxxxxxx
```

#### API Token authentication

When agents are exposed on untrusted networks, enable token auth:

```bash
python .ci/agent.py serve --port 8080 --api-token <secret>
# Or: export AGENT_API_TOKEN=<secret>
```

#### GitHub Webhook auto-trigger

In GitHub repo → Settings → Webhooks, add a webhook for each machine:

| Field | Value |
|---|---|
| Payload URL | `http://<machine-ip>:8080/webhook` |
| Content type | `application/json` |
| Secret | Must match `--webhook-secret` |
| Events | `push` and `pull_request` |

```bash
python .ci/agent.py serve --port 8080 --webhook-secret <github-secret>
# Or: export WEBHOOK_SECRET=<github-secret>
```

### Verification checklist

```bash
# 1. Dry-run each machine individually
for platform in nvidia iluvatar metax moore; do
  python .ci/agent.py run --platform $platform --dry-run
done

# 2. Health and resource checks
for ip in <nvidia-ip> <iluvatar-ip> <metax-ip> <moore-ip>; do
  curl http://$ip:8080/health
  curl http://$ip:8080/status
done

# 3. Cross-platform test
python .ci/agent.py run --branch master
```