
data-parallel patched ALP standalone kernel #7576

Open

a10y wants to merge 7 commits into develop from aduffy/alp-patched

Conversation

@a10y (Contributor) commented Apr 20, 2026

Summary

Follow-up to #7440

This changes ALP execution on CUDA. Previously, we ran two kernel passes: one to perform ALP decoding into global memory, and a second to apply patches.

This PR follows the same approach used previously for bit-unpacking, pushing patching into the decoding kernel itself. We assign one FastLanes 1024-element block to each warp (32 threads) and perform decoding and patching in a single kernel pass.

Testing

Unit tests were added covering simple and edge cases (multi-chunk inputs, and a mix of chunks with and without patches).

@a10y a10y force-pushed the aduffy/alp-patched branch from 9dc91b1 to 949acdd Compare April 20, 2026 19:40
@a10y a10y added the changelog/feature A new feature label Apr 20, 2026
a10y commented Apr 20, 2026

Benchmark improvements vs. develop on a GH200:

~/vortex$ nvidia-smi
Mon Apr 20 22:09:35 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.148.08             Driver Version: 570.148.08     CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GH200 480GB             On  |   00000000:DD:00.0 Off |                  Off |
| N/A   35C    P0             74W /  700W |       3MiB /  97871MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
[Screenshot: benchmark results, captured 2026-04-20 18:08]

(Outdated review thread on vortex-cuda/src/kernel/encodings/alp.rs)
a10y added 4 commits April 21, 2026 08:46
@a10y a10y marked this pull request as ready for review April 21, 2026 15:33
(Outdated review thread on vortex-cuda/kernels/src/alp.cu)
@a10y a10y force-pushed the aduffy/alp-patched branch from e51a081 to bc78994 Compare April 21, 2026 15:43