Stitched AES-GCM for aarch64 #165

Open
brian-pane wants to merge 1 commit into ctz:main from brian-pane:aarch64-aes-gcm
Conversation

@brian-pane
Contributor

No description provided.

@codspeed-hq

codspeed-hq bot commented Apr 10, 2026

Merging this PR will not alter performance

✅ 155 untouched benchmarks


Comparing brian-pane:aarch64-aes-gcm (29e452c) with main (9685008)

Open in CodSpeed

@codecov

codecov bot commented Apr 10, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.76%. Comparing base (9685008) to head (29e452c).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #165   +/-   ##
=======================================
  Coverage   99.76%   99.76%           
=======================================
  Files         184      184           
  Lines       50832    50918   +86     
=======================================
+ Hits        50714    50800   +86     
  Misses        118      118           

☔ View full report in Codecov by Sentry.

@brian-pane
Contributor Author

I made this as an attempt to speed up AES-GCM for larger input sizes on Arm (context: issue #163).

In my testing with cargo bench on a Mac M4 system, it helps a little bit:

aes128-gcm/aws-lc-rs/32B time:     [53.615 ns 53.731 ns 53.852 ns]
aes128-gcm/graviola/32B (main):    [90.274 ns 90.482 ns 90.703 ns]
aes128-gcm/graviola/32B (this PR): [90.605 ns 90.960 ns 91.409 ns]

aes128-gcm/aws-lc-rs/2KB time:     [242.13 ns 242.96 ns 243.86 ns]
aes128-gcm/graviola/2KB (main):    [299.82 ns 300.59 ns 301.45 ns]
aes128-gcm/graviola/2KB (this PR): [288.97 ns 289.53 ns 290.12 ns]

aes128-gcm/aws-lc-rs/8KB time:     [799.89 ns 803.41 ns 809.65 ns]
aes128-gcm/graviola/8KB (main):    [983.82 ns 990.83 ns 1.0001 µs]
aes128-gcm/graviola/8KB (this PR): [925.98 ns 928.91 ns 933.72 ns]

aes128-gcm/aws-lc-rs/16KB time:     [1.5428 µs 1.5448 µs 1.5466 µs]
aes128-gcm/graviola/16KB (main):    [1.8859 µs 1.8889 µs 1.8921 µs]
aes128-gcm/graviola/16KB (this PR): [1.7605 µs 1.7626 µs 1.7645 µs]
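
For readers unfamiliar with the term in the PR title: "stitching" interleaves the AES-CTR keystream computation with the GHASH authentication updates so that both execution pipelines stay busy instead of running the two phases back to back. A toy sketch of the loop structure only, with stand-in functions (real code uses AES rounds and carryless-multiply GHASH; nothing below is Graviola's implementation):

```rust
// Stand-in for the AES-CTR keystream block (real code: AES rounds on a counter).
fn toy_keystream(counter: u64) -> u64 {
    counter.wrapping_mul(0x9E37_79B9_7F4A_7C15)
}

// Stand-in for the GHASH update (real code: GF(2^128) multiply by H).
fn toy_ghash_update(acc: u64, block: u64) -> u64 {
    (acc ^ block).rotate_left(7)
}

// "Stitched" loop: while the keystream math for block i is in flight,
// perform the GHASH update for the previous ciphertext block.
fn stitched(plain: &[u64]) -> (Vec<u64>, u64) {
    let mut cipher = Vec::with_capacity(plain.len());
    let mut tag = 0u64;
    for (i, &p) in plain.iter().enumerate() {
        let ks = toy_keystream(i as u64);
        if let Some(&prev) = cipher.last() {
            tag = toy_ghash_update(tag, prev);
        }
        cipher.push(p ^ ks);
    }
    // Drain the final pending GHASH update.
    if let Some(&last) = cipher.last() {
        tag = toy_ghash_update(tag, last);
    }
    (cipher, tag)
}

fn main() {
    let (cipher, tag) = stitched(&[1, 2, 3, 4]);
    assert_eq!(cipher.len(), 4);
    assert_eq!(cipher[0], 1 ^ toy_keystream(0));
    println!("tag = {tag:#x}");
}
```

The payoff is largest on big inputs, which matches the benchmark table above: no change at 32B, a measurable win at 8KB and 16KB.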

Comment on lines +339 to +347
// Reverse the order of the bytes in each of the two 64-bit lanes in `u`.
let u = vrev64q_u8(u);
let u = vreinterpretq_u64_u8(u);

// Swap the locations of the two 64-bit lanes to finish reversing the bytes.
let lane0 = vgetq_lane_u64(u, 0);
let lane1 = vgetq_lane_u64(u, 1);
let reversed = vsetq_lane_u64(lane0, u, 1);
vsetq_lane_u64(lane1, reversed, 0)
Contributor Author

This is slow, but I haven't figured out a better alternative yet. I tried doing a shuffle operation, similar to what the x86_64 version does:

    use core::arch::aarch64::*;
    use core::mem;

    const SHUFFLE_MAP: uint8x16_t = unsafe { mem::transmute([15u8, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]) };
    let mut reversed: uint8x16_t;
    unsafe {
        core::arch::asm!(
            "tbl {reversed:v}.16B, {{ {u:v}.16B }}, {map:v}.16B",
            reversed = out(vreg) reversed,
            u = in(vreg) u,
            map = in(vreg) SHUFFLE_MAP,
        );
    }
    vreinterpretq_u64_u8(reversed)

but that ran even slower.
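
As a reference point for what the get/set-lane sequence computes: it is a full 16-byte reversal, which in portable Rust is equivalent to a byte swap of the 128-bit value loaded little-endian. A minimal portable sketch (not NEON; just the semantics — as a hedged aside, a single `vextq_u64(u, u, 1)` would also swap the two 64-bit lanes in one instruction, though it is not benchmarked here):

```rust
/// Portable reference for the 16-byte reversal the NEON sequence performs.
/// Loading little-endian, byte-swapping the 128-bit value, and storing it
/// back is the same transformation as vrev64q_u8 followed by swapping the
/// two 64-bit lanes.
fn reverse_16_bytes(bytes: [u8; 16]) -> [u8; 16] {
    u128::from_le_bytes(bytes).swap_bytes().to_le_bytes()
}

fn main() {
    let input: [u8; 16] = core::array::from_fn(|i| i as u8);
    let out = reverse_16_bytes(input);
    assert_eq!(out[0], 15);
    assert_eq!(out[15], 0);
}
```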

@brian-pane
Contributor Author

This patch assumes that the aarch64 target system is little-endian. Does Graviola support ARM running in big-endian mode?
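
Rust exposes target endianness at compile time, so if big-endian Arm is out of scope, one hedged option (a suggestion for how the crate might guard this assumption, not something in the PR) is a `compile_error!` guard:

```rust
// Refuse to build on big-endian targets instead of silently producing
// the wrong byte order. This guard is an assumption about how the crate
// might want to handle the question; it is not part of this PR.
#[cfg(not(target_endian = "little"))]
compile_error!("this AES-GCM path assumes a little-endian target");

fn main() {
    // Runtime double-check; cfg! is resolved at compile time.
    assert!(cfg!(target_endian = "little"));
    println!("little-endian target");
}
```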
