m4: add gcc support #152

Open
becker33 wants to merge 7 commits into master from gcc-support-m4

@becker33
Collaborator

@becker33 becker33 commented Apr 28, 2026

For gcc
m1 --> armv8.4-a (available from 8.0)
m2 --> armv8.6-a (available from 10.1)
m3 == m2
m4 --> armv8.6-a+sme2

Signed-off-by: Gregory Becker <becker33@llnl.gov>
Comment thread cpu/microarchitectures.json Outdated
@alalazo alalazo self-assigned this Apr 28, 2026
Signed-off-by: Gregory Becker <becker33@llnl.gov>
Signed-off-by: Gregory Becker <becker33@llnl.gov>
alalazo previously approved these changes Apr 28, 2026
Member

@alalazo alalazo left a comment

Can you confirm you tried a build with Spack before I merge this? I don't have access to an M4.

@becker33
Collaborator Author

I don't have access either; I'm going to ask @haampie to test. This came up in a conversation he and I were having on Slack.

Signed-off-by: Gregory Becker <becker33@llnl.gov>
Signed-off-by: Gregory Becker <becker33@llnl.gov>
@alalazo
Member

alalazo commented Apr 29, 2026

It seems GCC 15 added support up to -mcpu=apple-m3; see https://gcc.gnu.org/gcc-15/changes.html

@alalazo
Member

alalazo commented Apr 30, 2026

A quick search gave me this. Wondering if adding +crc+crypto to our baseline gives improved performance.

Apple Chip | Architecture | First GCC to support -mcpu | Generic Fallback Flag (for older GCC) | First GCC for Fallback
-- | -- | -- | -- | --
M1 | Armv8.5-A | GCC 15.1 (apple-m1) | -march=armv8.5-a+crc+crypto | GCC 9.1
M2 | Armv8.6-A | GCC 15.1 (apple-m2) | -march=armv8.6-a+crc+crypto | GCC 10.1
M3 | Armv8.6-A | GCC 15.1 (apple-m3) | -march=armv8.6-a+crc+crypto | GCC 10.1
M4 | Armv9.2-A | TBD (apple-m4*) | -march=armv9.2-a+nosve | GCC 14.1

Comment thread cpu/microarchitectures.json Outdated
    "gcc": [
      {
        "versions": "14.1:",
        "flags": "-march=armv8.6-a+sme2 -mtune=generic"
Member

See the table for another suggestion. For what it's worth, it seems that using the armv8.6 flag as above is needed for GCC older than v14.
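
As a reading aid for the hunk above: the `"versions": "14.1:"` entry looks like an open-ended range, so the flags would apply to GCC 14.1 and newer. Here is a minimal sketch of that matching, assuming a `min:max` string where either side may be empty and both bounds are inclusive. `parse` and `in_range` are hypothetical helpers for illustration, not archspec's API.

```python
def parse(version):
    """Turn '14.1' into (14, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def in_range(version, version_range):
    """Return True if `version` falls inside a 'min:max' range.

    Assumption: an empty side means "unbounded" and both bounds are
    inclusive. This is an illustration, not archspec's implementation.
    """
    low, _, high = version_range.partition(":")
    v = parse(version)
    if low and v < parse(low):
        return False
    if high and v > parse(high):
        return False
    return True


print(in_range("13.2", "14.1:"))  # False
print(in_range("14.1", "14.1:"))  # True
print(in_range("15.1", "14.1:"))  # True
```

Under that reading, a GCC 13 build would not get the flags in this entry and would need its own, older-range entry.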

Collaborator Author

Based on the table in the GCC manual pages, I think these two descriptions are identical, except that the one based on 8.6-a makes explicit that sme2 is included even though sve is not.

See


arch value | Architecture | Includes by default
-- | -- | --
‘armv8-a’   | Armv8-A   | ‘+fp’, ‘+simd’
‘armv8.1-a’ | Armv8.1-A | ‘armv8-a’, ‘+crc’, ‘+lse’, ‘+rdma’
‘armv8.2-a’ | Armv8.2-A | ‘armv8.1-a’
‘armv8.3-a’ | Armv8.3-A | ‘armv8.2-a’, ‘+pauth’
‘armv8.4-a’ | Armv8.4-A | ‘armv8.3-a’, ‘+flagm’, ‘+fp16fml’, ‘+dotprod’
‘armv8.5-a’ | Armv8.5-A | ‘armv8.4-a’, ‘+sb’, ‘+ssbs’, ‘+predres’
‘armv8.6-a’ | Armv8.6-A | ‘armv8.5-a’, ‘+bf16’, ‘+i8mm’
‘armv8.7-a’ | Armv8.7-A | ‘armv8.6-a’, ‘+ls64’
‘armv8.8-a’ | Armv8.8-a | ‘armv8.7-a’, ‘+mops’
‘armv8.9-a’ | Armv8.9-a | ‘armv8.8-a’
‘armv9-a’   | Armv9-A   | ‘armv8.5-a’, ‘+sve’, ‘+sve2’
‘armv9.1-a’ | Armv9.1-A | ‘armv9-a’, ‘+bf16’, ‘+i8mm’
‘armv9.2-a’ | Armv9.2-A | ‘armv9.1-a’, ‘+ls64’
‘armv9.3-a’ | Armv9.3-A | ‘armv9.2-a’, ‘+mops’
‘armv9.4-a’ | Armv9.4-A | ‘armv9.3-a’
‘armv8-r’   | Armv8-R   | ‘armv8-r’

Collaborator Author

Actually, looking more closely, 9.2-a+nosve would be the same as 8.7-a, not 8.6-a. But the 8.7-a and 9.2-a tables include +wfxt+xs, which we don't list as flags on the m4 architecture in archspec, so I'm hesitant to make that change.
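
To double-check that comparison against the quoted table, here is a small sketch that expands each arch value into its transitive default feature set. The `INCLUDES` dictionary is transcribed from the table above (only the rows needed). Treating `+nosve` as removing both `sve` and `sve2` is an assumption about how GCC handles dependent extensions.

```python
# Rows transcribed from the GCC table quoted above: (base arch, added features).
INCLUDES = {
    "armv8-a": (None, {"fp", "simd"}),
    "armv8.1-a": ("armv8-a", {"crc", "lse", "rdma"}),
    "armv8.2-a": ("armv8.1-a", set()),
    "armv8.3-a": ("armv8.2-a", {"pauth"}),
    "armv8.4-a": ("armv8.3-a", {"flagm", "fp16fml", "dotprod"}),
    "armv8.5-a": ("armv8.4-a", {"sb", "ssbs", "predres"}),
    "armv8.6-a": ("armv8.5-a", {"bf16", "i8mm"}),
    "armv8.7-a": ("armv8.6-a", {"ls64"}),
    "armv9-a": ("armv8.5-a", {"sve", "sve2"}),
    "armv9.1-a": ("armv9-a", {"bf16", "i8mm"}),
    "armv9.2-a": ("armv9.1-a", {"ls64"}),
}


def features(arch):
    """Collect every feature an arch value enables by default, per the table."""
    base, added = INCLUDES[arch]
    inherited = features(base) if base else set()
    return inherited | added


# Assumption: "+nosve" drops sve and everything that depends on it (sve2).
v92_nosve = features("armv9.2-a") - {"sve", "sve2"}

print(v92_nosve == features("armv8.7-a"))                     # True
print(sorted(features("armv8.7-a") - features("armv8.6-a")))  # ['ls64']
```

The quoted table only lists the headline extensions, so this check says nothing about `+wfxt` or `+xs`.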

Collaborator Author

OK, after a bunch more reading:

  • There is no way to use GCC to target SME instructions while only allowing SVE in streaming mode (the only mode that apple-m4 supports).
  • Apple only promises that their hardware supports the armv8.6-a ISA.
  • Based on info from sysctl hw.optional, it looks like the M4 does support the flags that distinguish armv8.7-a from armv8.6-a.
  • Apple mostly supports the armv9.2-a ISA, with the exception of SVE non-streaming instructions.
  • You can mostly disable SVE non-streaming instructions with -fno-tree-vectorize -fno-tree-slp-vectorize, but that's not a guarantee.

With all that said, I think the best option for us would be to use -march=armv9.2-a+sme2 -fno-tree-vectorize -fno-tree-slp-vectorize. This disables autovectorization but still allows SME intrinsics to work.

If we want to be 100% safe, we would need to stay with -march=armv8.6-a+wfxt.

I'll update the PR with my recommendation.
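
For anyone with M4 hardware who wants to repeat the `sysctl hw.optional` check, here is a rough sketch of parsing its `name: value` output. The sample text below is made up for illustration, and the `FEAT_EXAMPLE_*` key names are placeholders, not real sysctl keys. Substitute what your machine actually prints.

```python
import subprocess


def parse_sysctl(text):
    """Map each 'name: value' line of sysctl output to an int flag."""
    flags = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        if sep and value.isdigit():
            flags[name.strip()] = int(value)
    return flags


def read_hw_optional():
    """Run `sysctl hw.optional` (macOS only) and parse the result."""
    out = subprocess.run(
        ["sysctl", "hw.optional"], capture_output=True, text=True, check=True
    )
    return parse_sysctl(out.stdout)


# Illustrative sample only: placeholder keys, not real output from an M4.
SAMPLE = """\
hw.optional.arm.FEAT_EXAMPLE_A: 1
hw.optional.arm.FEAT_EXAMPLE_B: 0
"""

flags = parse_sysctl(SAMPLE)
supported = sorted(name for name, on in flags.items() if on)
print(supported)  # ['hw.optional.arm.FEAT_EXAMPLE_A']
```

On a real machine, `read_hw_optional()` would replace the `SAMPLE` text; it is not called here so the sketch also runs off macOS.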

Member

@becker33 JSON LGTM. Feel free to merge if you don't think we need to test this further. I plan to release a new version of archspec soon-ish.

Signed-off-by: Gregory Becker <becker33@llnl.gov>
…tions

Signed-off-by: Gregory Becker <becker33@llnl.gov>