Skip to content

float reduce_max/min: fix SNaN treatment#515

Merged
calebzulawski merged 1 commit intorust-lang:masterfrom
RalfJung:max-min-snan
Apr 16, 2026
Merged

float reduce_max/min: fix SNaN treatment#515
calebzulawski merged 1 commit intorust-lang:masterfrom
RalfJung:max-min-snan

Conversation

@RalfJung
Copy link
Copy Markdown
Member

@RalfJung RalfJung commented Mar 30, 2026

I assume the reduce operations are meant to match the semantics of the scalar operations -- that's in fact what the tests are checking for. However, it is currently not the case: when an input is SNaN, the LLVM intrinsic that we use for simd_reduce_max on floats may entirely short-circuit the computation and make the entire result a NaN. In contrast, the operation used as reference for the tests will skip all SNaN and only consider actual numeric inputs.

LLVM currently has no vector-reduce operation that skips SNaN (llvm/llvm-project#185827), so the best we can do at the moment is stop using the intrinsic.

Is there a good place to add a test specifically for SNaN inputs?

Also see rust-lang/rust#153395.

@calebzulawski
Copy link
Copy Markdown
Member

I noticed that e.g. the AVX512 codegen is nearly identical for both (AVX2 is worse). Is the key word here that it may short-circuit the computation, but doesn't have to?

@RalfJung
Copy link
Copy Markdown
Member Author

Yes, the LLVM intrinsic is non-deterministic about whether signalling NaNs short-circuit to a NaN result or ignore this lane and continue with the rest. But to match the scalar behavior we need a guarantee that all NaNs, including signalling NaNs, are ignored (i.e. they never short-circuit).

@RalfJung
Copy link
Copy Markdown
Member Author

@calebzulawski so, what are your thoughts here? Do you think this is okay to land? I don't think we want to keep around the simd_reduce_min intrinsic in the compiler with its current semantics so I'd prefer to clean this up. Hopefully eventually LLVM will support the semantics we actually want for reduce_min/max.

@calebzulawski
Copy link
Copy Markdown
Member

I'm a little torn because the fold version has pretty significantly worse codegen in some cases, and doing a reduction when you're fairly confident you don't have NaNs is pretty common, but this is a lot easier to use than adding a new intrinsic with nnan that ends up unsafe for users. Plus rustc doesn't really give us the ability to implement a better reduction.

But I agree that nan semantics should probably match for elementwise and reduce.

@RalfJung
Copy link
Copy Markdown
Member Author

RalfJung commented Apr 16, 2026

What is the alternative? We could document the intrinsic to be UB or very very non-deterministic for any NaN input and then have two methods in portable-simd, one that matches the scalar version and one that is fast? Doesn't seem that appealing. (Between those two I'd prefer the UB variant. I'd rather not have to implement yet more different min/max NaN semantics in Miri. A no-NaN check is easy.)

My hope is that someone will add the missing intrinsic to LLVM before portable-simd stabilizes, but of course it's hard to predict the future.

@calebzulawski
Copy link
Copy Markdown
Member

Hmm, I looked into it again and I can't quite duplicate the bad codegen I saw previously. The only significant difference I see is arm, but the fminnmv instruction has no option to ignore nans, so that makes sense. So I think I'm okay with merging this.

Maybe eventually exposing an nnan version that we can combine with a select to make it safe would be useful?

@calebzulawski calebzulawski merged commit 3bd3fff into rust-lang:master Apr 16, 2026
53 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants