float reduce_max/min: fix SNaN treatment#515
Conversation
|
I noticed that e.g. the AVX512 codegen is nearly identical for both (AVX2 is worse). Is the key word here that it may short-circuit the computation, but doesn't have to? |
|
Yes, the LLVM intrinsic is non-deterministic about whether signalling NaNs short-circuit to a NaN result or ignore this lane and continue with the rest. But to match the scalar behavior we need a guarantee that all NaNs, including signalling NaNs, are ignored (i.e. they never short-circuit). |
|
@calebzulawski so, what are your thoughts here? Do you think this is okay to land? I don't think we want to keep around the simd_reduce_min intrinsic in the compiler with its current semantics so I'd prefer to clean this up. Hopefully eventually LLVM will support the semantics we actually want for reduce_min/max. |
|
I'm a little torn because the fold version has pretty significantly worse codegen in some cases, and doing a reduction when you're fairly confident you don't have NaNs is pretty common, but this is a lot easier to use than adding a new intrinsic with nnan that ends up unsafe for users. Plus rustc doesn't really give us the ability to implement a better reduction. But I agree that nan semantics should probably match for elementwise and reduce. |
|
What is the alternative? We could document the intrinsic to be UB or very very non-deterministic for any NaN input and then have two methods in portable-simd, one that matches the scalar version and one that is fast? Doesn't seem that appealing. (Between those two I'd prefer the UB variant. I'd rather not have to implement yet more different min/max NaN semantics in Miri. A no-NaN check is easy.) My hope is that someone will add the missing intrinsic to LLVM before portable-simd stabilizes, but of course it's hard to predict the future. |
|
Hmm, I looked into it again and I can't quite duplicate the bad codegen I saw previously. The only significant difference I see is arm, but the fminnmv instruction has no option to ignore nans, so that makes sense. So I think I'm okay with merging this. Maybe eventually exposing an nnan version that we can combine with a select to make it safe would be useful? |
I assume the reduce operations are meant to match the semantics of the scalar operations -- that's in fact what the tests are checking for. However, it is currently not the case: when an input is SNaN, the LLVM intrinsic that we use for
simd_reduce_maxon floats may entirely short-circuit the computation and make the entire result a NaN. In contrast, the operation used as reference for the tests will skip all SNaN and only consider actual numeric inputs.LLVM currently has no vector-reduce operation that skips SNaN (llvm/llvm-project#185827), so the best we can do at the moment is stop using the intrinsic.
Is there a good place to add a test specifically for SNaN inputs?
Also see rust-lang/rust#153395.