trunc: Use an assembly implementation on i586 #1152

Open
tgross35 wants to merge 1 commit into rust-lang:main from tgross35:i586-trunc-asm
Conversation

@tgross35
Contributor

The `trunc` implementation uses integer operations so currently works
fine on i586. However, we already have the other three easy operations
based on `frndint`, so add `trunc` and complete the set.

ci: skip-extensive
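
For context, the integer-only approach mentioned in the description can be sketched roughly as follows. This is an illustrative sketch and not the crate's actual code; the name `trunc_soft` and the constants are assumptions made for the example. It clears the fractional mantissa bits selected by the unbiased exponent, so no floating-point instructions are needed.

```rust
/// Truncate toward zero using only integer operations on the bit pattern.
/// Hypothetical helper for illustration; not the implementation in this PR.
pub fn trunc_soft(x: f64) -> f64 {
    const SIG_BITS: u32 = 52; // explicit mantissa bits in an f64
    const EXP_BIAS: i32 = 1023;

    let bits = x.to_bits();
    // Unbiased exponent taken from the 11-bit exponent field.
    let e = ((bits >> SIG_BITS) & 0x7ff) as i32 - EXP_BIAS;

    if e >= SIG_BITS as i32 {
        // No fractional bits remain; this also covers Inf and NaN.
        return x;
    }
    if e < 0 {
        // |x| < 1 (including subnormals and zero): keep only the sign bit.
        return f64::from_bits(bits & (1u64 << 63));
    }

    // The low (52 - e) mantissa bits hold the fraction; clear them.
    let frac_mask = (1u64 << (SIG_BITS - e as u32)) - 1;
    f64::from_bits(bits & !frac_mask)
}
```

Because everything is a shift, mask, or compare on a `u64`, this style of implementation does not depend on the x87 unit at all, which is why it already works on i586.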

@tgross35
Contributor Author

Based on #1142 to avoid conflicts.

@tgross35 force-pushed the i586-trunc-asm branch 3 times, most recently from e37a067 to 5aa3f86 (April 3, 2026 21:30)
@quaternic
Contributor

I don't think we should have this just for the sake of consistency. It has strictly worse performance.

@tgross35
Contributor Author

tgross35 commented Apr 7, 2026

> I don't think we should have this just for the sake of consistency. It has strictly worse performance.

What makes this slow? Is `frndint` that much slower than the soft ops? Looking at https://rust.godbolt.org/z/s6GjGsf6E, I have no idea whether the latency of 100 is correct or just a worst-case estimate, since I can't find it cited anywhere.

Any idea why LLVM inserts the `wait` and why `eax` gets pushed? I thought that should be caller-saved.

@rustbot
Collaborator

rustbot commented Apr 7, 2026

This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed.

Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers.

@quaternic
Contributor

> What makes this slow - is frndint that much slower than soft ops? Looking at https://rust.godbolt.org/z/s6GjGsf6E I have no idea whether the latency of 100 is correct or just a worst case estimate since I can't find it cited anywhere.

I'll admit I only tested by comparing against the x86-64 implementation. The 32-bit code does look more complex.

Measuring `frndint` latency on Nehalem (https://en.wikipedia.org/wiki/Nehalem_(microarchitecture)):

  • |x| in [0, 2^63) -> ~21 cycles
  • |x| in [2^63, Inf) -> ~40 cycles
  • Inf -> ~230 cycles
  • NaN -> ~250 cycles
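
The thread does not say how these numbers were obtained. As a rough sketch only, one way to probe this kind of input dependence is to time a dependent chain of calls for one representative input per bucket. The helper names `bucket_inputs` and `time_chain` and the chosen input values are assumptions for illustration, and the result is wall-clock time rather than a cycle count.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// One representative input per bucket from the list above.
/// The specific values are an arbitrary, hypothetical choice.
pub fn bucket_inputs() -> [(&'static str, f64); 4] {
    [
        ("|x| in [0, 2^63)", 1234.5678),
        ("|x| in [2^63, Inf)", 1.0e30),
        ("Inf", f64::INFINITY),
        ("NaN", f64::NAN),
    ]
}

/// Time `iters` calls of `f` where each call consumes the previous result,
/// so the loop is bound by latency rather than throughput.
pub fn time_chain(f: fn(f64) -> f64, x: f64, iters: u32) -> Duration {
    let mut acc = black_box(x);
    let start = Instant::now();
    for _ in 0..iters {
        // black_box keeps the optimizer from folding or hoisting the call.
        acc = black_box(f(acc));
    }
    start.elapsed()
}
```

Calling `time_chain(f64::trunc, x, 1_000_000)` for each entry of `bucket_inputs()` and dividing by the iteration count gives a per-call estimate. Whether that exercises `frndint` depends on the target and on which `trunc` implementation is linked in, so the function under test would need to be swapped for the assembly version to reproduce the figures above.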

Looks like Agner Fog does provide measurements for x87 instructions too; see manual 4, "Instruction tables", at:
https://www.agner.org/optimize/#manuals

For Nehalem the listed latency for `frndint` is 22, so I'll assume the tables don't account for potential input dependence. AMD seems to have had a faster implementation since 2011; Intel does not.
