Remove fewer Storage calls in CopyProp and GVN #142531

Merged
rust-bors[bot] merged 1 commit into rust-lang:main from ohadravid:better-storage-calls-copy-prop on Apr 18, 2026

Conversation

@ohadravid
Contributor

ohadravid commented Jun 15, 2025


Modify the CopyProp and GVN MIR optimization passes to remove fewer Storage{Live,Dead} calls, allowing for better optimizations by LLVM - see #141649.

Details

The idea is to use a new MaybeUninitializedLocals analysis and remove only the storage calls of locals that are maybe-uninit when accessed in a new location.
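
To make the idea concrete, here is a minimal sketch on a toy, straight-line statement list. It is not the code in this PR: the `Stmt` enum and the functions `locals_needing_storage_removal` and `strip_storage` are hypothetical, and the real analysis is a dataflow analysis over a control-flow graph rather than a single forward scan. The `Use` statements stand for uses as they look after copy propagation has rewritten them to the head local.

```rust
use std::collections::BTreeSet;

type Local = usize;

/// Toy stand-in for MIR statements (hypothetical, not rustc's types).
#[derive(Clone, Debug, PartialEq)]
enum Stmt {
    StorageLive(Local),
    StorageDead(Local),
    /// Writes a value into the local, so it is initialized afterwards.
    Assign(Local),
    /// A read of the local, as it appears after copy propagation.
    Use(Local),
}

/// Forward scan over a straight-line body. A local is "maybe uninit" until it
/// is assigned, and again after StorageLive/StorageDead. Any local that is
/// read while maybe uninit must lose its storage statements.
fn locals_needing_storage_removal(body: &[Stmt], num_locals: usize) -> BTreeSet<Local> {
    let mut maybe_uninit = vec![true; num_locals];
    let mut remove = BTreeSet::new();
    for stmt in body {
        match *stmt {
            Stmt::StorageLive(l) | Stmt::StorageDead(l) => maybe_uninit[l] = true,
            Stmt::Assign(l) => maybe_uninit[l] = false,
            Stmt::Use(l) => {
                if maybe_uninit[l] {
                    remove.insert(l);
                }
            }
        }
    }
    remove
}

/// Drops the storage statements of the selected locals and keeps all others.
fn strip_storage(body: &[Stmt], remove: &BTreeSet<Local>) -> Vec<Stmt> {
    body.iter()
        .filter(|s| match s {
            Stmt::StorageLive(l) | Stmt::StorageDead(l) => !remove.contains(l),
            _ => true,
        })
        .cloned()
        .collect()
}
```

In this sketch, a head that is still initialized at every rewritten use keeps its `StorageLive`/`StorageDead` pair, while a head that is read after its `StorageDead` has both statements removed.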

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jun 15, 2025
@rustbot
Collaborator

rustbot commented Jun 15, 2025

Some changes occurred to MIR optimizations

cc @rust-lang/wg-mir-opt

@matthiaskrgr
Member

@bors try @rust-timer queue

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jun 15, 2025
bors added a commit that referenced this pull request Jun 15, 2025
…try>

Remove fewer Storage calls in `copy_prop`

Modify the `copy_prop` MIR optimization pass to remove fewer `Storage{Live,Dead}` calls, allowing for better optimizations by LLVM - see #141649.

### Details

This is my attempt to fix the mentioned issue (this is the first part, I also implemented a similar solution for GVN in [this branch](https://github.com/rust-lang/rust/compare/master...ohadravid:rust:better-storage-calls-gvn-v2?expand=1)).

The idea is to use the `MaybeStorageDead` analysis and remove only the storage calls of `head`s that are maybe-storage-dead when the associated `local` is accessed (or, conversely, keep the storage of `head`s that are for-sure alive in _every_ relevant access).

When combined with the GVN change, the final example in the issue (#141649 (comment)) is optimized as expected by LLVM. I also measured the effect on a few functions in `rav1d` (where I originally saw the issue) and observed reduced stack usage in several of them.

This is my first attempt at working with MIR optimizations, so it's possible this isn't the right approach — but all tests pass, and the resulting diffs appear correct.

r? tmiasko

since he commented on the issue and pointed to these passes.
@bors
Collaborator

bors commented Jun 15, 2025

⌛ Trying commit d24d035 with merge ef7d206...

@bors
Collaborator

bors commented Jun 15, 2025

☀️ Try build successful - checks-actions
Build commit: ef7d206 (ef7d20666974f0dac45b03e051f2e283f9d9f090)

@rust-timer
Collaborator

Finished benchmarking commit (ef7d206): comparison URL.

Overall result: ❌ regressions - please read the text below

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @rustbot label: +perf-regression-triaged. If not, please fix the regressions and do another perf run. If its results are neutral or positive, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.3% | [0.2%, 0.4%] | 8 |
| Regressions ❌ (secondary) | 0.3% | [0.2%, 0.4%] | 7 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 0.3% | [0.2%, 0.4%] | 8 |

Max RSS (memory usage)

Results (primary 0.7%, secondary 3.4%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 3.5% | [1.8%, 5.0%] | 5 |
| Regressions ❌ (secondary) | 3.4% | [3.4%, 3.4%] | 1 |
| Improvements ✅ (primary) | -3.9% | [-6.5%, -2.0%] | 3 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 0.7% | [-6.5%, 5.0%] | 8 |

Cycles

Results (primary -0.6%, secondary -0.1%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 3.8% | [3.8%, 3.8%] | 1 |
| Improvements ✅ (primary) | -0.6% | [-0.6%, -0.6%] | 1 |
| Improvements ✅ (secondary) | -4.1% | [-4.1%, -4.1%] | 1 |
| All ❌✅ (primary) | -0.6% | [-0.6%, -0.6%] | 1 |

Binary size

Results (primary 0.0%, secondary 0.0%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.2% | [0.0%, 0.8%] | 10 |
| Regressions ❌ (secondary) | 0.1% | [0.0%, 0.1%] | 5 |
| Improvements ✅ (primary) | -0.2% | [-0.8%, -0.0%] | 8 |
| Improvements ✅ (secondary) | -0.2% | [-0.2%, -0.2%] | 1 |
| All ❌✅ (primary) | 0.0% | [-0.8%, 0.8%] | 18 |

Bootstrap: 757.399s -> 756.065s (-0.18%)
Artifact size: 372.20 MiB -> 372.12 MiB (-0.02%)

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Jun 15, 2025
@ohadravid
Contributor Author

@matthiaskrgr - I updated the impl to stop re-checking once a head is found to be maybe-dead, which should be a bit better

@matthiaskrgr
Member

@bors try @rust-timer queue

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jun 15, 2025
@bors
Collaborator

bors commented Jun 15, 2025

⌛ Trying commit 905e968 with merge c0a2949...

bors added a commit that referenced this pull request Jun 15, 2025
…try>

@cjgillot
Contributor

Should this check happen in Replacer::visit_local, and move the replacement of storage statements to a dedicated cleanup visitor?

@bors
Collaborator

bors commented Jun 15, 2025

☀️ Try build successful - checks-actions
Build commit: c0a2949 (c0a294957df10fc3880e1677c72c0cf122485509)

@ohadravid
Contributor Author

> Should this check happen in Replacer::visit_local

I'm not sure how to make this work: using ResultsCursor requires a &body, but it's not possible to have that while running a MutVisitor since it requires a &mut body.

Is there a different way to do this?
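
One general way around this kind of borrow conflict is to split the work into two phases: a read-only pass that holds the shared borrow and records what needs to change, followed by a mutating pass that only consults the recorded set. The sketch below illustrates that pattern with toy types. `Body`, `Cursor`, `collect_storage_to_remove`, and `remove_storage` are all hypothetical stand-ins, not rustc's `ResultsCursor` or `MutVisitor`, and this is not necessarily how the PR resolved the question.

```rust
use std::collections::BTreeSet;

#[derive(Debug, PartialEq)]
enum Stmt {
    StorageLive(usize),
    StorageDead(usize),
    Use(usize),
}

struct Body {
    stmts: Vec<Stmt>,
}

/// Stand-in for a dataflow cursor: it holds a shared borrow of the body.
struct Cursor<'a> {
    body: &'a Body,
}

impl<'a> Cursor<'a> {
    /// True if `local`'s storage is live just before statement `idx`.
    fn storage_live_before(&self, idx: usize, local: usize) -> bool {
        let mut live = false;
        for stmt in &self.body.stmts[..idx] {
            match *stmt {
                Stmt::StorageLive(l) if l == local => live = true,
                Stmt::StorageDead(l) if l == local => live = false,
                _ => {}
            }
        }
        live
    }
}

/// Phase 1: needs only `&Body`, so the cursor can exist for the whole pass.
fn collect_storage_to_remove(body: &Body) -> BTreeSet<usize> {
    let cursor = Cursor { body };
    let mut remove = BTreeSet::new();
    for (idx, stmt) in body.stmts.iter().enumerate() {
        if let Stmt::Use(local) = *stmt {
            if !cursor.storage_live_before(idx, local) {
                remove.insert(local);
            }
        }
    }
    remove
}

/// Phase 2: takes `&mut Body`, but by now the cursor (and its borrow) is gone.
fn remove_storage(body: &mut Body, remove: &BTreeSet<usize>) {
    body.stmts.retain(|stmt| match stmt {
        Stmt::StorageLive(l) | Stmt::StorageDead(l) => !remove.contains(l),
        Stmt::Use(_) => true,
    });
}
```

The set returned by the first phase owns its data, so it does not keep the body borrowed, and the mutable pass can run afterwards without conflict.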

@rust-timer
Collaborator

Finished benchmarking commit (c0a2949): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @rustbot label: +perf-regression-triaged. If not, please fix the regressions and do another perf run. If its results are neutral or positive, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.3% | [0.2%, 0.4%] | 9 |
| Regressions ❌ (secondary) | 0.3% | [0.2%, 0.4%] | 7 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -0.2% | [-0.2%, -0.2%] | 1 |
| All ❌✅ (primary) | 0.3% | [0.2%, 0.4%] | 9 |

Max RSS (memory usage)

Results (primary -0.1%, secondary -1.3%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 4.2% | [3.4%, 5.8%] | 4 |
| Regressions ❌ (secondary) | 3.1% | [3.1%, 3.1%] | 1 |
| Improvements ✅ (primary) | -4.4% | [-6.6%, -1.8%] | 4 |
| Improvements ✅ (secondary) | -5.8% | [-5.8%, -5.8%] | 1 |
| All ❌✅ (primary) | -0.1% | [-6.6%, 5.8%] | 8 |

Cycles

Results (secondary -1.0%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 2.3% | [2.3%, 2.3%] | 1 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -2.6% | [-2.6%, -2.5%] | 2 |
| All ❌✅ (primary) | - | - | 0 |

Binary size

Results (primary -0.0%, secondary 0.0%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.2% | [0.0%, 0.8%] | 10 |
| Regressions ❌ (secondary) | 0.1% | [0.0%, 0.1%] | 5 |
| Improvements ✅ (primary) | -0.2% | [-0.8%, -0.0%] | 8 |
| Improvements ✅ (secondary) | -0.2% | [-0.2%, -0.2%] | 1 |
| All ❌✅ (primary) | -0.0% | [-0.8%, 0.8%] | 18 |

Bootstrap: 756.494s -> 757.685s (0.16%)
Artifact size: 372.15 MiB -> 372.11 MiB (-0.01%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jun 15, 2025
Comment thread compiler/rustc_mir_transform/src/copy_prop.rs Outdated
@rust-bors
Contributor

rust-bors bot commented Apr 15, 2026

💔 Test for af9ddc6 failed: CI. Failed job:

@ohadravid
Contributor Author

Hi @saethlin looks like the -msvc target has a higher stack usage - how should I modify the test? 🙏

  check:68'0     ~~~~~~~~~~~~~~~~~~~
            101:  subq $48, %rsp 
  check:68'0     ~~~~~~~~~~~~~~~~
  check:68'1      ?               possible intended match

@saethlin
Member

Codegen test annotations support revisions, and the revision name can be used instead of the CHECK parts of FileCheck comments. You can draw inspiration from this test:

//@ revisions: windows-gnu
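
As a rough illustration of the revision mechanism, the sketch below shows what a per-target assembly test could look like. It is a hypothetical file, not the test added by this PR: the function `scratch_sum`, the revision names, and the FileCheck patterns are assumptions chosen for illustration, and the expected instructions would need to be confirmed against real compiler output on each target.

```rust
// Hypothetical compiletest assembly test; a sketch, not the test added by this PR.
//@ revisions: linux-x86_64 windows-x86_64-msvc aarch64
//@ assembly-output: emit-asm
//@ compile-flags: --crate-type=lib -Copt-level=3
//@[linux-x86_64] only-x86_64-unknown-linux-gnu
//@[windows-x86_64-msvc] only-x86_64-pc-windows-msvc
//@[aarch64] only-aarch64

use std::hint::black_box;

// Lines with the CHECK prefix are verified in every revision.
// CHECK-LABEL: scratch_sum:

// Lines prefixed with a revision name are verified only in that revision.
// The patterns below are illustrative; `{{[0-9]+}}` matches any stack size.
// linux-x86_64: subq ${{[0-9]+}}, %rsp
// windows-x86_64-msvc: subq ${{[0-9]+}}, %rsp
// aarch64: sub sp, sp, #{{[0-9]+}}
#[unsafe(no_mangle)]
pub fn scratch_sum(x: u64) -> u64 {
    // A local buffer that must live in memory because its address escapes
    // into `black_box`.
    let mut buf = [x; 32];
    black_box(&mut buf);
    buf.iter().sum()
}
```

Replacing the regex with a literal stack size per revision is what lets one test file encode different expectations for targets whose stack usage differs.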

@ohadravid ohadravid force-pushed the better-storage-calls-copy-prop branch from d18b665 to 5632001 Compare April 17, 2026 13:56
@rust-bors rust-bors bot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 17, 2026
@ohadravid
Contributor Author

@saethlin Done, I split the checks into aarch64, x86_64-unknown-linux-gnu, and x86_64-pc-windows-msvc.

@saethlin
Member

@bors try jobs=x86_64-msvc-1

rust-bors bot pushed a commit that referenced this pull request Apr 18, 2026
…try>

Remove fewer Storage calls in CopyProp and GVN


try-job: x86_64-msvc-1
@rust-bors
Contributor

rust-bors bot commented Apr 18, 2026

☀️ Try build successful (CI)
Build commit: 1d55920 (1d55920f2709f963b366c744604213872f4741c1, parent: e9e32aca5a4ffd08cbc29547b039d64b92a2c03b)

@saethlin
Member

@bors r=tmiasko,cjgillot,saethlin rollup=never

@rust-bors
Contributor

rust-bors bot commented Apr 18, 2026

📌 Commit 5632001 has been approved by tmiasko,cjgillot,saethlin

It is now in the queue for this repository.

@rust-bors rust-bors bot added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Apr 18, 2026
@rust-bors rust-bors bot added the merged-by-bors This PR was explicitly merged by bors. label Apr 18, 2026
@rust-bors
Contributor

rust-bors bot commented Apr 18, 2026

☀️ Test successful - CI
Approved by: tmiasko,cjgillot,saethlin
Duration: 3h 4m 55s
Pushing 8da2d28 to main...

@github-actions
Contributor

What is this? This is an experimental post-merge analysis report that shows differences in test outcomes between the merged PR and its parent PR.

Comparing 7a38981 (parent) -> 8da2d28 (this PR)

Test differences

Show 33 test diffs

Stage 1

  • [assembly] tests/assembly-llvm/issue-141649.rs#aarch64: [missing] -> ignore (only executed when the architecture is aarch64) (J1)
  • [assembly] tests/assembly-llvm/issue-141649.rs#linux-x86_64: [missing] -> pass (J1)
  • [assembly] tests/assembly-llvm/issue-141649.rs#windows-x86_64-msvc: [missing] -> ignore (only executed when the target is x86_64-pc-windows-msvc) (J1)
  • [mir-opt] tests/mir-opt/copy-prop/copy_prop_storage_preserve_head.rs: [missing] -> pass (J1)
  • [mir-opt] tests/mir-opt/copy-prop/copy_prop_storage_removed_when_local_borrowed.rs: [missing] -> pass (J1)
  • [mir-opt] tests/mir-opt/copy-prop/copy_prop_storage_twice.rs: [missing] -> pass (J1)
  • [mir-opt] tests/mir-opt/copy-prop/copy_prop_storage_unreachable.rs: [missing] -> pass (J1)
  • [mir-opt] tests/mir-opt/copy-prop/issue_141649.rs: [missing] -> pass (J1)
  • [mir-opt] tests/mir-opt/copy-prop/issue_141649_debug.rs: [missing] -> pass (J1)
  • [mir-opt] tests/mir-opt/gvn_storage_issue_141649.rs: [missing] -> pass (J1)
  • [mir-opt] tests/mir-opt/gvn_storage_issue_141649_debug.rs: [missing] -> pass (J1)
  • [mir-opt] tests/mir-opt/gvn_storage_twice.rs: [missing] -> pass (J1)
  • [mir-opt] tests/mir-opt/gvn_storage_unreachable.rs: [missing] -> pass (J1)
  • [codegen] tests/codegen-llvm/issues/issue-141649.rs: [missing] -> pass (J9)

Stage 2

  • [mir-opt] tests/mir-opt/copy-prop/copy_prop_storage_preserve_head.rs: [missing] -> pass (J0)
  • [mir-opt] tests/mir-opt/copy-prop/copy_prop_storage_removed_when_local_borrowed.rs: [missing] -> pass (J0)
  • [mir-opt] tests/mir-opt/copy-prop/copy_prop_storage_twice.rs: [missing] -> pass (J0)
  • [mir-opt] tests/mir-opt/copy-prop/copy_prop_storage_unreachable.rs: [missing] -> pass (J0)
  • [mir-opt] tests/mir-opt/copy-prop/issue_141649.rs: [missing] -> pass (J0)
  • [mir-opt] tests/mir-opt/copy-prop/issue_141649_debug.rs: [missing] -> pass (J0)
  • [mir-opt] tests/mir-opt/gvn_storage_issue_141649.rs: [missing] -> pass (J0)
  • [mir-opt] tests/mir-opt/gvn_storage_issue_141649_debug.rs: [missing] -> pass (J0)
  • [mir-opt] tests/mir-opt/gvn_storage_twice.rs: [missing] -> pass (J0)
  • [mir-opt] tests/mir-opt/gvn_storage_unreachable.rs: [missing] -> pass (J0)
  • [assembly] tests/assembly-llvm/issue-141649.rs#aarch64: [missing] -> pass (J2)
  • [assembly] tests/assembly-llvm/issue-141649.rs#linux-x86_64: [missing] -> ignore (only executed when the target is x86_64-unknown-linux-gnu) (J3)
  • [assembly] tests/assembly-llvm/issue-141649.rs#aarch64: [missing] -> ignore (only executed when the architecture is aarch64) (J4)
  • [assembly] tests/assembly-llvm/issue-141649.rs#linux-x86_64: [missing] -> pass (J5)
  • [assembly] tests/assembly-llvm/issue-141649.rs#windows-x86_64-msvc: [missing] -> ignore (only executed when the target is x86_64-pc-windows-msvc) (J6)
  • [assembly] tests/assembly-llvm/issue-141649.rs#windows-x86_64-msvc: [missing] -> pass (J7)
  • [codegen] tests/codegen-llvm/issues/issue-141649.rs: [missing] -> pass (J8)

Additionally, 2 doctest diffs were found. These are ignored, as they are noisy.

Job group index

Test dashboard

Run

cargo run --manifest-path src/ci/citool/Cargo.toml -- \
    test-dashboard 8da2d28cbd5a4e2b93e028e709afe09541671663 --output-dir test-dashboard

And then open test-dashboard/index.html in your browser to see an overview of all executed tests.

Job duration changes

  1. x86_64-gnu-llvm-22-2: 1h 14m -> 1h 36m (+29.1%)
  2. x86_64-gnu-stable: 2h 28m -> 1h 45m (-28.5%)
  3. x86_64-msvc-ext1: 1h 40m -> 2h 8m (+27.4%)
  4. x86_64-mingw-1: 2h 56m -> 2h 12m (-24.9%)
  5. x86_64-mingw-2: 2h 43m -> 2h 6m (-22.7%)
  6. pr-check-1: 32m 31s -> 26m 16s (-19.2%)
  7. i686-gnu-2: 1h 44m -> 1h 25m (-18.2%)
  8. i686-gnu-nopt-1: 2h 18m -> 1h 56m (-15.7%)
  9. x86_64-gnu-llvm-21-1: 1h 15m -> 1h 5m (-14.0%)
  10. x86_64-rust-for-linux: 49m 46s -> 43m 17s (-13.0%)
How to interpret the job duration changes?

Job durations can vary a lot, based on the actual runner instance
that executed the job, system noise, invalidated caches, etc. The table above is provided
mostly for t-infra members, for simpler debugging of potential CI slow-downs.

@rust-timer
Collaborator

Finished benchmarking commit (8da2d28): comparison URL.

Overall result: ❌✅ regressions and improvements - please read:

Our benchmarks found a performance regression caused by this PR.
This might be an actual regression, but it can also be just noise.

Next Steps:

  • If the regression was expected or you think it can be justified,
    please write a comment with sufficient written justification, and add
    @rustbot label: +perf-regression-triaged to it, to mark the regression as triaged.
  • If you think that you know of a way to resolve the regression, try to create
    a new PR with a fix for the regression.
  • If you do not understand the regression or you think that it is just noise,
    you can ask the @rust-lang/wg-compiler-performance working group for help (members of this group
    were already notified of this PR).

@rustbot label: +perf-regression
cc @rust-lang/wg-compiler-performance

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 1.2% | [0.2%, 16.9%] | 35 |
| Regressions ❌ (secondary) | 0.7% | [0.2%, 5.5%] | 22 |
| Improvements ✅ (primary) | -0.4% | [-0.5%, -0.3%] | 2 |
| Improvements ✅ (secondary) | -0.4% | [-1.5%, -0.2%] | 24 |
| All ❌✅ (primary) | 1.1% | [-0.5%, 16.9%] | 37 |

Max RSS (memory usage)

Results (primary 0.4%, secondary -5.0%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 1.8% | [0.5%, 3.4%] | 6 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -3.5% | [-3.9%, -3.2%] | 2 |
| Improvements ✅ (secondary) | -5.0% | [-5.0%, -5.0%] | 1 |
| All ❌✅ (primary) | 0.4% | [-3.9%, 3.4%] | 8 |

Cycles

Results (primary 9.5%, secondary 5.2%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 9.5% | [6.0%, 14.9%] | 3 |
| Regressions ❌ (secondary) | 5.2% | [5.2%, 5.2%] | 1 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 9.5% | [6.0%, 14.9%] | 3 |

Binary size

Results (primary 0.4%, secondary 0.7%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.5% | [0.0%, 1.2%] | 60 |
| Regressions ❌ (secondary) | 0.9% | [0.0%, 2.5%] | 30 |
| Improvements ✅ (primary) | -0.9% | [-2.5%, -0.4%] | 5 |
| Improvements ✅ (secondary) | -0.5% | [-0.6%, -0.5%] | 5 |
| All ❌✅ (primary) | 0.4% | [-2.5%, 1.2%] | 65 |

Bootstrap: 492.147s -> 493.313s (0.24%)
Artifact size: 394.26 MiB -> 394.36 MiB (0.03%)

@ohadravid
Contributor Author

I think the opt regressions are expected since we do more work, but the debug/check regressions are bad. I think they happen because I introduced some unintended overhead even in non-opt builds, where the full analysis doesn't run:

// in copyprop
let mut storage_to_remove = DenseBitSet::new_empty(body.local_decls.len()); // extra DenseBitSet
..
for (local, &head) in ssa.copy_classes().iter_enumerated() {
    storage_to_remove.insert(head); // + filling it
}
Replacer { tcx, copy_classes: ssa.copy_classes(), unified, storage_to_remove }
// vs before:
Replacer { tcx, copy_classes: ssa.copy_classes(), unified }

// in gvn
let storage_to_remove = state.reused_locals.clone(); // extra clone
StorageRemover { tcx, reused_locals: state.reused_locals, storage_to_remove }
// vs before:
StorageRemover { tcx, reused_locals: state.reused_locals }

I should be able to fix this by switching to &'a DenseBitSet<Local> in both cases in non-opt runs.
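
A minimal sketch of the "borrow when nothing changes" idea follows. It uses `std::borrow::Cow` rather than a plain `&'a` reference, which is a swapped-in technique and not what the comment proposes verbatim, and it uses `BTreeSet<usize>` as a stand-in for `DenseBitSet<Local>`. The `Remover` struct and `make_remover` function are hypothetical.

```rust
use std::borrow::Cow;
use std::collections::BTreeSet;

/// Hypothetical stand-in for the pass's remover struct.
struct Remover<'a> {
    /// Borrowed when no filtering is needed, owned only when the
    /// analysis has produced a smaller set.
    storage_to_remove: Cow<'a, BTreeSet<usize>>,
}

impl<'a> Remover<'a> {
    fn should_remove(&self, local: usize) -> bool {
        self.storage_to_remove.contains(&local)
    }
}

/// `keep` is `None` when the full analysis did not run (for example in a
/// non-optimized build); in that case the existing set is borrowed as-is
/// and no clone or extra allocation happens.
fn make_remover<'a>(
    reused_locals: &'a BTreeSet<usize>,
    keep: Option<&BTreeSet<usize>>,
) -> Remover<'a> {
    let storage_to_remove = match keep {
        None => Cow::Borrowed(reused_locals),
        Some(keep) => Cow::Owned(reused_locals.difference(keep).copied().collect()),
    };
    Remover { storage_to_remove }
}
```

With this shape, the cheap path pays only for a pointer copy, and the allocation is confined to the path where the analysis actually narrows the set.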

@saethlin
Member

The regressions are nearly all in opt, and of those that are in debug, the query breakdown doesn't point at CopyProp as the cause.

Opt regressions of this magnitude are rather surprising.

@saethlin
Member

In the perf report it looks like we added CGUs. Are StorageLive/Dead included in the cgu size estimate?

@ohadravid
Contributor Author

ohadravid commented Apr 19, 2026

> Are StorageLive/Dead included in the cgu size estimate?

Looks like it?

// compiler/rustc_monomorphize/src/partitioning.rs
providers.queries.size_estimate = |tcx, instance| {
    match instance.def {
        // "Normal" functions size estimate: the number of
        // statements, plus one for the terminator.
        InstanceKind::Item(..)
        | InstanceKind::DropGlue(..)
        | InstanceKind::AsyncDropGlueCtorShim(..) => {
            let mir = tcx.instance_mir(instance.def);
            mir.basic_blocks.iter().map(|bb| bb.statements.len() + 1).sum()
        }
        // Other compiler-generated shims size estimate: 1
        _ => 1,
    }
};

and bb.statements.len() includes Storage{Live,Dead}.

Do you think this throws off the CGU calculation and causes the opt regressions?

See results in #155491 (comment)
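
To see how retained storage statements inflate the estimate quoted above, here is a toy version of the same formula (statements per block, plus one for the terminator). The `Statement` and `BasicBlock` types and the `without_storage` helper are hypothetical simplifications, not rustc's types.

```rust
#[derive(Clone, Debug, PartialEq)]
enum Statement {
    StorageLive(usize),
    StorageDead(usize),
    Assign(usize),
}

struct BasicBlock {
    statements: Vec<Statement>,
}

/// Same shape as the estimate quoted above: every statement counts as one,
/// plus one per block for the terminator.
fn size_estimate(blocks: &[BasicBlock]) -> usize {
    blocks.iter().map(|bb| bb.statements.len() + 1).sum()
}

/// Returns a copy of the blocks with all storage statements removed, to
/// compare the estimate before and after.
fn without_storage(blocks: &[BasicBlock]) -> Vec<BasicBlock> {
    blocks
        .iter()
        .map(|bb| BasicBlock {
            statements: bb
                .statements
                .iter()
                .filter(|s| matches!(s, Statement::Assign(_)))
                .cloned()
                .collect(),
        })
        .collect()
}
```

For a single block containing a `StorageLive`, an `Assign`, and a `StorageDead`, the estimate is 4 with the storage statements and 2 without them, so keeping more storage statements makes the same function look larger to the partitioner.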

@saethlin
Member

> Do you think this throws off the CGU calculation and causes the opt regressions?

Some of them, yes. The key is that in an optimized build, some items get InstantiationMode::LocalCopy, which puts a copy of the item into every CGU that references it. So in general, more CGUs means more items that get optimized more times. The minimum number of instructions executed to compile the program would probably be reached at 1 CGU, but that would significantly hurt wall time.
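
A toy model of that effect is sketched below. It assumes callers of a single LocalCopy item are spread round-robin across CGUs, which is a made-up placement rule for illustration only; real partitioning is driven by the size estimate and module structure. The function name `local_copies` is hypothetical.

```rust
use std::collections::BTreeSet;

/// Number of copies of one LocalCopy item that end up being optimized,
/// assuming caller `c` lands in CGU `c % num_cgus` (an illustrative
/// placement rule, not rustc's). One copy is made per distinct CGU that
/// contains at least one caller.
fn local_copies(num_callers: usize, num_cgus: usize) -> usize {
    let cgus_with_callers: BTreeSet<usize> =
        (0..num_callers).map(|caller| caller % num_cgus).collect();
    cgus_with_callers.len()
}
```

Under this model, ten callers produce one copy with a single CGU and four copies with four CGUs, which is the "more CGUs means more items optimized more times" relationship in miniature.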

Debug builds can suffer from a slightly different problem: CGUs get merged based on the CGU size estimate. If the item dirtied by the patch that the benchmark suite applies lives in a very tiny module, but the CGU merging process adds it to a larger module, the compile time of that incr-patched scenario is driven by whatever else is in the CGU that the patched code is merged into.

I'm starting a Zulip topic on this: #t-compiler/performance > Perf regression from retaining more StorageLive/StorageDead


Labels

A-mir-opt-GVN Area: MIR opt Global Value Numbering (GVN) merged-by-bors This PR was explicitly merged by bors. perf-regression Performance regression. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
