fix(debuginfo): correct language detection for LTO-compiled binaries #7

Merged: nsavoire merged 3 commits into dd_master from nsavoire/lto_language on Mar 18, 2026
Conversation

@nsavoire nsavoire commented Feb 24, 2026

Problem

LTO can produce compilation units (CUs) whose DW_AT_language does not reflect the true source language of the functions they contain. Two cases arise:

  1. Artificial LTO CUs — e.g. a C++ language tag on a CU that contains C functions. A top-level subprogram in such a CU carries a cross-unit DW_AT_abstract_origin pointing to the real (partial) CU.

  2. Cross-language LTO inlinees — e.g. a C function inlined into a Rust caller. The inlinee's DW_AT_abstract_origin references the C CU directly.

Implementation

UnitRef::resolve_entry_language(entry, depth)

New helper on UnitRef that follows DW_AT_abstract_origin chains recursively to find the authoritative CU language:

  • Recurses into the referenced entry first, so that multi-level chains are followed to their end.
  • If no deeper reference yields a language, falls back to the referenced CU's own DW_AT_language for cross-unit references.
  • Limits recursion to MAX_ABSTRACT_ORIGIN_DEPTH = 16 levels (matching the limit used by elfutils dwarf_attr_integrate) to guard against cycles or malformed DWARF.
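
The lookup order described above can be sketched on a toy model. This is not the code from the PR: `Unit`, `EntryRef`, and `sample_units` are hypothetical stand-ins for the real DWARF types, and only the control flow (recurse first, then fall back to the referenced CU's language for cross-unit references, with a depth cap) is taken from the description.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Language {
    C,
    Cpp,
}

/// Hypothetical stand-in for a DIE reference: (unit index, entry offset).
type EntryRef = (usize, u32);

/// Toy compilation unit (assumption: the real code works on DWARF units).
struct Unit {
    /// The unit's DW_AT_language, absent for e.g. partial units.
    language: Option<Language>,
    /// Entry offset -> target of that entry's DW_AT_abstract_origin.
    origins: HashMap<u32, EntryRef>,
}

const MAX_ABSTRACT_ORIGIN_DEPTH: usize = 16;

fn resolve_entry_language(units: &[Unit], entry: EntryRef, depth: usize) -> Option<Language> {
    // Guard against cycles and malformed chains.
    if depth >= MAX_ABSTRACT_ORIGIN_DEPTH {
        return None;
    }
    let (unit_idx, offset) = entry;
    // No abstract origin on this entry: nothing to resolve.
    let target = *units.get(unit_idx)?.origins.get(&offset)?;
    // Recurse first so that a deeper reference wins.
    if let Some(language) = resolve_entry_language(units, target, depth + 1) {
        return Some(language);
    }
    // Otherwise use the referenced unit's language, but only across units.
    if target.0 != unit_idx {
        units.get(target.0)?.language
    } else {
        None
    }
}

/// Artificial C++ CU -> partial unit without a language -> real C CU.
fn sample_units() -> Vec<Unit> {
    vec![
        Unit {
            language: Some(Language::Cpp),
            // 0x10 starts the chain; 0x40 refers to itself (a cycle).
            origins: HashMap::from([(0x10, (1, 0x20)), (0x40, (0, 0x40))]),
        },
        Unit {
            language: None,
            origins: HashMap::from([(0x20, (2, 0x30))]),
        },
        Unit {
            language: Some(Language::C),
            origins: HashMap::new(),
        },
    ]
}
```

With `sample_units()`, resolving `(0, 0x10)` follows the chain through the partial unit and picks up the C unit's language, while the self-referencing entry `(0, 0x40)` stops at the depth limit and yields no language.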

DwarfUnit::resolve_function_language(entry, fallback_language)

Thin wrapper around resolve_entry_language on the unit's UnitRef, falling back to fallback_language when no cross-unit language is found (e.g. when DW_AT_abstract_origin points to a partial unit with no DW_AT_language).

parse_function / parse_inlinee

parse_function calls resolve_function_language to resolve the authoritative language for the top-level subprogram and passes it down through parse_function_children and parse_inlinee. parse_inlinee calls resolve_function_language again on its own entry so that cross-language inlinees (e.g. C inlined into Rust) override the enclosing function's language correctly.
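
A rough sketch of that propagation, again on a toy model rather than the PR's actual code: `Entry`, `origin_language`, and `sample_function` are hypothetical, and the real functions take DWARF entries and do far more work. The sketch also assumes that a nested inlinee inherits the language resolved for its direct parent, which the description does not spell out.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Language {
    C,
    Rust,
}

/// Toy DIE (hypothetical): `origin_language` stands in for whatever
/// following DW_AT_abstract_origin would yield for this entry.
struct Entry {
    name: &'static str,
    origin_language: Option<Language>,
    inlinees: Vec<Entry>,
}

/// Use the cross-unit language when one was found, else the fallback.
fn resolve_function_language(entry: &Entry, fallback: Language) -> Language {
    entry.origin_language.unwrap_or(fallback)
}

fn parse_inlinee(
    entry: &Entry,
    enclosing: Language,
    out: &mut Vec<(&'static str, Language)>,
) {
    // Resolve again on the inlinee itself so it can override its caller.
    let language = resolve_function_language(entry, enclosing);
    out.push((entry.name, language));
    for child in &entry.inlinees {
        // Assumption: nested inlinees fall back to their parent's language.
        parse_inlinee(child, language, out);
    }
}

fn parse_function(entry: &Entry, unit_language: Language) -> Vec<(&'static str, Language)> {
    let language = resolve_function_language(entry, unit_language);
    let mut out = vec![(entry.name, language)];
    for child in &entry.inlinees {
        parse_inlinee(child, language, &mut out);
    }
    out
}

/// A Rust caller with one C inlinee (`my_add`) and one Rust inlinee.
fn sample_function() -> Entry {
    Entry {
        name: "rust_caller",
        origin_language: None,
        inlinees: vec![
            Entry { name: "my_add", origin_language: Some(Language::C), inlinees: vec![] },
            Entry { name: "rust_helper", origin_language: None, inlinees: vec![] },
        ],
    }
}
```

Calling `parse_function(&sample_function(), Language::Rust)` tags `my_add` as C while the caller and `rust_helper` keep the enclosing Rust language.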

Tests

Two regression tests added with real binary fixtures:

  • test_lto_language_detection (libjemalloc.so.debug): verifies that je_tcache_arena_associate and malloc_mutex_trylock_final (both C functions in a library compiled with LTO) are detected as Language::C, not Language::Cpp.

  • test_cross_language_lto_inlinee_language (cross_language_lto.debug): verifies that my_add (a C function inlined into a Rust binary via cross-language LTO) is detected as Language::C, not Language::Rust.

@nsavoire nsavoire force-pushed the nsavoire/lto_language branch 4 times, most recently from d46e13c to 4769ee7, February 25, 2026 10:17
@nsavoire nsavoire changed the base branch from upstream_master to dd_master February 25, 2026 10:18
@nsavoire nsavoire force-pushed the nsavoire/lto_language branch 3 times, most recently from bbab72f to d443c7d, February 25, 2026 12:34
@nsavoire nsavoire changed the title from "Attempt to get language from DW_AT_abstract_origin" to "fix(debuginfo): correct language detection for LTO-compiled binaries" Feb 25, 2026
@nsavoire nsavoire marked this pull request as ready for review February 25, 2026 12:45
@nsavoire nsavoire requested review from a team and buranmert February 25, 2026 12:45
@nsavoire nsavoire force-pushed the nsavoire/lto_language branch from d443c7d to 52185c9, February 25, 2026 12:48
@nsavoire
Author

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Bravo.


@buranmert buranmert left a comment

i need to do quite a bit of research in order to properly review this, i'll go with approving-blindly 😅

do you expect more changes in symbolic in the future?
do you think we should automate symbolic upgrades in our related services? or manual upgrades are still okay?

@nsavoire
Author

nsavoire commented Mar 3, 2026

> i need to do quite a bit of research in order to properly review this, i'll go with approving-blindly 😅
>
> do you expect more changes in symbolic in the future? do you think we should automate symbolic upgrades in our related services? or manual upgrades are still okay?

This is not an easy PR to review 😄
I don't expect more changes, so I think we can stick with manual updates for now.
Currently dd_master is in bad shape and CI does not pass. I plan to first update dd_master to the latest symbolic release (12.17.2) and merge my PR on top of it.

LTO can produce compilation units whose DW_AT_language does not reflect
the true source language of the functions they contain. Two cases arise:

1. Artificial LTO CUs (e.g. artificial CU with C++ language tag that
   contains C functions):
   a top-level subprogram in such a CU carries a cross-unit
   DW_AT_abstract_origin pointing to the real CU. We now follow that
   reference in resolve_function_language to pick up the origin CU's
   language, which is then used for the symbol-table name, DWARF name,
   and fallback name of the function.

2. Cross-language LTO inlinees (e.g. a C function inlined into Rust):
   the inlinee's DW_AT_abstract_origin references the C CU directly.
   resolve_function_name now reads the referenced CU's language via
   UnitRef::language() whenever it follows an abstract_origin across a
   unit boundary, overriding the language supplied by the caller.

To propagate the correctly-resolved language to all inlinees of a
top-level subprogram, parse_function passes it down through
parse_function_children and parse_inlinee. Same-unit abstract_origin
references (LTO partial units without a further cross-unit link) keep
the enclosing function's language as a fallback, which is correct for
the common case where all code in an LTO CU shares the same language.
@nsavoire nsavoire force-pushed the nsavoire/lto_language branch from cc480ea to 173a4d2, March 3, 2026 14:43
@nsavoire nsavoire merged commit 3cf2422 into dd_master Mar 18, 2026
10 of 11 checks passed
@nsavoire nsavoire deleted the nsavoire/lto_language branch March 18, 2026 17:53
4 participants