fix(debuginfo): correct language detection for LTO-compiled binaries #7
@codex review
Codex Review: Didn't find any major issues. Bravo.
buranmert
left a comment
i need to do quite a bit of research in order to properly review this, i'll go with approving-blindly 😅
do you expect more changes in symbolic in the future?
do you think we should automate symbolic upgrades in our related services, or are manual upgrades still okay?
This is not an easy PR to review 😄
LTO can produce compilation units whose DW_AT_language does not reflect the true source language of the functions they contain. Two cases arise:

1. Artificial LTO CUs (e.g. artificial CU with C++ language tag that contains C functions): a top-level subprogram in such a CU carries a cross-unit DW_AT_abstract_origin pointing to the real CU. We now follow that reference in resolve_function_language to pick up the origin CU's language, which is then used for the symbol-table name, DWARF name, and fallback name of the function.

2. Cross-language LTO inlinees (e.g. a C function inlined into Rust): the inlinee's DW_AT_abstract_origin references the C CU directly. resolve_function_name now reads the referenced CU's language via UnitRef::language() whenever it follows an abstract_origin across a unit boundary, overriding the language supplied by the caller.

To propagate the correctly-resolved language to all inlinees of a top-level subprogram, parse_function passes it down through parse_function_children and parse_inlinee.

Same-unit abstract_origin references (LTO partial units without a further cross-unit link) keep the enclosing function's language as a fallback, which is correct for the common case where all code in an LTO CU shares the same language.
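The two cases and the same-unit fallback described in this commit message can be illustrated with a small, self-contained sketch. It does not use symbolic or gimli; `Unit`, `Entry`, `origin_unit`, and `resolve_language` are hypothetical stand-ins invented for illustration, and only a single hop of DW_AT_abstract_origin is modelled.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Language {
    C,
    Cpp,
    Rust,
}

/// Toy stand-in for a compilation unit: just its DW_AT_language, if present.
pub struct Unit {
    pub language: Option<Language>,
}

/// Toy stand-in for a subprogram or inlined-subroutine entry (hypothetical type).
pub struct Entry {
    /// Index of the unit that contains this entry.
    pub unit: usize,
    /// Index of the unit that DW_AT_abstract_origin points into, if any.
    pub origin_unit: Option<usize>,
}

/// Picks the language for `entry`, given the language of its enclosing context.
pub fn resolve_language(units: &[Unit], entry: &Entry, enclosing: Language) -> Language {
    match entry.origin_unit {
        // Cross-unit reference: prefer the language of the referenced unit,
        // keeping the enclosing language if that unit has no DW_AT_language.
        Some(target) if target != entry.unit => units
            .get(target)
            .and_then(|unit| unit.language)
            .unwrap_or(enclosing),
        // Same-unit reference or no reference: keep the enclosing language.
        _ => enclosing,
    }
}
```

With an artificial C++ unit at index 0 and a C unit at index 1, an entry in unit 0 whose origin lies in unit 1 resolves to `Language::C`, while an entry whose origin stays in unit 0 keeps `Language::Cpp`.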
Problem
LTO can produce compilation units (CUs) whose DW_AT_language does not reflect the true source language of the functions they contain. Two cases arise:

1. Artificial LTO CUs: e.g. a C++ language tag on a CU that contains C functions. A top-level subprogram in such a CU carries a cross-unit DW_AT_abstract_origin pointing to the real (partial) CU.
2. Cross-language LTO inlinees: e.g. a C function inlined into a Rust caller. The inlinee's DW_AT_abstract_origin references the C CU directly.

Implementation
UnitRef::resolve_entry_language(entry, depth)

New helper on UnitRef that follows DW_AT_abstract_origin chains recursively to find the authoritative CU language:

- Reads the referenced CU's DW_AT_language for cross-unit references.
- Follows at most MAX_ABSTRACT_ORIGIN_DEPTH = 16 levels (matching the limit used by elfutils dwarf_attr_integrate) to guard against cycles or malformed DWARF.

DwarfUnit::resolve_function_language(entry, fallback_language)

Thin wrapper around resolve_entry_language on the unit's UnitRef, falling back to fallback_language when no cross-unit language is found (e.g. when DW_AT_abstract_origin points to a partial unit with no DW_AT_language).

parse_function / parse_inlinee

- parse_function calls resolve_function_language to resolve the authoritative language for the top-level subprogram and passes it down through parse_function_children and parse_inlinee.
- parse_inlinee calls resolve_function_language again on its own entry so that cross-language inlinees (e.g. C inlined into Rust) override the enclosing function's language correctly.

Tests
Two regression tests added with real binary fixtures:

- test_lto_language_detection (libjemalloc.so.debug): verifies that je_tcache_arena_associate and malloc_mutex_trylock_final (both C functions in a library compiled with LTO) are detected as Language::C, not Language::Cpp.
- test_cross_language_lto_inlinee_language (cross_language_lto.debug): verifies that my_add (a C function inlined into a Rust binary via cross-language LTO) is detected as Language::C, not Language::Rust.
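For readers who want to follow the flow described under Implementation without opening the diff, here is a rough, self-contained model of the depth-limited chain walk and the propagation to inlinees. It is a sketch under assumptions, not the code in this PR: `ToyDwarf`, `Die`, and `parse_inlinees` are hypothetical, entries are plain indices rather than gimli entries, and two behaviours are guesses (a cross-unit hop into a unit without DW_AT_language keeps walking the chain, and nested inlinees inherit the language of their immediate parent).

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Language {
    C,
    Cpp,
    Rust,
}

/// Same bound as quoted in the description.
pub const MAX_ABSTRACT_ORIGIN_DEPTH: usize = 16;

/// Toy DIE (hypothetical): owning unit, DW_AT_abstract_origin target, inlined children.
pub struct Die {
    pub unit: usize,
    pub abstract_origin: Option<usize>,
    pub inlinees: Vec<usize>,
}

/// Toy container (hypothetical): per-unit DW_AT_language plus a flat DIE table.
pub struct ToyDwarf {
    pub unit_languages: Vec<Option<Language>>,
    pub dies: Vec<Die>,
}

impl ToyDwarf {
    /// Walks the abstract_origin chain; returns a language only if a cross-unit hop provides one.
    pub fn resolve_entry_language(&self, die: usize, depth: usize) -> Option<Language> {
        if depth >= MAX_ABSTRACT_ORIGIN_DEPTH {
            return None; // cycle or malformed chain: give up
        }
        let entry = self.dies.get(die)?;
        let origin_idx = entry.abstract_origin?;
        let origin = self.dies.get(origin_idx)?;
        if origin.unit != entry.unit {
            if let Some(lang) = self.unit_languages.get(origin.unit).copied().flatten() {
                return Some(lang);
            }
        }
        // Same-unit hop, or a target unit without a language: keep following (assumption).
        self.resolve_entry_language(origin_idx, depth + 1)
    }

    /// Thin wrapper: use the chain result, else the caller-supplied fallback.
    pub fn resolve_function_language(&self, die: usize, fallback: Language) -> Language {
        self.resolve_entry_language(die, 0).unwrap_or(fallback)
    }

    /// Returns (die index, language) for a top-level function and all of its inlinees.
    pub fn parse_function(&self, die: usize, unit_language: Language) -> Vec<(usize, Language)> {
        let language = self.resolve_function_language(die, unit_language);
        let mut out = vec![(die, language)];
        self.parse_inlinees(die, language, &mut out);
        out
    }

    fn parse_inlinees(&self, parent: usize, parent_language: Language, out: &mut Vec<(usize, Language)>) {
        let Some(entry) = self.dies.get(parent) else { return };
        for &child in &entry.inlinees {
            // Each inlinee resolves again on its own entry, so a cross-unit origin can override.
            let language = self.resolve_function_language(child, parent_language);
            out.push((child, language));
            self.parse_inlinees(child, language, out);
        }
    }
}
```

In this model, a Rust function with one inlinee whose origin lives in a C unit yields `Language::C` for that inlinee and `Language::Rust` for the function itself, and a self-referential origin chain ends at the depth limit and falls back to the caller's language.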