Skip to content

Verify indirect deps reachable on incremental run#20735

Open
ilevkivskyi wants to merge 2 commits intopython:masterfrom
ilevkivskyi:fix-indirect-unreachable
Open

Verify indirect deps reachable on incremental run#20735
ilevkivskyi wants to merge 2 commits intopython:masterfrom
ilevkivskyi:fix-indirect-unreachable

Conversation

@ilevkivskyi
Copy link
Member

@ilevkivskyi ilevkivskyi commented Feb 4, 2026

Fixes #19477

This is another surprising correctness hole in the indirect dependencies logic. We implicitly assume everywhere that indirect dependencies are subset of transitive ones. Although it is true when the indirect dependencies are initially calculated, there is no guarantee this will stay true after an incremental update where some import structure has changed.

Currently, we only handle the situations where an indirect dependency is removed from the graph completely, but if it is still in the graph, but now brought in by a different dependent, this may cause out-of-order cache loading, and thus a bug or a crash. It is interesting that although in theory it is kind of a big deal, in practice it is very hard to create such scenario. So far I found only two complicated scenarios (see tests).

Implementation note: although this additional check is needed for rare edge cases, it is quite computationally expensive. I tried few options (including only computing import deltas), but the only viable/robust option seems to be an honest recursive check with some caching. I tested this on torch (it has ~1000 small SCCs plus single SCC with around 1000 modules), and the performance penalty is negligible there. So hopefully it should be fine even for large code bases.

@github-actions

This comment has been minimized.

@github-actions
Copy link
Contributor

github-actions bot commented Feb 4, 2026

According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅

@ilevkivskyi
Copy link
Member Author

Ping on this one.

Copy link
Collaborator

@JukkaL JukkaL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, must I'm a little worried by performance impact in pathological cases with bad dependency graphs. Maybe it's not going to be a problem, I'd like to at least think a bit about potentially more efficient approaches.


def verify_transitive(self, from_scc_id: int, to_scc_id: int) -> bool:
"""Verify that one SCC is a (transitive) dependency of another."""
if (from_scc_id, to_scc_id) in self.verify_transitive_cache:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: use := with .get to avoid constructing tuple twice and performing lookup twice?

)

def verify_transitive(self, from_scc_id: int, to_scc_id: int) -> bool:
"""Verify that one SCC is a (transitive) dependency of another."""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a name like is_transitive_dep would be clearer, or is_transitive_scc_dep?

results,
)

def verify_transitive(self, from_scc_id: int, to_scc_id: int) -> bool:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will be the cache be O(n**2) in size in the worst case? I wonder if there is a more efficient algorithm if that's the case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. It will be quadratic in number of SCCs in worst case. I optimized for time vs space. It is a tough tradeoff. I considered few other things:

  • Using SCCs processed so far: not correct because we may accidentally process unrelated SCCs, thus causing false negatives.
  • Include indirect deps when ordering SCCs: not correct for roughly the same reason.
  • Do not use caching: horrible time performance O(V * E), i.e. O(n**3) worst case.
  • Track imports delta, i.e. only track which dependencies changed since last build: non-trivial and error-prone, there are too many edge cases to consider.

Feel free to think about this, but I would prefer to not postpone this for long, this is a correctness issue that can cause crashes. So far we have been lucky that this didn't come often, bu there is a risk that some popular library will change their import structure hitting this issue.

In the meantime I will explore a bit more, maybe there are some niche algorithms for our specific situation, where in 99.9% there is a path in DAG between two points.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I have an idea: we can track whether SCCs are "dependency-fresh" (even if not interface-fresh), i.e. as we process them, check if all direct dependencies are the same as from cache. Essentially check if import delta is "strongly" zero, and then only fall back to full check if something changed. This will not help the worst case, but will help a lot with average case, where one simply edits some files without much changes to import structure.

Copy link
Member Author

@ilevkivskyi ilevkivskyi Feb 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correction: not "strongly" zero, but "strongly" positive (some extra new deps will do no harm, we just need to be sure we didn't loose any).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Startup error: occasional "AssertionError: Cannot find component 'SSL' for 'OpenSSL.SSL.Context'"

2 participants