feat: remove ligrec parallelize#1125
Conversation
Force-pushed 9fe8f25 to 4a60ef3
Codecov Report ❌ Patch coverage is
Additional details and impacted files

```
@@ Coverage Diff @@
##             main    #1125      +/-   ##
==========================================
- Coverage   74.05%   73.90%   -0.16%
==========================================
  Files          39       39
  Lines        6495     6510      +15
  Branches     1122     1122
==========================================
+ Hits         4810     4811       +1
- Misses       1230     1249      +19
+ Partials      455      450       -5
```
Force-pushed d1f752c to 6230aed
for more information, see https://pre-commit.ci
```python
@njit(nogil=True, cache=True)
```
Why not parallel=True + prange? Because this is being run in a thread pool? Why not just make every individual step parallel?
https://numba.pydata.org/numba-doc/dev/user/parallel.html?highlight=njit#explicit-parallel-loops
Would this require rewriting into a reduction of some sort to prevent overlapping writes?
I see in the benchmarks that the speedups with more jobs is not really scaling linearly, which is not what I would expect.
> I see in the benchmarks that the speedups with more jobs is not really scaling linearly, which is not what I would expect.
That's a good point, worth investigating.
> Why not `parallel=True` + `prange`? Because this is being run in a thread pool? Why not just make every individual step parallel?
> https://numba.pydata.org/numba-doc/dev/user/parallel.html?highlight=njit#explicit-parallel-loops
> Would this require rewriting into a reduction of some sort to prevent overlapping writes?
For keeping the progress bar. It also fits the current code more naturally. I agree that it's better and faster, as in the sepal PR, but this pattern helped me keep the progress bar and avoid duplicate code in a straightforward way. For example, I had to create a function that densifies per thread, and pushing this step to Python reduced the duplicate code.
Two options at a high level:
- Refactor for the `numba` itself to be parallel and work on "batches"
- Does https://github.com/mortacious/numba-progress work?
```python
def _worker(t: int) -> NDArrayA:
    local_counts = np.zeros((n_inter, n_cpairs), dtype=np.int64)
    rs = np.random.RandomState(None if seed is None else t + seed)
    perm = clustering.copy()
    for _ in range(chunk_sizes[t]):
        rs.shuffle(perm)
        _score_permutation(
            data_arr,
            perm,
            inv_counts,
            mean_obs,
            interactions,
            interaction_clusters,
            valid,
            local_counts,
        )
        pbar.update(1)
    return local_counts
```
Why can't this also be numba-ified with an outer loop of some sort? Why do we still need a thread pool? I thought "one giant kernel" was the goal.
Is shuffling not parallelizable? Certainly there are ways around this, like argsort + random indices or something? Other than that, I don't really see why the `range(chunk_sizes[t])` couldn't be parallelized. Is it the validity of `local_counts`? Seems like there should be ways around this.
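The "argsort + random indices" idea could look like this pure-numpy sketch (names are illustrative): drawing an independent row of random sort keys per permutation makes each shuffle independent of the others, so the permutations can be generated, and later consumed, in parallel.

```python
import numpy as np

rs = np.random.RandomState(0)
clustering = np.array([10, 20, 30, 40])

# one row of random sort keys per permutation
keys = rs.random_sample((3, clustering.size))
# argsort over each row yields 3 independent permutations of `clustering`
perms = clustering[np.argsort(keys, axis=1)]
assert all(sorted(row.tolist()) == [10, 20, 30, 40] for row in perms)
```

Note this produces a different permutation stream than sequential `rs.shuffle`, which is exactly the reproducibility concern discussed below.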
To have a responsive progress bar and to have the same shuffling results as the old version.
Could you explain a bit more:
- Why is the "same results" thing a hard blocker? `clustering` seems small, so copy+shuffle should be cheap as a pre-processing step, i.e., do all the "shuffle" stuff ahead of time / outside `numba`.
- Would you expect a giant kernel to be faster? My gut is "yes" given Severin's experience/our experience with `co_occurrence`, but I'm all ears.
> `clustering` seems small so copy+shuffle should be cheap as a pre-processing step

`perm = clustering.copy()` copies per thread. To allocate ahead of time you'd need `n_perms * len(clustering)`, so it can blow up if the user gives a high `n_perms`. You could also shuffle in place per thread, but that changes the behaviour and the results.
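To put that blow-up in numbers (hypothetical sizes, not from the PR): pre-allocating every permutation ahead of time costs `n_perms * len(clustering)` elements, e.g.:

```python
import numpy as np

n_perms = 1000          # user-controlled, can be large
n_cells = 1_000_000     # len(clustering)
itemsize = np.dtype(np.int32).itemsize  # cluster labels stored as int32

bytes_needed = n_perms * n_cells * itemsize
print(bytes_needed / 1e9)  # 4.0 GB for these sizes
```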
> Would you expect a giant kernel to be faster? My gut is "yes" given Severin's experience/our experience with `co_occurrence`, but I'm all ears
For sure.
@timtreis How many permutations are realistic? I would think the memory hit can't be that high here.
Separate from our "internal knowledge", I have to wonder if this is worth trying out and seeing where it breaks/degrades. If there's a way to do this that doesn't involve changing results but gets more speed at the risk of higher memory usage, I think we should understand that tradeoff.
This might be a nice way to retain the progress bar without depending on the linked package: instead of explicit threads each working on a "chunk", we would redo the algorithm so that we sequentially proceed over the "chunks" but process each "chunk" at once using parallelized numba instead of `for _ in range(chunk_sizes[t])`. This could help alleviate the memory concerns by allocating only `clustering * chunk_sizes[t]` amount of data instead of the full `n_perms * len(clustering)`.
Do I have this right?
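A sketch of that chunked scheme (all names and the stand-in kernel are illustrative, not the PR's code): only one chunk of pre-shuffled permutations is alive at a time, so peak memory is `chunk_size * len(clustering)` rather than `n_perms * len(clustering)`.

```python
import numpy as np

def run_chunked(clustering, n_perms, chunk_size, seed=0):
    rs = np.random.RandomState(seed)
    perm = clustering.copy()
    total = 0.0
    for start in range(0, n_perms, chunk_size):
        k = min(chunk_size, n_perms - start)
        # allocate only k permutations, not all n_perms at once
        perms = np.empty((k, clustering.size), dtype=clustering.dtype)
        for i in range(k):
            rs.shuffle(perm)  # same sequential RandomState stream as before
            perms[i] = perm
        # a parallel numba kernel would consume the whole chunk here;
        # a plain sum stands in for it
        total += perms.sum()
        # pbar.update(k) would keep the progress bar responsive per chunk
    return total
```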
I will have a look at these questions by doing some runs. I didn't fully understand some parts but will come back to you. But to be clear: would not reproducing the old results be a dealbreaker for you? I didn't understand your stance on this.
> But to be clear: would not reproducing old results be a dealbreaker for you? I didn't understand your stance on this.
That's not really my call here. That being said, what I was trying to propose was a way of using numba's `parallel=True` mechanism without breaking backwards-compatible reproducibility. It seems like all the reproducibility concerns boil down to the shuffling, so I'm proposing basically front-loading the shuffling so that the kernel can run in parallel on multiple pre-shuffled permutations.
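A small demonstration of that proposal (illustrative names): because the shuffles still happen sequentially off the same legacy `RandomState`, front-loading them yields exactly the permutations the old per-iteration loop would have produced, and the kernel can then consume the rows in parallel.

```python
import numpy as np

clustering = np.arange(8)

def sequential_perms(seed, n):
    # old pattern: shuffle immediately before each kernel call
    rs = np.random.RandomState(seed)
    perm = clustering.copy()
    out = []
    for _ in range(n):
        rs.shuffle(perm)
        out.append(perm.copy())
    return np.stack(out)

def frontloaded_perms(seed, n):
    # proposed pattern: do all shuffles up front, then run the kernel
    # in parallel over the rows
    rs = np.random.RandomState(seed)
    perm = clustering.copy()
    perms = np.empty((n, clustering.size), dtype=clustering.dtype)
    for i in range(n):
        rs.shuffle(perm)
        perms[i] = perm
    return perms

# identical permutation streams => identical results downstream
assert (sequential_perms(42, 5) == frontloaded_perms(42, 5)).all()
```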
```python
def _worker(t: int) -> NDArrayA:
    local_counts = np.zeros((n_inter, n_cpairs), dtype=np.int64)
    rs = np.random.RandomState(None if seed is None else t + seed)
```
Why do we need to use the old RandomState? Can we get the same result by using Generator?
> Can we get the same result by using Generator?

I am not sure, but isn't this what Phil tried to do in his PR, by using the legacy `RandomState` internally by default when no rng was given? I think he also said the reproducibility guarantees for `Generator` are different from `RandomState`'s (I verified it and it's correct): `Generator` doesn't guarantee reproducibility across different versions of numpy.
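For the record, this matches numpy's documented compatibility policy: the legacy `RandomState` stream is frozen across numpy versions, while `Generator` streams are only guaranteed stable within a fixed numpy version. A quick illustration (the `RandomState(0)` values below are the well-known fixed stream):

```python
import numpy as np

# legacy RandomState: this exact sequence is guaranteed across numpy versions
rs = np.random.RandomState(0)
assert rs.randint(0, 10, 5).tolist() == [5, 0, 3, 3, 7]

# Generator: reproducible for a fixed numpy version, but the stream
# is allowed to change between numpy versions
gen = np.random.default_rng(0)
vals = gen.integers(0, 10, 5)  # no cross-version guarantee
```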
benchmark code
main results
pr results
Results compared:
Both faster and cleaner code; this removes `parallelize`.
Update: the reason main is faster when `n_jobs=1` is that main also sets `numba_parallel=True`, so it's still numba-parallel even though it's one process.