Skip to content

Add vectorized spherely.polygons() constructor#114

Open
thodson-usgs wants to merge 5 commits intobenbovy:mainfrom
thodson-usgs:vectorized-polygons
Open

Add vectorized spherely.polygons() constructor#114
thodson-usgs wants to merge 5 commits intobenbovy:mainfrom
thodson-usgs:vectorized-polygons

Conversation

@thodson-usgs
Copy link
Copy Markdown

@thodson-usgs thodson-usgs commented Apr 20, 2026

Summary

Adds spherely.polygons(shells, oriented=False, validate=True) — a numpy-native vectorized POLYGON constructor that accepts an (N, K, 2) array of shell coordinates and returns a (N,) object array of POLYGON Geographies.

This is the batch-construction counterpart to the existing spherely.points(coords) and is the lane for building many polygons of uniform shape (e.g. rectilinear grid cells) without per-polygon Python call overhead.

Motivated by workloads that construct large batches of polygons with uniform ring shape — e.g. spherical regridding over a lat/lon grid — where the Python-side call overhead dominates even though the geometry itself is trivial. Refs #52.

What's in this PR

  • src/creation.cpp
    • New polygons(...) function and m.def("polygons", ...) binding.
    • Extracted make_s2loop_from_points(const std::vector<S2Point>&, check, oriented) as the shared loop-construction core; the existing make_s2loop template now delegates to it. Callers drop the closed-ring duplicate vertex before calling (no silent mutation).
    • make_s2polygon gains an optional check parameter that gates the S2Polygon::IsValid() call. Default true preserves all existing behavior.
    • polygons() reuses a single std::vector<S2Point> scratch buffer across polygons so the per-ring vertex allocation is amortized.
  • src/spherely.pyi — type stub.
  • tests/test_creation.py — 6 tests covering shape/dtype, WKT equivalence with create_polygon, closed-ring auto-closing, oriented=True, shape errors, invalid-ring error path, and validate=False skipping validation.

Limitations vs create_polygon

Called out explicitly in the docstring:

  • All shells share the same vertex count (enforced by the numpy (N, K, 2) input shape).
  • No holes — use create_polygon in a Python loop if you need them. Dropped per reviewer suggestion; the grid-cell use case this PR is built around doesn't need them, and the uniform-ndarray API is a bad fit for variable hole counts anyway.

Performance

Measured on an M-series MacBook with rectilinear (K=4) quads, best of 5 runs:

n scalar loop vec (validate=True) vec (validate=False)
10,000 21.9 ms 17.1 ms 8.3 ms
100,000 229.2 ms 181.2 ms 92.5 ms
500,000 1202.7 ms 969.0 ms 496.9 ms
  • validate=True: ~1.3× vs the scalar [create_polygon(s) for s in shells] loop.
  • validate=False: ~2.5× vs scalar, ~2× vs validate=True. S2Polygon::IsValid() is roughly half the per-polygon cost; skipping it is safe when the input is valid by construction (grid cells, pre-validated geometries).

Test plan

  • All existing tests pass (full tests/test_creation.py — 40/40).
  • New tests cover shape/dtype, WKT equivalence with create_polygon (folded into the basic test), orientation, closed rings, the invalid-ring error path, shape-error paths, and validate=False.
  • black, clang-format, pre-commit clean on changed files.
  • CI across Linux/Windows/macOS.

Review feedback addressed (commit 587e494)

  • Renamed checkvalidate (@jorisvandenbossche).
  • Dropped the holes parameter and its implementation.
  • Docstring now spells out the uniform-vertex-count + no-holes limitations.
  • Merged the scalar-equivalence test into the basic test.

Perf numbers above are re-measured on the post-review commit; no change to the advertised speedups.

Introduce a numpy-native batch polygon constructor that takes an
(N, K, 2) array of (longitude, latitude) shell coordinates and an
optional per-polygon list of (H, K_h, 2) hole arrays. Avoids the
per-polygon Python list parsing that dominates when building many
polygons of uniform shape (e.g. rectilinear grid cells), and provides
the natural entry point for future GIL-release / parallel work.

The signature mirrors create_polygon (shell, holes, oriented) but is
vectorized and returns a (N,) object array of POLYGON Geographies.
Per-ring validation is elided to match the scalar create_polygon path;
invalid rings still surface through the polygon-level validity check.

Refs benbovy#52.
- Extract make_s2loop_from_points as the shared core; make_s2loop
  now delegates to it with a materialized S2Point vector. Eliminates
  the duplicate loop-init / debug-override / normalize code that was
  in the first revision's make_s2loop_from_ring helper.
- polygons() now builds each ring's std::vector<S2Point> inline and
  calls make_s2loop_from_points(check=false) directly, dropping the
  template callable indirection.
- Reserve the loops vector when holes are present (1 + n_holes).
- Drop the redundant explicit ndim() == 3 check on shells; pybind11's
  unchecked<3>() raises a ValueError with a clear message. Update the
  shape-error test to match.
- Trim narrating / duplicate comments.
Two independent optimizations:

1. Add `check: bool = True` kwarg to polygons(). When False, skip the
   per-polygon S2Polygon::IsValid() check (threaded through as a new
   parameter on make_s2polygon). On valid-by-construction inputs such
   as rectilinear grid cells, this halves the build cost.

2. Hoist the per-ring std::vector<S2Point> scratch buffers out of the
   inner loop so vertex allocation is amortized across polygons.

Refactor make_s2loop_from_points to take points by const reference; the
caller is now responsible for dropping the closed-ring duplicate vertex
before the call. Removes the silent-mutation API smell the pass-by-ref
variant had.

Measured on an M-series MacBook with rectilinear (K=4) quads:

|  n       | scalar    | check=True | check=False |
|----------|-----------|------------|-------------|
|  10_000  |   22.0 ms |    16.8 ms |      8.3 ms |
| 100_000  |  241.5 ms |   185.5 ms |     94.7 ms |
| 500_000  | 1307   ms |   973   ms |    533   ms |

~2.5x over the scalar create_polygon loop with check=False.
@jorisvandenbossche
Copy link
Copy Markdown
Collaborator

@thodson-usgs thanks a lot for the PR!

This is certainly functionality we want (we might want to think about the exact API and naming, since we also have create_polygon(), although polygons() also make sense for consistency with shapely)

FWIW I think it would also be fine to leave out holes for now (if that is not something you need)

check default is True so polygons() has the same safety guarantees as scalar create_polygon. False is explicit opt-in for valid-by-construction inputs. Happy to rename (validate? trusted=False?) if there's a preferred convention.

I think enabling validation by default is indeed the sensible default behaviour. For naming, I think I might like validate=True/False
(I don't think we already expose this anywhere else, right? although it could probably also make sense for other creation methods)

holes representation is a sequence of length N where each entry is either None or an (H_i, K_h, 2) array. This allows variable hole counts per polygon while keeping the common no-holes case cheap. Alternative API shapes welcome.

As mentioned above, I would also be fine with punting on holes for now. At least in shapely, I don't think that is used very often, but there we actually require holes to be an ndarray, so that makes that every polygon needs to have a uniform number of holes. The more flexible variant you added here is probably more useful, although I think the typical use case for polygons (like creating a grid) won't need it?

Comment thread src/creation.cpp
Comment thread tests/test_creation.py Outdated
@thodson-usgs
Copy link
Copy Markdown
Author

Thanks for the feedback! I'm working with Claude to do some profiling on another project, and it honed in on a two issues inspherely. I'll do my best to keep the changes tight and not bug you too much :)

- Rename the validation kwarg from check to validate (matching the
  convention jorisvandenbossche suggested; also reads more naturally
  in the context of IsValid()).
- Drop the holes parameter entirely. The vectorized polygons() call
  is aimed at uniform-shape batches like grid cells, which don't need
  holes; callers with holes can still reach for create_polygon in a
  Python loop. Simplifies the hot loop (no per-polygon holes.has_value
  branch, no scratch buffer for hole rings).
- Docstring now calls out the two limitations vs create_polygon
  (uniform vertex count, no holes).
- Fold test_polygons_vectorized_matches_create_polygon into
  test_polygons_vectorized_basic and drop the holes tests.

Perf unchanged (1.3x validate=True, 2.5x validate=False on the same
10k/100k/500k rectilinear benchmark). 40/40 creation tests pass.
@thodson-usgs thodson-usgs marked this pull request as ready for review April 22, 2026 10:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants