Add vectorized spherely.polygons() constructor#114
Add vectorized spherely.polygons() constructor#114thodson-usgs wants to merge 5 commits intobenbovy:mainfrom
Conversation
Introduce a numpy-native batch polygon constructor that takes an (N, K, 2) array of (longitude, latitude) shell coordinates and an optional per-polygon list of (H, K_h, 2) hole arrays. Avoids the per-polygon Python list parsing that dominates when building many polygons of uniform shape (e.g. rectilinear grid cells), and provides the natural entry point for future GIL-release / parallel work. The signature mirrors create_polygon (shell, holes, oriented) but is vectorized and returns a (N,) object array of POLYGON Geographies. Per-ring validation is elided to match the scalar create_polygon path; invalid rings still surface through the polygon-level validity check. Refs benbovy#52.
- Extract make_s2loop_from_points as the shared core; make_s2loop now delegates to it with a materialized S2Point vector. Eliminates the duplicate loop-init / debug-override / normalize code that was in the first revision's make_s2loop_from_ring helper. - polygons() now builds each ring's std::vector<S2Point> inline and calls make_s2loop_from_points(check=false) directly, dropping the template callable indirection. - Reserve the loops vector when holes are present (1 + n_holes). - Drop the redundant explicit ndim() == 3 check on shells; pybind11's unchecked<3>() raises a ValueError with a clear message. Update the shape-error test to match. - Trim narrating / duplicate comments.
Two independent optimizations: 1. Add `check: bool = True` kwarg to polygons(). When False, skip the per-polygon S2Polygon::IsValid() check (threaded through as a new parameter on make_s2polygon). On valid-by-construction inputs such as rectilinear grid cells, this halves the build cost. 2. Hoist the per-ring std::vector<S2Point> scratch buffers out of the inner loop so vertex allocation is amortized across polygons. Refactor make_s2loop_from_points to take points by const reference; the caller is now responsible for dropping the closed-ring duplicate vertex before the call. Removes the silent-mutation API smell the pass-by-ref variant had. Measured on an M-series MacBook with rectilinear (K=4) quads: | n | scalar | check=True | check=False | |----------|-----------|------------|-------------| | 10_000 | 22.0 ms | 16.8 ms | 8.3 ms | | 100_000 | 241.5 ms | 185.5 ms | 94.7 ms | | 500_000 | 1307 ms | 973 ms | 533 ms | ~2.5x over the scalar create_polygon loop with check=False.
|
@thodson-usgs thanks a lot for the PR! This is certainly functionality we want (we might want to think about the exact API and naming, since we also have FWIW I think it would also be fine to leave out holes for now (if that is not something you need)
I think enabling validation by default is indeed the sensible default behaviour. For naming, I think I might like
As mentioned above, I would also be fine with punting on holes for now. At least in shapely, I don't think that is used very often, but there we actually require holes to be an ndarray, so that makes that every polygon needs to have a uniform number of holes. The more flexible variant you added here is probably more useful, although I think the typical use case for |
|
Thanks for the feedback! I'm working with Claude to do some profiling on another project, and it honed in on a two issues in |
- Rename the validation kwarg from check to validate (matching the convention jorisvandenbossche suggested; also reads more naturally in the context of IsValid()). - Drop the holes parameter entirely. The vectorized polygons() call is aimed at uniform-shape batches like grid cells, which don't need holes; callers with holes can still reach for create_polygon in a Python loop. Simplifies the hot loop (no per-polygon holes.has_value branch, no scratch buffer for hole rings). - Docstring now calls out the two limitations vs create_polygon (uniform vertex count, no holes). - Fold test_polygons_vectorized_matches_create_polygon into test_polygons_vectorized_basic and drop the holes tests. Perf unchanged (1.3x validate=True, 2.5x validate=False on the same 10k/100k/500k rectilinear benchmark). 40/40 creation tests pass.
Summary
Adds
spherely.polygons(shells, oriented=False, validate=True)— a numpy-native vectorized POLYGON constructor that accepts an(N, K, 2)array of shell coordinates and returns a(N,)object array of POLYGON Geographies.This is the batch-construction counterpart to the existing
spherely.points(coords)and is the lane for building many polygons of uniform shape (e.g. rectilinear grid cells) without per-polygon Python call overhead.Motivated by workloads that construct large batches of polygons with uniform ring shape — e.g. spherical regridding over a lat/lon grid — where the Python-side call overhead dominates even though the geometry itself is trivial. Refs #52.
What's in this PR
src/creation.cpppolygons(...)function andm.def("polygons", ...)binding.make_s2loop_from_points(const std::vector<S2Point>&, check, oriented)as the shared loop-construction core; the existingmake_s2looptemplate now delegates to it. Callers drop the closed-ring duplicate vertex before calling (no silent mutation).make_s2polygongains an optionalcheckparameter that gates theS2Polygon::IsValid()call. Defaulttruepreserves all existing behavior.polygons()reuses a singlestd::vector<S2Point>scratch buffer across polygons so the per-ring vertex allocation is amortized.src/spherely.pyi— type stub.tests/test_creation.py— 6 tests covering shape/dtype, WKT equivalence withcreate_polygon, closed-ring auto-closing,oriented=True, shape errors, invalid-ring error path, andvalidate=Falseskipping validation.Limitations vs
create_polygonCalled out explicitly in the docstring:
(N, K, 2)input shape).create_polygonin a Python loop if you need them. Dropped per reviewer suggestion; the grid-cell use case this PR is built around doesn't need them, and the uniform-ndarray API is a bad fit for variable hole counts anyway.Performance
Measured on an M-series MacBook with rectilinear (K=4) quads, best of 5 runs:
validate=True: ~1.3× vs the scalar[create_polygon(s) for s in shells]loop.validate=False: ~2.5× vs scalar, ~2× vsvalidate=True.S2Polygon::IsValid()is roughly half the per-polygon cost; skipping it is safe when the input is valid by construction (grid cells, pre-validated geometries).Test plan
tests/test_creation.py— 40/40).create_polygon(folded into the basic test), orientation, closed rings, the invalid-ring error path, shape-error paths, andvalidate=False.black,clang-format, pre-commit clean on changed files.Review feedback addressed (commit 587e494)
check→validate(@jorisvandenbossche).holesparameter and its implementation.Perf numbers above are re-measured on the post-review commit; no change to the advertised speedups.