[PySpark] feat: add explode and explode_outer to experimental PySpark API by tinovyatkin · Pull Request #415 · duckdb/duckdb-python

tinovyatkin · 2026-04-03T08:38:14Z

Summary

- Implement `explode(col)` and `explode_outer(col)` collection functions in the experimental PySpark-compatible API
- `explode` maps to DuckDB's `unnest()` function, which natively drops rows with NULL/empty arrays (matching PySpark semantics)
- `explode_outer` preserves rows with NULL/empty arrays by substituting `[NULL]` via a `CaseExpression` before unnesting
- Add 5 tests covering basic usage, NULL/empty handling, `Column` object input, and the outer variant

Implementation details

`explode(col)` is a one-liner wrapping `FunctionExpression("unnest", ...)`, since DuckDB's `unnest` already matches PySpark's `explode` behavior for arrays: rows whose array is NULL or empty are dropped.
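To make the row semantics concrete, here is a minimal pure-Python model of what `explode` is expected to do to a table of `(key, array)` rows, assuming `None` stands in for SQL NULL. It is not the PR's code and does not call DuckDB; `explode_rows` is a hypothetical helper name. The last input row shows that an array holding a single NULL element is not empty, so it still produces one output row.

```python
from typing import Any, Iterable, List, Optional, Sequence, Tuple


def explode_rows(
    rows: Iterable[Tuple[Any, Optional[Sequence[Any]]]],
) -> List[Tuple[Any, Any]]:
    """Model of PySpark explode(), i.e. DuckDB unnest() on one array column.

    Each array element becomes its own output row, paired with the row's key.
    Rows whose array is NULL (None) or empty yield no output rows at all.
    """
    out = []
    for key, values in rows:
        if not values:  # None and [] both produce zero rows
            continue
        for value in values:
            out.append((key, value))
    return out


rows = [(1, [10, 20]), (2, []), (3, None), (4, [None])]
print(explode_rows(rows))
# [(1, 10), (1, 20), (4, None)]
```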

`explode_outer(col)` builds a `CaseExpression` that replaces NULL or empty arrays with `[NULL]` before passing them to `unnest`, so those rows appear in the output with a NULL value instead of being dropped.
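A similar pure-Python sketch of the `explode_outer` strategy, again assuming `None` models SQL NULL: first substitute `[None]` for NULL or empty arrays, then explode as usual. The substitution step is written to mirror a SQL expression along the lines of `CASE WHEN xs IS NULL OR len(xs) = 0 THEN [NULL] ELSE xs END`; the exact `CaseExpression` the PR builds is not reproduced here, and `outer_input` / `explode_outer_rows` are hypothetical names.

```python
from typing import Any, Iterable, List, Optional, Sequence, Tuple


def outer_input(values: Optional[Sequence[Any]]) -> Sequence[Any]:
    """Substitution step: NULL or empty arrays become a one-element [NULL].

    Mirrors a SQL expression such as
    CASE WHEN xs IS NULL OR len(xs) = 0 THEN [NULL] ELSE xs END
    """
    if values is None or len(values) == 0:
        return [None]
    return values


def explode_outer_rows(
    rows: Iterable[Tuple[Any, Optional[Sequence[Any]]]],
) -> List[Tuple[Any, Any]]:
    """Model of PySpark explode_outer(): substitute, then explode as usual."""
    out = []
    for key, values in rows:
        # After substitution every array has at least one element,
        # so the explode/unnest step can no longer drop the row.
        for value in outer_input(values):
            out.append((key, value))
    return out


rows = [(1, [10, 20]), (2, []), (3, None)]
print(explode_outer_rows(rows))
# [(1, 10), (1, 20), (2, None), (3, None)]
```

The appeal of this design is that the existing drop-NULL/empty behavior of `unnest` is reused unchanged; only its input is adjusted so that no row ever reaches it empty.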

Both functions follow the existing patterns of other collection functions in `functions.py`, such as `flatten`, `array_compact`, and `array_remove`.

Not included (future work)

- `posexplode` / `posexplode_outer`: these produce multiple output columns (`pos` + value), which requires multi-column generator support beyond the current `Column` abstraction
- Map input support: DuckDB's `unnest` doesn't accept MAP types directly; this would require `map_entries()` wrapping and struct field extraction
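Both future-work items come down to the shape of the output. The following pure-Python models (hypothetical helpers, based on PySpark's documented behavior rather than on any DuckDB code) show that `posexplode` emits two generated values per output row (a 0-based position and the element), and that exploding a map first turns it into a list of key/value entries, here mimicking DuckDB's `map_entries()` with a hypothetical Python stand-in, and then splits each entry into separate `key` and `value` columns.

```python
from typing import Any, Dict, Iterable, List, Mapping, Optional, Sequence, Tuple


def posexplode_rows(
    rows: Iterable[Tuple[Any, Optional[Sequence[Any]]]],
    outer: bool = False,
) -> List[Tuple[Any, Optional[int], Any]]:
    """Model of PySpark posexplode (outer=False) and posexplode_outer (outer=True)."""
    out = []
    for key, values in rows:
        if not values:  # NULL or empty array
            if outer:
                out.append((key, None, None))  # pos and value are both NULL
            continue
        for pos, value in enumerate(values):  # positions are 0-based
            out.append((key, pos, value))
    return out


def map_entries(m: Optional[Mapping[Any, Any]]) -> Optional[List[Dict[str, Any]]]:
    """Hypothetical stand-in for DuckDB's map_entries(): map -> list of {key, value} structs."""
    if m is None:
        return None
    return [{"key": k, "value": v} for k, v in m.items()]


def explode_map_rows(
    rows: Iterable[Tuple[Any, Optional[Mapping[Any, Any]]]],
) -> List[Tuple[Any, Any, Any]]:
    """Model of PySpark explode over a map column: one (key, value) row per entry."""
    out = []
    for row_id, m in rows:
        entries = map_entries(m)
        if not entries:  # NULL or empty map produces no rows
            continue
        for entry in entries:
            # Struct field extraction: each entry is split into two columns.
            out.append((row_id, entry["key"], entry["value"]))
    return out


print(posexplode_rows([(1, ["a", "b"]), (2, None)]))
# [(1, 0, 'a'), (1, 1, 'b')]
print(posexplode_rows([(1, ["a"]), (2, [])], outer=True))
# [(1, 0, 'a'), (2, None, None)]
print(explode_map_rows([(1, {"x": 10, "y": 20}), (2, {}), (3, None)]))
# [(1, 'x', 10), (1, 'y', 20)]
```

In both models each output row carries two generated values instead of one, which a single `Column` expression cannot express on its own.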

Implement the `explode` and `explode_outer` collection functions for DuckDB's experimental PySpark-compatible API. These are commonly used PySpark functions for flattening array columns into individual rows.

- `explode(col)` maps to DuckDB's `unnest()`, dropping rows with NULL/empty arrays
- `explode_outer(col)` preserves NULL/empty array rows by substituting `[NULL]` via a CASE expression before unnesting

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

tinovyatkin changed the title ~~Add explode and explode_outer to experimental PySpark API~~ [PySpark]: feat: add explode and explode_outer to experimental PySpark API Apr 3, 2026

tinovyatkin changed the title ~~[PySpark]: feat: add explode and explode_outer to experimental PySpark API~~ [PySpark] feat: add explode and explode_outer to experimental PySpark API Apr 3, 2026

tinovyatkin marked this pull request as ready for review April 3, 2026 08:41

tinovyatkin wants to merge 1 commit into duckdb:main from tinovyatkin:feat/spark-explode
