
refactor(common): modularize Js UDF handling with plan and exec phases #1900

Draft
LNSD wants to merge 2 commits into main from lnsd/refactor-common-attach-js-udfs

LNSD commented on Mar 3, 2026

Separate UDF lifecycle into plan-time placeholders and exec-time bindings so the
V8 isolate pool is only needed at execution, not during schema resolution.

  • Add udfs::plan::PlanJsUdf and udfs::exec::ExecJsUdf to split planning from execution
  • Introduce ExecContext::attach() to rewrite plan UDFs into executable ones in a single traversal
  • Move IsolatePool from AmpCatalogProvider into ExecContext
  • Restructure datasets-derived functions into functions.rs with builder pattern
  • Remove js-runtime dependency from admin-api and datasets-derived
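
The plan/exec split described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `Expr` tree, the struct fields, the `count_plan_udfs` helper and the empty `IsolatePool` stand-in are all hypothetical, and only the names `PlanJsUdf`, `ExecJsUdf`, `ExecContext::attach` and `IsolatePool` come from the description. The point it tries to show is that a plan-time placeholder carries no runtime handle, and a single recursive pass binds every placeholder to the pool held by the execution context.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for the V8 isolate pool (the real type is not shown here).
#[derive(Debug, Default)]
struct IsolatePool;

/// Plan-time placeholder: enough metadata for schema resolution, no runtime handle.
#[derive(Debug, Clone, PartialEq)]
struct PlanJsUdf {
    name: String,
    source: String,
}

/// Exec-time binding: the same metadata plus a shared handle to the pool.
#[derive(Debug, Clone)]
struct ExecJsUdf {
    name: String,
    source: String,
    pool: Arc<IsolatePool>,
}

/// Toy expression tree standing in for a logical plan (assumption for illustration).
#[derive(Debug, Clone)]
enum Expr {
    Literal(i64),
    PlanUdf(PlanJsUdf, Vec<Expr>),
    ExecUdf(ExecJsUdf, Vec<Expr>),
}

/// Execution context that owns the pool, so planning never needs it.
struct ExecContext {
    pool: Arc<IsolatePool>,
}

impl ExecContext {
    /// Rewrites every plan-time UDF into an exec-time UDF in one recursive traversal.
    fn attach(&self, expr: Expr) -> Expr {
        match expr {
            Expr::Literal(v) => Expr::Literal(v),
            Expr::PlanUdf(udf, args) => {
                let args = args.into_iter().map(|a| self.attach(a)).collect();
                let bound = ExecJsUdf {
                    name: udf.name,
                    source: udf.source,
                    pool: Arc::clone(&self.pool),
                };
                Expr::ExecUdf(bound, args)
            }
            // Already bound: keep the binding, but still visit the arguments.
            Expr::ExecUdf(udf, args) => {
                let args = args.into_iter().map(|a| self.attach(a)).collect();
                Expr::ExecUdf(udf, args)
            }
        }
    }
}

/// Counts placeholders that have not been bound yet.
fn count_plan_udfs(expr: &Expr) -> usize {
    match expr {
        Expr::Literal(_) => 0,
        Expr::PlanUdf(_, args) => 1 + args.iter().map(count_plan_udfs).sum::<usize>(),
        Expr::ExecUdf(_, args) => args.iter().map(count_plan_udfs).sum::<usize>(),
    }
}
```

In this sketch, anything that only resolves schemas can work with `PlanJsUdf` alone, which is why the pool can live in the execution context rather than in the catalog provider.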

LNSD self-assigned this on Mar 3, 2026
LNSD marked this pull request as draft on March 3, 2026 23:31
LNSD changed the title from "refactor(common): modularize JavaScript UDF handling with planning an exec phases" to "refactor(common): modularize JavaScript UDF handling with pan an exec phases" on Mar 3, 2026
LNSD changed the title from "refactor(common): modularize JavaScript UDF handling with pan an exec phases" to "refactor(common): modularize Js UDF handling with plan an exec phases" on Mar 3, 2026
LNSD force-pushed the lnsd/refactor-common-attach-js-udfs branch from 60a7b4b to 5325a07 on March 3, 2026 23:32
LNSD force-pushed the lnsd/refactor-common-attach-js-udfs branch from 5325a07 to 9181fc8 on March 3, 2026 23:53
LNSD force-pushed the lnsd/refactor-common-attach-js-udfs branch 2 times, most recently from e70a4ff to 2c2b6d1 on March 4, 2026 14:16
LNSD added 2 commits March 4, 2026 16:40
Decouple `datasets-derived` from `ScalarUDF` and `js-runtime` by moving Arrow type validation to the deserialization boundary, so a successfully deserialized `Function` is always valid.

- Move `Function`/`FunctionSource` from `datasets-common` into `datasets-derived` with custom `Deserialize` that validates Arrow types against JS UDF-supported primitives
- Remove `js-runtime` dependency from `datasets-derived`; `Dataset::function_by_name` returns `&Function` instead of constructing `ScalarUDF`
- Simplify `SelfSchemaProvider::from_manifest_udfs` by removing redundant `schema_name` parameter

Signed-off-by: Lorenzo Delgado <lorenzo@edgeandnode.com>
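
The idea in the commit above, validating at the deserialization boundary so that a constructed `Function` is always valid, can be illustrated with a small sketch. It makes several assumptions: it uses the standard library's `FromStr` as a stand-in for the custom `Deserialize` impl, the `JsUdfType` enum and its four variants are a hypothetical subset (the actual set of JS UDF-supported primitives is not listed here), and the `input_types`/`output_type` fields and the `parse` constructor are invented for illustration.

```rust
use std::str::FromStr;

/// Hypothetical subset of Arrow primitives a JS UDF could accept (assumption).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum JsUdfType {
    Boolean,
    Int32,
    Float64,
    Utf8,
}

impl FromStr for JsUdfType {
    type Err = String;

    /// Rejects any type name outside the supported set at parse time.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "Boolean" => Ok(JsUdfType::Boolean),
            "Int32" => Ok(JsUdfType::Int32),
            "Float64" => Ok(JsUdfType::Float64),
            "Utf8" => Ok(JsUdfType::Utf8),
            other => Err(format!("unsupported Arrow type for JS UDF: {other}")),
        }
    }
}

/// A function signature that can only be built from validated types.
#[derive(Debug, PartialEq)]
struct Function {
    input_types: Vec<JsUdfType>,
    output_type: JsUdfType,
}

impl Function {
    /// Parses raw type names; fails on the first unsupported input or output type.
    fn parse(inputs: &[&str], output: &str) -> Result<Self, String> {
        let input_types = inputs
            .iter()
            .map(|s| s.parse::<JsUdfType>())
            .collect::<Result<Vec<_>, _>>()?;
        let output_type = output.parse::<JsUdfType>()?;
        Ok(Function { input_types, output_type })
    }
}
```

Because the only way to obtain a `Function` in this sketch is through `parse`, downstream code does not need to re-check the types, which mirrors the intent of moving validation out of the `ScalarUDF` construction path.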
Decouple IsolatePool from catalog resolution by splitting JS UDF handling into planning and execution phases, moving runtime resource attachment to service initialization.

- Introduce `udfs` module with `PlanJsUdf` (planning) and `ExecJsUdf` (execution) types
- Remove `IsolatePool` from `AmpCatalogProvider`, pass it at service/worker level
- Move JS UDF attach logic from `DetachedLogicalPlan` into `ExecContext`
- Enhance `Function` type validation with separate input/output type checks
- Denormalize `FunctionSource` fields into `PlanJsUdf` for direct field access

Signed-off-by: Lorenzo Delgado <lorenzo@edgeandnode.com>
LNSD force-pushed the lnsd/refactor-common-attach-js-udfs branch from 2c2b6d1 to e6fb678 on March 4, 2026 15:57
