feat: implement templating system based on tstrings #409
Open
NickCrews wants to merge 29 commits into duckdb:main from
This PR implements a SQL templating system for duckdb-python based on Python t-strings. Discussion started in #370.
It is still a WIP, but it is now far enough along that I think the shape of the API is 90% there. I wanted to open this PR so @evertlammerts can take a look and give some high-level comments. Once we iron out the large-scale issues, I can fix those up and do more polishing before asking for a more detailed review.
Example Usage
Simple bound parameters
This is one of the most basic use cases, but I imagine it would also be one of the more popular:
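A minimal pure-Python sketch of the idea, not the PR's implementation: values interleaved with SQL text become named placeholders plus a parameter dict. The helper `compile_parts` and the `p0`, `p1`, ... naming scheme are hypothetical and only illustrate the shape of the result.

```python
def compile_parts(parts):
    """Turn interleaved SQL text and values into (sql, params).

    Hypothetical helper: `str` items are treated as SQL text,
    anything else is treated as a value to bind.
    """
    sql_chunks = []
    params = {}
    for part in parts:
        if isinstance(part, str):
            sql_chunks.append(part)
        else:
            # Assumed naming scheme for anonymous params: p0, p1, ...
            name = f"p{len(params)}"
            params[name] = part
            sql_chunks.append(f"${name}")
    return "".join(sql_chunks), params


user_id = 123
sql, params = compile_parts(["SELECT * FROM users WHERE id = ", user_id])
print(sql)     # SELECT * FROM users WHERE id = $p0
print(params)  # {'p0': 123}
```

The resulting pair has the shape that DuckDB's existing named-parameter execution expects, so it could be passed along as `conn.execute(sql, params)`.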
More complex bound params
By default, we generate IDs for the anonymous params.
If you want more control over the naming, you can explicitly create a Param object.
Param is just a simple dataclass:
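As a rough illustration of what such a dataclass could look like (the field names `value` and `name` are assumptions made for this sketch, not taken from the PR):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Param:
    """Stand-in for the PR's Param object; field names are assumed."""

    value: object
    # None is used here to mean "let the compiler generate an ID later".
    name: Optional[str] = None


anonymous = Param(42)
named = Param(42, name="answer")
print(named)  # Param(value=42, name='answer')
```

Making the stand-in frozen keeps a param immutable once created, which is one plausible design choice when the same param may be reused across several templates.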
You can also create this with the `duckdb.param()` factory function:
Builtin duck objects are interpolated correctly
Types from the duckdb package, such as DuckDBPyRelation, datatype constants, expressions, etc., are converted to SQL and params correctly. This lets you build up analysis chains in a way that makes it easy to comment out or reorder individual lines, which is common in analysis work:
A template is just a sequence of interleaved strs and param-ish things
The t-string literal syntax `t"foo{bar}"` is just syntactic sugar, available on Python 3.14+. For older versions of Python, for programmatic construction, or for any other reason, you can create a SqlTemplate object with the `duckdb.template()` function:
Build Higher-Order components using `__duckdb_template__()`
We define a Protocol that makes an implementor compatible with `duckdb.template()`:
Here is an example usage where someone can define
There is a well-defined lifecycle of IntoSqlTemplate -> SqlTemplate -> ResolvedSqlTemplate -> CompiledSql
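The four stages named above can be sketched in plain Python. This is a simplified stand-in under stated assumptions, not the PR's code: the class bodies, the variadic `template(*parts)` signature, the `Param` field names, and the `p0`-style generated IDs are all hypothetical, and name collisions between generated and explicit IDs are ignored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Param:
    value: object
    name: Optional[str] = None  # None: an ID is assigned at compile time


@dataclass(frozen=True)
class CompiledSql:
    sql: str
    params: dict


@dataclass(frozen=True)
class ResolvedSqlTemplate:
    parts: tuple  # flat sequence of str | Param, no nesting left

    def compile(self) -> CompiledSql:
        chunks, params = [], {}
        for part in self.parts:
            if isinstance(part, str):
                chunks.append(part)
                continue
            name = part.name or f"p{len(params)}"  # assumed ID scheme
            params[name] = part.value
            chunks.append(f"${name}")
        # Joining the chunks also merges adjacent strs.
        return CompiledSql("".join(chunks), params)


@dataclass(frozen=True)
class SqlTemplate:
    parts: tuple  # str, Param, nested SqlTemplate, or raw values

    def resolve(self) -> ResolvedSqlTemplate:
        flat = []
        for part in self.parts:
            if isinstance(part, SqlTemplate):
                flat.extend(part.resolve().parts)  # flatten nesting
            elif isinstance(part, (str, Param)):
                flat.append(part)
            else:
                flat.append(Param(part))  # raw value -> anonymous param
        return ResolvedSqlTemplate(tuple(flat))


def template(*parts) -> SqlTemplate:
    """Hypothetical stand-in for duckdb.template()."""
    return SqlTemplate(tuple(parts))


inner = template("id = ", 123)
outer = template(
    "SELECT * FROM users WHERE ", inner,
    " AND name = ", Param("bob", name="who"),
)
compiled = outer.resolve().compile()
print(compiled.sql)     # SELECT * FROM users WHERE id = $p0 AND name = $who
print(compiled.params)  # {'p0': 123, 'who': 'bob'}
```

Keeping each stage as its own small immutable object is what makes it possible to inspect or modify the data between steps, for example by looking at `outer.resolve().parts` before compiling.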
Most users will go straight from `IntoSqlTemplate` (anything that `duckdb.template()` understands) to executing, but the intermediate processing is well defined and part of the public API, so the user can step into the middle of the process and view or modify the intermediate data structures.
An `IntoSqlTemplate` isn't actually a concrete type, but the typing union of all the things that can be turned into a `SqlTemplate`, e.g. all the things accepted by the `duckdb.template()` function.
A `SqlTemplate` is what you get back from the `duckdb.template()` function. It is quite similar to the actual `string.templatelib.Template` built into Python 3.14. It contains a sequence of strs and Interpolation objects, just like string.templatelib.Template. This is potentially nested, with the Interpolation objects containing other SqlTemplates, Params, strs, or anything else. It has an additional `.resolve()` method that recursively resolves all the inner Interpolations into strs and Params, resulting in a `ResolvedSqlTemplate`.
A `ResolvedSqlTemplate` is semantically a `Sequence[str | Param]`. The actual final param IDs haven't been resolved yet, but any nesting has been flattened. You can call the `SqlTemplate.compile()` method to combine adjacent strs and to resolve the param IDs to their final form, resulting in a `CompiledSql` object.
The final object is a `CompiledSql` object, which is just a simple dataclass like `CompiledSql(sql: str, params: dict[str, object])`. This gets used as `conn.execute(compiled.sql, compiled.params)`.
Behavior notes:
- The `__duckdb_template__()` protocol accepts unused keyword arguments for future-proofing. If in the future we decide that we need to pass config or other metadata during the compilation step, we can start passing that in, and none of the handlers will break.
Query API integration
The main SQL entry points (`sql`, `query`, `from_query`, `execute`, `executemany`) now accept template-ish inputs (SqlTemplate/CompiledSql) in addition to existing string/statement inputs. This behavior could use some careful thought, as it is one of the biggest ways we could paint ourselves into a corner for future API changes or create footguns through unexpected behavior. Everyone already uses these APIs, and always will.
- If the input has `.sql`/`.params`, those are consumed directly.
- If the input has `.compile()`, it is compiled before statement parsing.
This allows passing compiled/template objects directly into execution/query paths without requiring users to unpack `.sql`/`.params` manually.
I'm not sure if we should be EVEN more coercive and accept any of the things that `template()` accepts. E.g., should we accept `conn.sql(["SELECT * FROM users WHERE id = ", 123])`?
Testing
Adds broad coverage across: