Add docstring examples for Aggregate window functions #1418

Open
ntjohnson1 wants to merge 1 commit into apache:main from rerun-io:nick/docstrings-agg-window

@ntjohnson1 (Contributor)

Which issue does this PR close?

Rationale for this change

Add example usage to docstrings for Aggregate window functions to improve documentation.

What changes are included in this PR?

The first PR essentially added a docstring example to everything in functions.py. I broke it apart: one PR (already merged) for the infra, then an example PR adding the docstrings in parts, which was reviewed and merged. This is the follow-up, one of a handful of PRs covering the remaining functions in functions.py. Everything is co-authored with Claude, since I used Claude to extend the handwritten examples I wrote for reference and to split apart the large PR rather than doing it manually.

I've reviewed all the code prior to opening this PR.

Are there any user-facing changes?

No

Add example usage to docstrings for Aggregate window functions to improve documentation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@kosiew (Contributor) left a comment

@ntjohnson1
Thanks for working on this.

Comment on lines 2518 to +2524
df.aggregate([], first_value(col("a"), order_by="ts"))

Examples:
---------
>>> ctx = dfn.SessionContext()
>>> df = ctx.from_pydict({"a": [10, 20, 30]})
>>> result = df.aggregate([], [dfn.functions.first_value(dfn.col("a")).alias("v")])

Line 2518: df.aggregate([], expr)

Line 2524: df.aggregate([], [expr])

Mixing both forms inside the same docstring makes the API shape feel less crisp than it could be.

The same applies to first_value, last_value, and nth_value.
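
Both call shapes are presumably accepted because the aggregate argument is normalized to a list before use. A minimal sketch of that pattern follows; the helper name `normalize_aggs` and the use of plain strings as stand-ins for expressions are assumptions for illustration, not datafusion's actual code:

```python
from typing import List, Sequence, Union

# Stand-in for a real expression type (assumption for this sketch).
Expr = str


def normalize_aggs(aggs: Union[Expr, Sequence[Expr]]) -> List[Expr]:
    """Return the aggregates as a list, wrapping a bare expression."""
    if isinstance(aggs, (list, tuple)):
        return list(aggs)
    return [aggs]


# A bare expression and a one-element list normalize to the same value.
bare = normalize_aggs("first_value(a)")
listed = normalize_aggs(["first_value(a)"])
print(bare == listed)
```

If the two shapes are equivalent to the caller, a docstring can pick one and use it throughout.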

Comment on lines +3082 to +3092
>>> import builtins
>>> result = df.select(
...     dfn.col("a"),
...     dfn.functions.cume_dist(
...         order_by="a"
...     ).alias("cd")
... )
>>> [builtins.round(x, 4) for x in
...     result.sort(dfn.col("a")
...     ).collect_column("cd").to_pylist()]
[0.6667, 0.6667, 1.0]

I think this would be a simpler example, without the builtins import:

>>> ctx = dfn.SessionContext()
>>> df = ctx.from_pydict({"a": [1, 2, 2, 3]})
>>> result = df.select(cume_dist(col("a"))).collect()[0]
>>> rounded = [round(x, 2) for x in result.column(0)]
>>> rounded
[0.25, 0.75, 0.75, 1.0]
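
For reference, the expected values above match the usual definition of cume_dist: for each row, the fraction of rows whose value is less than or equal to that row's value. A minimal pure-Python sketch of that definition, independent of datafusion (the standalone `cume_dist` helper here is written for illustration and is not the library function):

```python
from bisect import bisect_right


def cume_dist(values):
    """For each value, return the fraction of rows with a value <= it."""
    ordered = sorted(values)
    n = len(ordered)
    # bisect_right counts how many sorted entries are <= v, ties included.
    return [bisect_right(ordered, v) / n for v in values]


print(cume_dist([1, 2, 2, 3]))  # [0.25, 0.75, 0.75, 1.0]
```

Tied rows share the same value, which is why both 2s map to 0.75.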
