fix(aggregate): show aliased expr in explain by kumarUjjawal · Pull Request #21739 · apache/datafusion

kumarUjjawal · 2026-04-20T05:12:01Z

Which issue does this PR close?

Closes #19685 (Aliased aggregation expressions not visible in physical explain output).

Rationale for this change

Physical explain output only showed the alias for aliased aggregates. That made it hard to understand the plan, especially when the aggregate had a filter, explicit RESPECT NULLS, or a custom UDAF display.

What changes are included in this PR?

Show the full aggregate expression in physical explain for user-written aggregate aliases.
Keep internal aliases like count(*) compact in physical explain.
Replace the old hidden metadata approach with an explicit is_internal flag on Alias.
Preserve that flag through planner rewrites, tree rewrites, and proto round-trip.
Add tests for aliased aggregate explain output, including:
- normal aliased aggregates
- quoted aliases
- explicit RESPECT NULLS
- custom human display
- count(*)
- nested internal alias display
Add an upgrade note for the public Alias API change.

Are these changes tested?

Yes

Are there any user-facing changes?

Yes.

Physical explain output is clearer for aliased aggregate expressions.
Alias now has a new is_internal field.

This is a public API change for users who build or pattern match Alias directly. The upgrade guide has been updated with the needed changes.
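To illustrate the kind of change downstream code may need, here is a minimal sketch. The `Alias` struct below is a simplified stand-in, not the real DataFusion type: only the `is_internal` field name comes from this PR, while the `user`/`internal` constructors and the `describe` helper are hypothetical names used for illustration.

```rust
/// Simplified stand-in for an alias node (NOT the real DataFusion `Alias`).
#[derive(Debug, Clone, PartialEq)]
pub struct Alias {
    /// The aliased expression, kept as plain text for brevity.
    pub expr: String,
    /// The alias name.
    pub name: String,
    /// Field added by this PR: true for planner-generated aliases.
    pub is_internal: bool,
}

impl Alias {
    /// Hypothetical constructor for an alias the user wrote.
    pub fn user(expr: &str, name: &str) -> Self {
        Self { expr: expr.to_string(), name: name.to_string(), is_internal: false }
    }

    /// Hypothetical constructor for an alias the planner generated.
    pub fn internal(expr: &str, name: &str) -> Self {
        Self { expr: expr.to_string(), name: name.to_string(), is_internal: true }
    }
}

/// Code that destructures the struct exhaustively has to name the new field
/// (or add `..`), which is where the compiler flags the API change.
pub fn describe(alias: &Alias) -> String {
    let Alias { expr, name, is_internal } = alias;
    if *is_internal {
        format!("{name} (planner alias for {expr})")
    } else {
        format!("{expr} AS {name}")
    }
}
```

Struct literals and exhaustive patterns written against the old shape would stop compiling in a model like this, which is why the upgrade guide entry matters for direct users of `Alias`.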

kumarUjjawal · 2026-04-20T05:18:31Z

cc @pepijnve

kumarUjjawal · 2026-04-20T07:59:22Z

It has many more cases than I initially realized 😢

kumarUjjawal · 2026-04-20T10:44:43Z

~~I am working on another approach which is looking better than this~~

I updated the code with the new changes, along with the PR body.

kumarUjjawal · 2026-04-20T12:40:46Z

I tried two approaches for this.

The first approach hid the bit in the alias metadata. It fixed the display problem, but it also mixed planner-only state with user metadata. That made the behavior harder to reason about and forced extra handling in places like equality, hashing, serialization, and rewrite logic. In practice, a simple display fix started affecting alias identity and metadata flow in too many places.

The current approach adds an explicit internal flag to alias. This is a small API break, but it makes the model much clearer: whether an alias is user written or planner generated is now part of the type itself, not hidden in metadata. That keeps the display logic direct, avoids hidden state leaking into unrelated code paths, and makes future maintenance safer because the intent is visible and compiler checked.
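A rough sketch of the trade-off between the two approaches, under stated assumptions: both structs are toy models rather than DataFusion types, and `INTERNAL_KEY` is a made-up key, since the thread does not show the key the first approach used.

```rust
use std::collections::BTreeMap;

// Hypothetical metadata key; the real key from the first approach is not shown in the thread.
const INTERNAL_KEY: &str = "example.internal_alias";

/// Approach 1 (sketch): the planner flag is hidden inside user-facing metadata.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct MetadataAlias {
    pub name: String,
    pub metadata: BTreeMap<String, String>,
}

impl MetadataAlias {
    pub fn new(name: &str) -> Self {
        Self { name: name.to_string(), metadata: BTreeMap::new() }
    }

    /// Marks the alias as planner-generated by writing into metadata.
    pub fn mark_internal(mut self) -> Self {
        self.metadata.insert(INTERNAL_KEY.to_string(), "true".to_string());
        self
    }

    pub fn is_internal(&self) -> bool {
        self.metadata.get(INTERNAL_KEY).map(String::as_str) == Some("true")
    }

    /// Every consumer of metadata has to remember to strip the planner key.
    pub fn user_metadata(&self) -> BTreeMap<String, String> {
        self.metadata
            .iter()
            .filter(|(key, _)| key.as_str() != INTERNAL_KEY)
            .map(|(key, value)| (key.clone(), value.clone()))
            .collect()
    }
}

/// Approach 2 (sketch): the flag is a field, so metadata stays purely user data.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct FlaggedAlias {
    pub name: String,
    pub metadata: BTreeMap<String, String>,
    pub is_internal: bool,
}
```

In the first model, the raw metadata map is no longer what the user supplied, so any code that reads, compares, or serializes metadata needs a filtering step such as `user_metadata`. In the second model the flag still takes part in derived equality and hashing, but it does so visibly, as a named field.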

Would love to hear your thoughts @pepijnve

pepijnve · 2026-04-20T13:37:32Z

@kumarUjjawal could you give an example of where/when the internal flag is necessary?

pepijnve · 2026-04-20T13:49:44Z

I think I might have found it in your second commit, where this change was reverted.

Wouldn't we want the left version though? The physical plan is actually executing count(1), but you don't see that at all in the physical plan even though the logical plan does show it.

pepijnve · 2026-04-20T13:31:49Z

-            .map(|expr| expr.human_display())
+            .map(|expr| {
+                let human_display = expr.human_display();
+                if human_display.is_empty() {


Should human_display be Option<String>?

Yeah I could do that.
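For context, the suggestion can be sketched as follows. Everything here is illustrative: `AggExpr` and `AggExprOpt` are hypothetical types, and falling back to the expression name is an assumption, because the diff above is cut off before the body of the `is_empty()` branch.

```rust
/// Hypothetical shape matching the diff: an empty string means "no custom display".
pub struct AggExpr {
    pub name: String,
    pub human_display: String,
}

/// Sentinel-based version: callers must remember that "" is special.
pub fn display_with_empty_check(expr: &AggExpr) -> String {
    if expr.human_display.is_empty() {
        // Assumed fallback; the real branch body is not visible in the thread.
        expr.name.clone()
    } else {
        expr.human_display.clone()
    }
}

/// Hypothetical shape after the suggestion: absence is encoded in the type.
pub struct AggExprOpt {
    pub name: String,
    pub human_display: Option<String>,
}

/// Option-based version: the fallback is a single combinator call.
pub fn display_with_option(expr: &AggExprOpt) -> String {
    expr.human_display
        .clone()
        .unwrap_or_else(|| expr.name.clone())
}
```

With `Option<String>`, the "no display" case cannot be confused with a display string that happens to be empty, and the compiler forces each call site to handle it.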

pepijnve · 2026-04-20T13:34:40Z

    }

+    #[tokio::test]
+    async fn test_aggregate_explain_shows_aliased_expression() -> Result<()> {


These might be more concise as SLTs

kumarUjjawal · 2026-04-20T16:22:23Z

> Wouldn't we want the left version though? The physical plan is actually executing count(1), but you don't see that at all in the physical plan even though the logical plan does show it.

I think the current behavior is intentional.

My thinking:

- user-written aliases should inline the aggregate expression
- internal planner aliases should stay compact

count(*) is in the second group, so keeping aggr=[count(*)] is expected. The bug here is about user aliases like sum(...) as agg disappearing in physical explain, not about exposing every internal rewrite.

So while the physical plan does execute the lowered count(1) form, showing count(1) as count(*) in explain would expose planner internals and would regress the compact count(*) output we already preserve elsewhere. If we want physical explain to show lowered forms more generally, I think that should be doable but will require some changes.

kumarUjjawal · 2026-04-20T16:27:22Z

> @kumarUjjawal could you give an example of where/when the internal flag is necessary?

A good example could be:

Internally, COUNT(*) is lowered to count(1) so it can use the normal aggregate path. But the user did not write count(1); they wrote count(*). So the planner needs a way to preserve that user-facing name without treating it like a real user alias.

That is what the internal flag is for: it marks aliases that exist only because the planner rewrote the expression.

Without that bit, physical explain cannot tell these two cases apart:

user alias: sum(a) AS total
planner-generated alias: lowered count(1) wrapped as count(*)

Those should display differently. For example, with:

SELECT COUNT(*) AS total_rows FROM t

there are really two alias layers:

internal: count(1) -> count(*)
user: count(*) -> total_rows

The internal flag lets explain show count(*) as total_rows, instead of either exposing the lowered form count(1) as count(*) as total_rows or collapsing everything to just total_rows.
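The two alias layers can be modelled with a small sketch. This is a toy expression type, not DataFusion's `Expr`, and `explain_display` is a hypothetical helper. It reproduces only the display rule described in this comment (internal aliases stay compact, user aliases inline their input), not necessarily the behavior of the final commits.

```rust
/// Toy expression tree (NOT DataFusion's `Expr`).
#[derive(Debug, Clone)]
pub enum Expr {
    /// An aggregate already rendered as text, e.g. "count(1)".
    Agg(String),
    /// An alias wrapping another expression.
    Alias {
        expr: Box<Expr>,
        name: String,
        is_internal: bool,
    },
}

/// Hypothetical display rule from the comment above.
pub fn explain_display(expr: &Expr) -> String {
    match expr {
        Expr::Agg(text) => text.clone(),
        // Planner-generated alias: show only the name and hide the lowered form.
        Expr::Alias { name, is_internal: true, .. } => name.clone(),
        // User-written alias: show what is being aliased, then the alias.
        Expr::Alias { expr, name, is_internal: false } => {
            format!("{} as {}", explain_display(expr), name)
        }
    }
}

/// Builds the two layers for `SELECT COUNT(*) AS total_rows FROM t`.
pub fn count_star_as_total_rows() -> Expr {
    let internal = Expr::Alias {
        expr: Box::new(Expr::Agg("count(1)".to_string())),
        name: "count(*)".to_string(),
        is_internal: true,
    };
    Expr::Alias {
        expr: Box::new(internal),
        name: "total_rows".to_string(),
        is_internal: false,
    }
}
```

Without the `is_internal` distinction, the two `Alias` arms would collapse into one, and the function would have to either always inline (exposing `count(1)`) or never inline (showing only `total_rows`).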

pepijnve · 2026-04-21T07:56:07Z

This might just be personal preference speaking, but in a physical explain plan I'm looking for what the engine is actually doing (how it is being executed), not what I wrote as a query. It doesn't make sense to me that the logical plan (which is more declarative in nature) shows the lowered version, while the physical plan (which is more imperative) does not. If anything, it should be the other way around: logical hides this detail and physical shows it.

So if I got to choose I would prefer the left side plan of the diff image above rather than the right one.

kumarUjjawal · 2026-04-21T10:53:54Z

> This might just be personal preference speaking, but in a physical explain plan I'm looking for what the engine is actually doing (how it is being executed), not what I wrote as a query. It doesn't make sense to me that the logical plan (which is more declarative in nature) shows the lowered version, while the physical plan (which is more imperative) does not. If anything, it should be the other way around: logical hides this detail and physical shows it.
>
> So if I got to choose I would prefer the left side plan of the diff image above rather than the right one.

I think I like this approach too; showing lowered expressions in physical explain can be more useful. Let me see how the code looks for this.

github-actions Bot added the physical-expr, core, and physical-plan labels Apr 20, 2026

kumarUjjawal marked this pull request as draft April 20, 2026 07:56

github-actions Bot added the documentation, sql, logical-expr, optimizer, proto, and functions labels Apr 20, 2026

kumarUjjawal marked this pull request as ready for review April 20, 2026 12:34

pepijnve reviewed Apr 20, 2026

github-actions Bot added the sqllogictest label Apr 21, 2026

kumarUjjawal added 3 commits April 21, 2026 19:54

fix(aggregate): show aliased expr in explain

fd15156

Replace the hidden metadata with is_internal flag

33e47c9

show lowered expressions in physical explain

2580bc5

kumarUjjawal force-pushed the fix/aliased_expr_explain branch from 21c1690 to 2580bc5 April 21, 2026 14:31

kumarUjjawal added 3 commits April 22, 2026 09:30

Merge branch 'main' into fix/aliased_expr_explain

c25ea93

relax the unstable EXPLAIN ANALYZE expectation

94d8f7d

updated the TPC-H explain snapshots

35f0591
