
support insert_to_multi_column#60986

Open
BiteTheDDDDt wants to merge 1 commit into apache:master from BiteTheDDDDt:dev_0303_2

Conversation

@BiteTheDDDDt
Contributor

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

Copilot AI review requested due to automatic review settings March 3, 2026 09:52
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (ideally including the specific error message), and how it was fixed.
  2. Which behaviors were modified: what the previous behavior was, what it is now, why it was modified, and what impact the change might have.
  3. What features were added, and why they were added.
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@BiteTheDDDDt
Contributor Author

run buildall


Copilot AI left a comment


Pull request overview

This PR introduces a new vectorized column API (insert_to_multi_column) to scatter rows from one source column into multiple destination columns, and uses it to optimize row distribution into exchange channels by directly scattering block columns into each channel’s mutable block before flushing.

Changes:

  • Add IColumn::insert_to_multi_column(...) with a COWHelper default implementation, plus optimized overrides for ColumnVector, ColumnStr (string), and ColumnNullable.
  • Extend BlockSerializer/Channel to support external mutable-block initialization/access and a post-scatter flush path (try_flush_after_scatter).
  • Update ExchangeTrivialWriter::_channel_add_rows to distribute rows via per-column scatter instead of building an _origin_row_idx indirection array.
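The per-column scatter described in the overview can be illustrated outside Doris. This is a minimal sketch under stated assumptions, not the PR's code: a numeric column is modeled as a plain `std::vector`, positions are assumed to be `uint32_t`, and `scatter_numeric`, `Block`, and `distribute` are hypothetical names. It shows a counting pass that grows each destination once per call, the same positions array reused for every column of a block, and a threshold-based flush loosely analogous to `try_flush_after_scatter`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model: a numeric column is a std::vector<T>.
// Scatters rows [0, rows) of src; positions[i] picks the destination of row i.
template <typename T>
void scatter_numeric(const std::vector<T>& src, const uint32_t* positions, size_t rows,
                     const std::vector<std::vector<T>*>& dsts) {
    assert(rows <= src.size());
    // Counting pass: learn how many rows each destination will receive.
    std::vector<size_t> counts(dsts.size(), 0);
    for (size_t i = 0; i < rows; ++i) {
        assert(positions[i] < dsts.size());
        ++counts[positions[i]];
    }
    for (size_t d = 0; d < dsts.size(); ++d) {
        const size_t needed = dsts[d]->size() + counts[d];
        if (needed > dsts[d]->capacity()) {
            // Grow geometrically so many small batches stay amortized O(1) per row.
            dsts[d]->reserve(std::max(needed, 2 * dsts[d]->capacity()));
        }
    }
    // Fill pass: append each row to its destination, preserving row order.
    for (size_t i = 0; i < rows; ++i) {
        dsts[positions[i]]->push_back(src[i]);
    }
}

// Hypothetical two-column block; each channel buffers one of these.
struct Block {
    std::vector<int64_t> a;
    std::vector<int64_t> b;
};

// Routes each row by (a value % channel count), scatters every column with the
// same positions array, then "flushes" (here: clears) channels holding at least
// flush_rows rows. Returns the indices of the flushed channels.
std::vector<size_t> distribute(const Block& in, std::vector<Block>& channels, size_t flush_rows) {
    const size_t rows = in.a.size();
    std::vector<uint32_t> positions(rows);
    for (size_t i = 0; i < rows; ++i) {
        positions[i] = static_cast<uint32_t>(static_cast<uint64_t>(in.a[i]) % channels.size());
    }
    std::vector<std::vector<int64_t>*> dst_a, dst_b;
    for (Block& ch : channels) {
        dst_a.push_back(&ch.a);
        dst_b.push_back(&ch.b);
    }
    scatter_numeric(in.a, positions.data(), rows, dst_a);
    scatter_numeric(in.b, positions.data(), rows, dst_b);
    std::vector<size_t> flushed;
    for (size_t c = 0; c < channels.size(); ++c) {
        if (channels[c].a.size() >= flush_rows) {
            flushed.push_back(c);
            channels[c] = Block{};  // a real sink would serialize and send first
        }
    }
    return flushed;
}
```

Compared with building a row-index indirection array per channel, this shape touches each source column once per batch and lets every column reuse the same `positions` array.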

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| be/src/vec/sink/vdata_stream_sender.h | Add serializer helpers (ensure/init, flush decision, serialize helper) and expose the serializer and a flush method on Channel. |
| be/src/vec/sink/vdata_stream_sender.cpp | Implement Channel::try_flush_after_scatter. |
| be/src/vec/common/cow.h | Wire insert_to_multi_column through COWHelper to the base implementation. |
| be/src/vec/columns/column.h | Add new pure virtual API + default impl helper; update column interface surface. |
| be/src/vec/columns/column_vector.h | Declare ColumnVector::insert_to_multi_column. |
| be/src/vec/columns/column_vector.cpp | Implement optimized numeric scatter into multiple destination columns. |
| be/src/vec/columns/column_string.h | Declare ColumnStr::insert_to_multi_column. |
| be/src/vec/columns/column_string.cpp | Implement optimized string scatter (offsets/chars) into multiple destinations. |
| be/src/vec/columns/column_nullable.h | Declare ColumnNullable::insert_to_multi_column. |
| be/src/vec/columns/column_nullable.cpp | Implement nullable scatter by delegating to nested column + null-map scatter. |
| be/src/pipeline/shuffle/exchange_writer.cpp | Use insert_to_multi_column to scatter each input column into channel mutable blocks, then flush channels. |
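The string (offsets/chars) and nullable scatter entries can be sketched with a simplified layout. This is an illustration under assumptions, not Doris's `ColumnStr` or `ColumnNullable`: `StrColumn`, `NullableStrColumn`, `scatter_strings`, and `scatter_nullable` are hypothetical, row `i` is assumed to occupy `chars[offsets[i-1], offsets[i])` with the start of row 0 taken as 0, and positions are assumed to be `uint32_t`. A first pass sizes each destination's offsets and chars; the nullable variant delegates values to the nested scatter and scatters the null map with the same positions so the two stay row-aligned.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical string column: row i spans chars[row_begin(i), offsets[i]).
struct StrColumn {
    std::vector<char> chars;
    std::vector<uint32_t> offsets;

    size_t size() const { return offsets.size(); }
    uint32_t row_begin(size_t i) const { return i == 0 ? 0 : offsets[i - 1]; }
    std::string at(size_t i) const {
        return std::string(chars.begin() + row_begin(i), chars.begin() + offsets[i]);
    }
    void push(const std::string& s) {
        chars.insert(chars.end(), s.begin(), s.end());
        offsets.push_back(static_cast<uint32_t>(chars.size()));
    }
};

// Reserve room for `extra` more elements, growing geometrically.
template <typename V>
void reserve_for(V& v, size_t extra) {
    const size_t needed = v.size() + extra;
    if (needed > v.capacity()) v.reserve(std::max(needed, 2 * v.capacity()));
}

void scatter_strings(const StrColumn& src, const uint32_t* positions, size_t rows,
                     const std::vector<StrColumn*>& dsts) {
    assert(rows <= src.size());
    // First pass: per-destination row and byte counts.
    std::vector<size_t> n_rows(dsts.size(), 0), n_bytes(dsts.size(), 0);
    for (size_t i = 0; i < rows; ++i) {
        ++n_rows[positions[i]];
        n_bytes[positions[i]] += src.offsets[i] - src.row_begin(i);
    }
    for (size_t d = 0; d < dsts.size(); ++d) {
        reserve_for(dsts[d]->offsets, n_rows[d]);
        reserve_for(dsts[d]->chars, n_bytes[d]);
    }
    // Second pass: copy each row's bytes and record its new end offset.
    for (size_t i = 0; i < rows; ++i) {
        StrColumn& dst = *dsts[positions[i]];
        dst.chars.insert(dst.chars.end(), src.chars.begin() + src.row_begin(i),
                         src.chars.begin() + src.offsets[i]);
        dst.offsets.push_back(static_cast<uint32_t>(dst.chars.size()));
    }
}

// Hypothetical nullable column: nested values plus a 0/1 null map.
struct NullableStrColumn {
    StrColumn nested;
    std::vector<uint8_t> null_map;
};

void scatter_nullable(const NullableStrColumn& src, const uint32_t* positions, size_t rows,
                      const std::vector<NullableStrColumn*>& dsts) {
    std::vector<StrColumn*> nested;
    nested.reserve(dsts.size());
    for (NullableStrColumn* d : dsts) nested.push_back(&d->nested);
    scatter_strings(src.nested, positions, rows, nested);  // values
    for (size_t i = 0; i < rows; ++i) {
        dsts[positions[i]]->null_map.push_back(src.null_map[i]);  // same routing
    }
}
```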


Comment on lines +187 to +193
```cpp
RETURN_IF_CATCH_EXCEPTION({
    // Ensure each channel's mutable block is initialized.
    // Even EOF channels need a valid mutable block as a dummy destination,
    // since insert_to_multi_column scatters unconditionally.
    for (size_t i = 0; i < channel_count; ++i) {
        channels[i]->serializer().ensure_mutable_block(block);
    }
```

Copilot AI Mar 3, 2026


insert_to_multi_column currently forces ensure_mutable_block() for all channels, including those for which is_receiver_eof() is true. For an EOF channel this re-allocates a mutable block after close() has reset it, and because EOF channels are skipped in the flush loops, rows routed to them accumulate in memory and are not released until the query ends. Consider routing EOF channels to a per-call dummy MutableBlock/columns (or teaching scatter to ignore nullptr dsts), and initializing real channel blocks only for non-EOF channels that actually receive rows in this batch.
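The first remedy suggested in this comment, routing EOF channels to a per-call dummy destination, might look like this in a simplified model. `Channel` and `scatter_to_channels` are hypothetical stand-ins (Doris channels buffer a MutableBlock, not a single vector), and positions are assumed to be `uint32_t`. The scatter stays unconditional, but EOF channels all share one scratch vector that is destroyed when the call returns, so nothing accumulates on them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical channel: an EOF flag plus rows buffered for sending.
struct Channel {
    bool receiver_eof = false;
    std::vector<int64_t> buffered;
};

// Scatters src into channels. EOF channels share one per-call scratch
// destination, so no channel buffer is (re)allocated for them and their
// rows are released when this function returns.
void scatter_to_channels(const std::vector<int64_t>& src, const uint32_t* positions,
                         std::vector<Channel>& channels) {
    std::vector<int64_t> scratch;  // dummy sink, lives only for this call
    std::vector<std::vector<int64_t>*> dsts;
    dsts.reserve(channels.size());
    for (Channel& ch : channels) {
        dsts.push_back(ch.receiver_eof ? &scratch : &ch.buffered);
    }
    for (size_t i = 0; i < src.size(); ++i) {
        assert(positions[i] < dsts.size());
        dsts[positions[i]]->push_back(src[i]);
    }
}  // scratch is destroyed here
```

The alternative the comment mentions (letting scatter skip `nullptr` destinations) avoids copying rows that are thrown away, at the cost of a branch per row.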

Copilot uses AI. Check for mistakes.
Comment on lines +231 to +232
```cpp
// positions.size() == this->size(), positions[i] is the index into dsts
// that row i should be inserted into.
```

Copilot AI Mar 3, 2026


The new API comment mentions positions.size() == this->size(), but the signature now takes a raw positions pointer plus an explicit rows count. Please update the comment to reflect the actual contract (e.g., positions has length rows, and rows may be <= size()).

Suggested change
```diff
-// positions.size() == this->size(), positions[i] is the index into dsts
-// that row i should be inserted into.
+// `positions` points to an array of length `rows`, and `rows` is the number of
+// source rows to scatter (rows <= this->size()). positions[i] is the index into
+// `dsts` that row i should be inserted into.
```
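The revised contract can be stated as a precondition check. `scatter_args_valid` is a hypothetical helper written for illustration, and the `uint32_t` element type for `positions` is an assumption; it checks exactly the three conditions in the suggested comment: `rows` does not exceed the source size, `positions` is readable for `rows` entries, and every entry indexes into `dsts`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical precondition check for the revised contract:
// positions has `rows` entries, rows <= src_size, and each entry
// is a valid index into a destination list of length num_dsts.
bool scatter_args_valid(size_t src_size, const uint32_t* positions, size_t rows,
                        size_t num_dsts) {
    if (rows > src_size) return false;                     // may scatter a prefix only
    if (rows > 0 && positions == nullptr) return false;    // need `rows` readable entries
    for (size_t i = 0; i < rows; ++i) {
        if (positions[i] >= num_dsts) return false;        // out-of-range destination
    }
    return true;
}
```

Because it walks the whole positions array, a check like this would normally run only in debug builds.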

