🧪 Improve test coverage for get_char_column_simd #94

Merged
bashandbone merged 1 commit into main from improve-simd-tests-12491095417453404157
Mar 9, 2026

@bashandbone bashandbone commented Mar 9, 2026

🎯 What: Missing tests for the SIMD-optimized character column calculation in crates/utils/src/simd.rs.

📊 Coverage: Added test cases for:

  • Long UTF-8 and mixed strings to exercise SIMD chunking logic.
  • Consecutive newlines and empty lines.
  • UTF-8 characters crossing SIMD vector width boundaries (16, 32, 64 bytes).
  • Newlines and multi-byte characters at the start of strings.
  • Consistency checks between ASCII and UTF-8 paths.
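
To illustrate what an ASCII/UTF-8 consistency check might look like, here is a minimal scalar sketch. The functions `column_ascii` and `column_utf8` are hypothetical stand-ins, not the code in crates/utils/src/simd.rs, and the column semantics (zero-based count of characters since the last newline before the byte offset) are an assumption.

```rust
/// Hypothetical ASCII fast path: every byte is one character, so the column
/// is the byte distance from the start of the current line.
fn column_ascii(text: &str, offset: usize) -> usize {
    let prefix = &text.as_bytes()[..offset.min(text.len())];
    let line_start = prefix.iter().rposition(|&b| b == b'\n').map_or(0, |i| i + 1);
    prefix.len() - line_start
}

/// Hypothetical UTF-8 path: count only bytes that start a character,
/// skipping continuation bytes of the form 0b10xx_xxxx.
fn column_utf8(text: &str, offset: usize) -> usize {
    let prefix = &text.as_bytes()[..offset.min(text.len())];
    let line_start = prefix.iter().rposition(|&b| b == b'\n').map_or(0, |i| i + 1);
    prefix[line_start..]
        .iter()
        .filter(|&&b| (b & 0xC0) != 0x80)
        .count()
}

#[test]
fn ascii_and_utf8_paths_agree_on_ascii_input() {
    // On pure ASCII input both paths should report the same column
    // at every byte offset, including offsets right after a newline.
    let text = "first line\nsecond line\n\nfourth";
    for offset in 0..=text.len() {
        assert_eq!(column_ascii(text, offset), column_utf8(text, offset));
    }
}
```

The two paths only need to agree on ASCII-only input; on multi-byte input the ASCII shortcut would overcount, which is why a real implementation has to choose between them.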

Result: Broader test coverage for the string processing utilities, guarding the SIMD implementation against boundary-condition bugs.


PR created automatically by Jules for task 12491095417453404157 started by @bashandbone

Summary by Sourcery

Expand test coverage for SIMD-based character column calculation in string utilities.

Tests:

  • Add tests covering long UTF-8 and mixed ASCII/UTF-8 strings to validate SIMD chunking behavior.
  • Add tests for consecutive newlines and empty-line scenarios to ensure correct column reset logic.
  • Add tests for UTF-8 characters positioned at typical SIMD width boundaries (16, 32, 64 bytes).
  • Add tests for newlines and multi-byte UTF-8 characters at the start of strings to verify correct initial column handling.
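
The newline cases in this list can be sketched against a scalar reference. `char_column_reference` is a hypothetical helper introduced for illustration only. It assumes the column is the zero-based number of characters between the last newline before `offset` and `offset` itself, which may differ from the exact contract of get_char_column_simd.

```rust
/// Hypothetical scalar reference: characters between the last newline
/// before `offset` (a byte offset) and `offset` itself.
fn char_column_reference(text: &str, offset: usize) -> usize {
    let prefix = &text.as_bytes()[..offset.min(text.len())];
    let line_start = prefix.iter().rposition(|&b| b == b'\n').map_or(0, |i| i + 1);
    // Count bytes that begin a character (not UTF-8 continuation bytes).
    prefix[line_start..]
        .iter()
        .filter(|&&b| (b & 0xC0) != 0x80)
        .count()
}

#[test]
fn consecutive_newlines_reset_the_column() {
    let text = "a\n\n\nb";
    assert_eq!(char_column_reference(text, 1), 1); // after 'a'
    assert_eq!(char_column_reference(text, 2), 0); // after first '\n'
    assert_eq!(char_column_reference(text, 3), 0); // empty line
    assert_eq!(char_column_reference(text, 4), 0); // empty line
    assert_eq!(char_column_reference(text, 5), 1); // after 'b'
}

#[test]
fn newline_at_start_yields_column_zero() {
    let text = "\nabc";
    assert_eq!(char_column_reference(text, 0), 0);
    assert_eq!(char_column_reference(text, 1), 0); // right after '\n'
    assert_eq!(char_column_reference(text, 4), 3); // after "abc"
}
```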

Added test cases for long UTF-8 strings, consecutive newlines,
SIMD boundary conditions, and edge cases at the start of strings.
Verified consistency between ASCII and UTF-8 execution paths.

Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 9, 2026 17:18

sourcery-ai bot commented Mar 9, 2026

Reviewer's Guide

Adds focused unit tests for get_char_column_simd to improve coverage of UTF-8 handling, newline behavior, and SIMD boundary conditions without changing production code.

Sequence diagram for get_char_column_simd UTF-8 boundary test

sequenceDiagram
    actor TestRunner
    participant TestModule as GetCharColumnSimdTests
    participant UtilsSimd

    TestRunner->>TestModule: test_get_char_column_utf8_boundary()
    loop for width in [16, 32, 64]
        TestModule->>UtilsSimd: get_char_column_simd(text, width + 3)
        UtilsSimd-->>TestModule: column = width
        TestModule->>UtilsSimd: get_char_column_simd(text, width + 5)
        UtilsSimd-->>TestModule: column = width + 2
    end
    TestModule-->>TestRunner: assertions passed
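
One input layout consistent with the offsets in the diagram above is `width - 1` ASCII bytes followed by a 4-byte character, so the character straddles the `width`-byte boundary. The sketch below checks that layout against a hypothetical scalar reference. Both `char_column_reference` and the choice of "🦀" are assumptions for illustration; the actual test in the PR may build its input differently.

```rust
/// Hypothetical scalar reference: characters between the last newline
/// before `offset` (a byte offset) and `offset` itself.
fn char_column_reference(text: &str, offset: usize) -> usize {
    let prefix = &text.as_bytes()[..offset.min(text.len())];
    let line_start = prefix.iter().rposition(|&b| b == b'\n').map_or(0, |i| i + 1);
    prefix[line_start..]
        .iter()
        .filter(|&&b| (b & 0xC0) != 0x80)
        .count()
}

#[test]
fn utf8_char_straddling_vector_boundary() {
    for width in [16usize, 32, 64] {
        // Bytes 0..width-1 are ASCII; the 4-byte crab occupies bytes
        // width-1..width+3, crossing the boundary at byte `width`.
        let text = format!("{}🦀bcd", "a".repeat(width - 1));

        // Just past the emoji: width - 1 ASCII chars + 1 emoji.
        assert_eq!(char_column_reference(&text, width + 3), width);
        // Two ASCII bytes further along.
        assert_eq!(char_column_reference(&text, width + 5), width + 2);
    }
}
```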

Class diagram for new get_char_column_simd tests

classDiagram
    class UtilsSimd {
        +get_char_column_simd(text: &str, offset: usize) usize
    }

    class GetCharColumnSimdTests {
        +test_get_char_column_long_utf8()
        +test_get_char_column_consecutive_newlines()
        +test_get_char_column_utf8_boundary()
        +test_get_char_column_newline_at_start()
        +test_get_char_column_utf8_at_start()
    }

    GetCharColumnSimdTests ..> UtilsSimd : uses

File-Level Changes

Change Details Files
Add tests covering long UTF-8 and mixed ASCII/UTF-8 strings to exercise SIMD chunking behavior.
  • Add test that repeats a multi-byte emoji to validate character column computation at the end of long UTF-8 input.
  • Add test that repeats a mixed ASCII-plus-emoji pattern to verify combined character counting across long inputs.
crates/utils/src/simd.rs
Add tests for newline edge cases including consecutive newlines and newline at the start of the string.
  • Add test verifying column resets correctly across consecutive newline characters and on subsequent non-newline characters.
  • Add test verifying column calculation when the string starts with a newline, including offsets of subsequent characters on the first non-empty line.
crates/utils/src/simd.rs
Add tests for UTF-8 characters positioned at SIMD-relevant byte boundaries.
  • Add parameterized test placing a multi-byte UTF-8 character so that it crosses 16, 32, and 64-byte boundaries and assert correct column for offsets after the character and into following ASCII text.
  • Add test verifying correct column offsets when a multi-byte UTF-8 character appears at the start of the string followed by ASCII characters.
crates/utils/src/simd.rs
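
To show why long inputs matter for chunked processing, the sketch below mimics a vectorized character count with fixed-size chunks and a scalar tail. It uses no SIMD intrinsics. `CHUNK` and `count_chars_chunked` are hypothetical names, and the real implementation in crates/utils/src/simd.rs may be structured differently. Long repeated inputs force multi-byte characters to land on every possible position relative to a chunk boundary.

```rust
/// Hypothetical vector width in bytes; real SIMD code would use 16, 32, or 64.
const CHUNK: usize = 16;

/// Counts characters by counting bytes that are not UTF-8 continuation
/// bytes, processing full chunks first and then the leftover tail.
fn count_chars_chunked(bytes: &[u8]) -> usize {
    let is_char_start = |b: &&u8| (**b & 0xC0) != 0x80;

    let mut chunks = bytes.chunks_exact(CHUNK);
    let mut total = 0;
    for chunk in chunks.by_ref() {
        // Stand-in for one vector compare plus a population count.
        total += chunk.iter().filter(is_char_start).count();
    }
    // Scalar tail for the bytes that do not fill a whole chunk.
    total + chunks.remainder().iter().filter(is_char_start).count()
}

#[test]
fn long_utf8_and_mixed_inputs_match_std() {
    let long_utf8 = "🦀".repeat(100); // 400 bytes, 100 characters
    let mixed = "ab🦀".repeat(50); // 300 bytes, 150 characters

    assert_eq!(count_chars_chunked(long_utf8.as_bytes()), long_utf8.chars().count());
    assert_eq!(count_chars_chunked(mixed.as_bytes()), mixed.chars().count());
}
```

Comparing against `str::chars().count()` keeps the expected values independent of the chunked logic under test.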

@sourcery-ai sourcery-ai bot left a comment

Hey - I've reviewed your changes and they look great!


Copilot AI left a comment

Pull request overview

This PR increases reliability of the SIMD-based get_char_column_simd utility by adding targeted tests that exercise UTF-8 handling and tricky boundary conditions in crates/utils.

Changes:

  • Add long UTF-8 and mixed ASCII/UTF-8 test cases to better exercise chunked processing behavior.
  • Add tests for consecutive newlines, newline-at-start, and UTF-8-at-start offsets.
  • Add boundary-focused tests around common SIMD-width byte boundaries (16/32/64) to validate correctness when multi-byte characters straddle those boundaries.

@bashandbone bashandbone merged commit 841e571 into main Mar 9, 2026
30 of 32 checks passed
@bashandbone bashandbone deleted the improve-simd-tests-12491095417453404157 branch March 9, 2026 17:30