
Document and test use of char::is_whitespace in collapsible_if#16840

Open
iAmChidiebereh wants to merge 8 commits into rust-lang:master from
iAmChidiebereh:test/confirm-is_whitespace-intent


@iAmChidiebereh iAmChidiebereh commented Apr 10, 2026

changelog: none

This PR addresses the whitespace check in collapsible_if.

// ./clippy_lints/src/collapsible_if.rs:145

let requires_space = snippet(cx, up_to_else, "..").ends_with(|c: char| !c.is_whitespace());

My investigation revealed that the current char::is_whitespace check is actually intentional. If we switched to rustc_lexer::is_whitespace, zero-width characters (like \u{200E}) would be treated as valid spacing, which would make Clippy output suggestions that visibly look like elseif. To prevent future regressions, this PR adds a UI test with a zero-width character and an inline comment explaining why we must keep the current check.
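To make the difference concrete, here is a minimal sketch of the requires_space check run with both predicates. The function is_lexer_whitespace is a hypothetical stand-in that hard-codes the Unicode Pattern_White_Space code points; it is assumed to mirror rustc_lexer::is_whitespace and is not the real function. The two requires_space_* helpers are also made-up names for illustration.

```rust
/// Hypothetical stand-in for `rustc_lexer::is_whitespace` (an assumption):
/// hard-codes the Unicode Pattern_White_Space code points.
fn is_lexer_whitespace(c: char) -> bool {
    matches!(
        c,
        '\u{0009}'..='\u{000D}' // tab, LF, VT, FF, CR
            | '\u{0020}' // space
            | '\u{0085}' // next line
            | '\u{200E}' // left-to-right mark (zero width)
            | '\u{200F}' // right-to-left mark (zero width)
            | '\u{2028}' // line separator
            | '\u{2029}' // paragraph separator
    )
}

/// Same shape as the existing check: a space is required when the snippet
/// ends with a character that `char::is_whitespace` does not accept.
fn requires_space_std(snippet: &str) -> bool {
    snippet.ends_with(|c: char| !c.is_whitespace())
}

/// The same check, using the lexer-style predicate instead.
fn requires_space_lexer(snippet: &str) -> bool {
    snippet.ends_with(|c: char| !is_lexer_whitespace(c))
}
```

For a snippet such as "} else\u{200E}", requires_space_std returns true (a visible space gets inserted), while requires_space_lexer returns false, which is what would produce the elseif-looking suggestion.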

The issue is for outreachy applicants and is being tracked here: rustfoundation/interop-initiative#53

@rustbot rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties label Apr 10, 2026
@rustbot
Collaborator

rustbot commented Apr 10, 2026

r? @dswij

rustbot has assigned @dswij.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

Why was this reviewer chosen?

The reviewer was selected based on:

  • Owners of files modified in this PR: 7 candidates
  • 7 candidates expanded to 7 candidates
  • Random selection from Jarcho, dswij, llogiq, samueltardieu

@iAmChidiebereh
Author

When I tried out rustc_lexer::is_whitespace:

stephen@debian:~/outreachy/rust-clippy$ TESTNAME=collapsible_else_if.rs cargo uitest
Screenshot From 2026-04-10 09-06-28
stephen@debian:~/outreachy/rust-clippy$ TESTNAME=collapsible_else_if.rs cargo uibless

Snippet from generated collapsible_else_if.stderr file:

Screenshot From 2026-04-10 09-08-05

We can see an undesirable result: the collapse suggestion uses elseif instead of else if.

Contributor

@teor2345 teor2345 left a comment

Can you please make the PR description shorter?
Just 3-4 dot points saying what the tests and code do is great.
If you need help summarising it, please let me know.

Edit: you can just link to the Outreachy ticket, you don't need to explain that ticket in detail.

I also want to check you're looking at the right is_whitespace method?
https://doc.rust-lang.org/std/primitive.char.html#method.is_whitespace

The snippet function returns a str, so we are working with Unicode chars here:
https://github.com/rust-lang/rust/blob/ad4b9354009cb6bd5a9ff1b5f5a63a13ec98ebc9/src/tools/clippy/clippy_utils/src/source.rs#L535


Comment thread tests/ui/collapsible_else_if.fixed Outdated
#[rustfmt::skip]
fn spacing_zero_width_ws() {
// This test shows that char::is_whitespace gives a more desirable result than rustc_lexer::is_whitespace for the requires-space logic.
// We test 2 zero-width characters that the lexer recognizes as whitespace but char::is_whitespace does not.
Contributor

We can see an undesirable result: the collapse suggestion uses elseif instead of else if.

It looks like you're testing whitespace characters that are included in Rust language whitespace (Unicode Pattern_White_Space or rustc_lexer::is_whitespace). This won't find whitespace bugs, but it is still a good test to have.

Please add another test with a whitespace character that is Unicode White_Space, but not Pattern_White_Space / rustc_lexer::is_whitespace. Here are those lists:

For example, some characters to test are:

  • 00A0 ; White_Space # Zs NO-BREAK SPACE
  • 202F ; White_Space # Zs NARROW NO-BREAK SPACE
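As a rough sketch of how these sets relate, the snippet below classifies a character by both properties. It assumes that char::is_whitespace follows Unicode White_Space (as the std documentation states) and uses a hand-written is_pattern_white_space helper as a stand-in for the lexer's set; the helper, the enum, and classify are illustrative names, not Clippy or rustc APIs.

```rust
/// Which whitespace properties a character has (illustrative type).
#[derive(Debug, PartialEq)]
enum WsClass {
    Both,
    WhiteSpaceOnly,
    PatternOnly,
    Neither,
}

/// Assumed Pattern_White_Space set, hard-coded for illustration.
fn is_pattern_white_space(c: char) -> bool {
    matches!(
        c,
        '\u{0009}'..='\u{000D}'
            | '\u{0020}'
            | '\u{0085}'
            | '\u{200E}'
            | '\u{200F}'
            | '\u{2028}'
            | '\u{2029}'
    )
}

/// Classify a character by White_Space (via std) and Pattern_White_Space.
fn classify(c: char) -> WsClass {
    match (c.is_whitespace(), is_pattern_white_space(c)) {
        (true, true) => WsClass::Both,
        (true, false) => WsClass::WhiteSpaceOnly,
        (false, true) => WsClass::PatternOnly,
        (false, false) => WsClass::Neither,
    }
}
```

Under these assumptions, classify('\u{00A0}') and classify('\u{202F}') both give WhiteSpaceOnly, while the zero-width marks \u{200E} and \u{200F} give PatternOnly.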

Author

@iAmChidiebereh iAmChidiebereh Apr 10, 2026

It looks like you're testing whitespace characters that are included in Rust language whitespace (Unicode Pattern_White_Space or rustc_lexer::is_whitespace). This won't find whitespace bugs, but it is still a good test to have

I found out that the lexer panics on non-Pattern_White_Space characters, meaning that adding a test sample involving if else blocks that will generate snippets ending with whitespace characters in White_Space, but not in Pattern_White_Space, is not possible. We can only test within the Pattern_White_Space set.

Given this, rustc_lexer::is_whitespace seemed the logical choice. However, it counts the zero-width characters \u{200E} and \u{200F} as whitespace (and rightfully so), which causes Clippy to output a suggestion that visibly looks like elseif. While syntactically valid to the lexer, this invisible spacing is confusing to users. On the other hand, char::is_whitespace doesn't classify those two as whitespace, forcing the insertion of a visible space (else if) for readability.

Therefore, this test only acts as a safeguard to prevent future developers from mistakenly "fixing" the check by switching to the lexer's method, which would degrade the visual output.

I could still use rustc_lexer::is_whitespace, but I would also have to add a check that treats \u{200E} and \u{200F} as non-whitespace characters for this particular problem. That would give us the same desirable behaviour. This is actually more intuitive, as anyone reading the code will see that we are just treating them differently, even though they are still whitespace.
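A minimal sketch of that alternative, under the assumption that the lexer's whitespace set is the Pattern_White_Space list hard-coded below. The names is_lexer_whitespace, is_spacing_whitespace, and requires_space are hypothetical; a real change would call rustc_lexer::is_whitespace rather than a local copy.

```rust
/// Hypothetical local copy of the lexer's whitespace set (an assumption).
fn is_lexer_whitespace(c: char) -> bool {
    matches!(
        c,
        '\u{0009}'..='\u{000D}'
            | '\u{0020}'
            | '\u{0085}'
            | '\u{200E}'
            | '\u{200F}'
            | '\u{2028}'
            | '\u{2029}'
    )
}

/// Lexer whitespace, minus the two zero-width marks that would make the
/// suggestion render as `elseif`.
fn is_spacing_whitespace(c: char) -> bool {
    is_lexer_whitespace(c) && !matches!(c, '\u{200E}' | '\u{200F}')
}

/// A space is required unless the snippet already ends in spacing whitespace.
fn requires_space(snippet: &str) -> bool {
    snippet.ends_with(|c: char| !is_spacing_whitespace(c))
}
```

With this predicate, a snippet ending in \u{200F} still gets a visible space, and the special-casing of the two marks is explicit in the code. One difference from char::is_whitespace is that characters such as \u{00A0} are no longer treated as whitespace here.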

Please add another test with a whitespace character that is Unicode White_Space, but not Pattern_White_Space / rustc_lexer::is_whitespace. Here are those lists:

image

The image here was the 'unknown start of token' error I got when I tried testing for a snippet that ends with 202F. It is the same for the others.

Though it seems like I am missing some detail. Also, my assumption that we wouldn't want to print a suggestion that looks like 'elseif', even when there is actually whitespace in between, might very well be wrong.
I think you are trying to draw my attention to something that I am not yet seeing. I would appreciate a bit more of a pointer. Thank you.

Contributor

@teor2345 teor2345 Apr 13, 2026

I found out that the lexer panics on non-Pattern_White_Space characters,

If adding a non-Pattern_White_Space character to a Rust source file causes a panic, then this is exactly the kind of bug in clippy (or rustc) that we're trying to find.

meaning that adding a test sample involving if else blocks that will generate snippets ending with whitespace characters in White_Space, but not in Pattern_White_Space is not possible.

You can add a test and mark it with should_panic in one commit, then fix the bug in clippy/rustc, and remove should_panic from the test:
https://doc.rust-lang.org/reference/attributes/testing.html#the-should_panic-attribute
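For a plain unit test (not a Clippy UI test), that workflow might look roughly like the sketch below. The function check_lexer_whitespace is a made-up example that panics on purpose; it is not part of Clippy or rustc, and the hard-coded set is an assumed copy of Pattern_White_Space.

```rust
/// Hypothetical helper: panics when `c` is Unicode White_Space (per
/// `char::is_whitespace`) but outside the assumed lexer whitespace set.
fn check_lexer_whitespace(c: char) {
    let in_lexer_set = matches!(
        c,
        '\u{0009}'..='\u{000D}'
            | '\u{0020}'
            | '\u{0085}'
            | '\u{200E}'
            | '\u{200F}'
            | '\u{2028}'
            | '\u{2029}'
    );
    if c.is_whitespace() && !in_lexer_set {
        panic!("whitespace not accepted by the lexer: U+{:04X}", c as u32);
    }
}

// First commit: the test documents the panic. Once the underlying bug is
// fixed, the `should_panic` attribute is removed.
#[test]
#[should_panic(expected = "not accepted by the lexer")]
fn narrow_no_break_space_panics() {
    check_lexer_whitespace('\u{202F}');
}
```

The expected string only needs to be a substring of the panic message, so the test keeps passing if the rest of the message changes.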

If it is a UI test, you can use //@ should-ice instead:
https://rustc-dev-guide.rust-lang.org/tests/directives.html#controlling-outcome-expectations

If using those directives isn't possible, it's ok to just make one final commit with the test.

Author

Okay. Thank you.
It is a UI test, so I tried out the should-ice directive and got "should-ice is not a command known to ui_test".
While researching what could be done, I also found that should-ice is not a supported directive in ui_test: https://docs.rs/ui_test/0.30.4/ui_test/#supported-comment-annotations

If using those directives isn't possible, it's ok to just make one final commit with the test.

Alright...I will just push the panicking test as it is. Thank you!

@iAmChidiebereh
Author

iAmChidiebereh commented Apr 10, 2026

Can you please make the PR description shorter? Just 3-4 dot points saying what the tests and code do is great. If you need help summarising it, please let me know.

Edit: you can just link to the Outreachy ticket, you don't need to explain that ticket in detail.

I have shortened it. Please help confirm it is okay.

I also want to check you're looking at the right is_whitespace method? https://doc.rust-lang.org/std/primitive.char.html#method.is_whitespace

Yes, this was what I was looking at

The snippet function returns a str, so we are working with Unicode chars here: https://github.com/rust-lang/rust/blob/ad4b9354009cb6bd5a9ff1b5f5a63a13ec98ebc9/src/tools/clippy/clippy_utils/src/source.rs#L535

Yes, I get this part... but my investigation showed that the lexer fails to parse the file before the lint even has a chance to run... so we don't even get to the part where we can extract the snippet to test on (right?).

@iAmChidiebereh iAmChidiebereh requested a review from teor2345 April 10, 2026 23:38
@iAmChidiebereh iAmChidiebereh changed the title Add test to confirm intentional use of Char::is_whitespace Document and test intentional use of char::is_whitespace in collapsible_if Apr 12, 2026
@iAmChidiebereh iAmChidiebereh changed the title Document and test intentional use of char::is_whitespace in collapsible_if Document and test use of char::is_whitespace in collapsible_if Apr 13, 2026
@iAmChidiebereh
Author

iAmChidiebereh commented Apr 14, 2026

Hi @teor2345
I have added the test and it currently panics.
Rustfmt wouldn't let my commit go through (I added a hook that auto-runs rustfmt on commit), and even #![rustfmt::skip] at the top of my file didn't work. So I had to write this test in a new file and then instruct rustfmt to ignore the file by adding it to the ignore array in rustfmt.toml. That worked.

Comment thread tests/ui/collapsible_else_if_snippet_ends_with_npws.rs Outdated
@iAmChidiebereh
Author

iAmChidiebereh commented Apr 14, 2026

Hi @teor2345
Even though the UI test runs now, one of the tests triggered in CI is failing because our generated .stderr file does not meet the established convention for lint messages. Please could you look into this: https://github.com/rust-lang/rust-clippy/actions/runs/24382289363/job/71208280778?pr=16840
I think this is because a .stderr file now contains a non-lint message. The clippy team probably ensured all lint messages are lowercase. Our error (not lint) message comes from the lexer and, as such, does not conform to the convention: it contains an uppercase letter.

@iAmChidiebereh
Author

iAmChidiebereh commented Apr 14, 2026

Also, I have been thinking... We are currently not testing any whitespace specific behaviour. We are just suppressing panic.
And I am trying to wrap my head around how this will play out because I currently don't see a way we can fix the main code (probably by switching to rustc_lexer::is_whitespace), take lines like //~ ERROR: unknown start of token: \u{202f} off and bring back //~ collapsible_else_if to finally have things running without suppressing the error. I think the error is a constant. Can you help clear my confusion?
I will be looking forward to your response. Thanks!

Contributor

@teor2345 teor2345 left a comment

one of the tests … is failing due to our generated .stderr file not meeting established convention for lint messages: rust-lang/rust-clippy/actions/runs/24382289363/job/71208280778?pr=16840
this is because a .stderr file now contains [lexer] message. The clippy team probably ensured all lint messages are lowercase

This is a question for a clippy maintainer. They might decide to exempt the test, move it to another test category, or silence the whitespace errors.

But the test is important, because it shows that clippy doesn't crash, or break valid code.

(I've tried to quote just the essential parts of your question, to make it easier to read.)

We are currently not testing any whitespace specific behaviour. We are just suppressing panic. And I am trying to wrap my head around how this will play out because I currently don't see a way we can fix the main code (probably by switching to rustc_lexer::is_whitespace), take lines like //~ ERROR: unknown start of token: \u{202f} off and bring back //~ collapsible_else_if to finally have things running without suppressing the error. I think the error is a constant.

I don't see any clippy panics here. I see errors, but those errors are a correct response to invalid Rust source code.

Let's take a step back, and think about the purpose of the lint, this test, and any code changes:

  • the lint finds else { if that can be collapsed to else if, and suggests a fix if possible
  • for whitespace accepted by the Rust lexer, if the suggestion/fix changes that whitespace to other whitespace, the meaning of the program doesn't change (rustfmt also makes similar changes)
  • for whitespace not accepted by the Rust lexer, the Rust code has an error that will show up in the compiler. So it's ok for clippy to show that error, fix or not fix it, or ignore it. Clippy just can't panic or crash.
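To illustrate the first bullet, here is a small sketch of the kind of code the lint targets and the collapsed form it suggests. Both functions and their names are invented for this example; the point is only that the two shapes behave identically.

```rust
/// The shape the lint looks for: an `if` nested alone inside an `else` block.
fn describe_nested(a: bool, b: bool) -> &'static str {
    if a {
        "first"
    } else {
        if b {
            "second"
        } else {
            "neither"
        }
    }
}

/// The collapsed form: same behaviour, one less level of nesting.
fn describe_collapsed(a: bool, b: bool) -> &'static str {
    if a {
        "first"
    } else if b {
        "second"
    } else {
        "neither"
    }
}
```

Because any lexer-accepted whitespace between else and if separates the two tokens equally well, replacing it with a plain space in the suggestion leaves the program's meaning unchanged.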

One test shows that clippy fixes code accepted by the lexer, and doesn't break the code when there's unusual white space. Another test shows that clippy correctly prints lexer errors when the code isn't valid Rust code.

So the only improvement we could make here is fixing non-lexer whitespace in the suggestion. But that is optional, and might not be possible, because the lint comes after the lexer.


Comment thread tests/ui/collapsible_else_if_snippet_ends_with_npws.rs Outdated
Comment thread tests/ui/collapsible_else_if.rs Outdated
@iAmChidiebereh
Author

iAmChidiebereh commented Apr 15, 2026

This is a question for a clippy maintainer. They might decide to exempt the test, move it to another test category, or silence the whitespace errors.

But the test is important, because it shows that clippy doesn't crash, or break valid code.

Oh... Okay.. I will just wait for a maintainer to come around when they are free so that they can address this. Is it okay to tag someone? I'm cautious of appearing like I am pressing them for response despite their schedule... but I honestly wish I could tag them though.

I don't see any clippy panics here. I see errors, but those errors are a correct response to invalid Rust source code.

Oh... I keep mixing things up... Pardon my use of the word 'panic'... They are errors as you mentioned. I meant 'suppressing test failure due to errors'.

Let's take a step back, and think about the purpose of the lint, this test, and any code changes:

  • the lint finds else { if that can be collapsed to else if, and suggests a fix if possible
  • for whitespace accepted by the Rust lexer, if the suggestion/fix changes that whitespace to other whitespace, the meaning of the program doesn't change (rustfmt also makes similar changes)
  • for whitespace not accepted by the Rust lexer, the Rust code has an error that will show up in the compiler. So it's ok for clippy to show that error, fix or not fix it, or ignore it. Clippy just can't panic or crash.

One test shows that clippy fixes code accepted by the lexer, and doesn't break the code when there's unusual white space. Another test shows that clippy correctly prints lexer errors when the code isn't valid Rust code.

So the only improvement we could make here is fixing non-lexer whitespace in the suggestion. But that is optional, and might not be possible, because the lint comes after the lexer.

Thank you so much! I fully get it now. Thanks for the explanation...

But that is optional, and might not be possible, because the lint comes after the lexer.

This honestly cleared things up, because I was confused about how we were going to fix this when the lint actually comes after the lexer. I have been researching how I could go about fixing this, but, like you said, it might not be possible. I think I found an issue a few days ago while investigating this problem. I will tag you on it.
So yeah... we keep the tests, as they document the behaviour for snippets ending in the 2 categories of whitespace (recognized by the lexer and not recognized by the lexer).
Thank you once again!

@dswij
Member

dswij commented Apr 15, 2026

The test collapsible_else_if_snippet_ends_with_npws is not necessary. Clippy will not run if the input is not valid Rust. Let's not include those tests in this PR.

@dswij
Member

dswij commented Apr 15, 2026

My investigation revealed that the current char::is_whitespace check is actually intentional.

Can you point out where it is agreed/explicitly mentioned that it is intentional?

@iAmChidiebereh
Author

The test collapsible_else_if_snippet_ends_with_npws is not necessary. Clippy will not run if the input is not valid Rust. Let's not include those tests in this PR.

Alright thank you!
I will take it out now.

@iAmChidiebereh
Author

Can you point out where it is agreed/explicitly mentioned that it is intentional?

Looking back now, I think my choice of words didn't really convey what I meant. I am sorry about that. There was no place where it was explicitly mentioned that it was intentional. What I meant to say was "My investigation revealed that the current char::is_whitespace check 'might' actually be intentional".

The statement came from my investigation after testing all lexer-recognized whitespace characters. With char::is_whitespace as the check, every whitespace character recognized by the lexer is also classified as whitespace, except for 2: \u{200E} and \u{200F}. Coincidentally, these 2 characters turn out to be zero-width. Using char::is_whitespace thereby ensures a space is added between else and if when they are separated by either of those 2 characters, so we get the desirable "else if" suggestion. If we had used the rustc_lexer::is_whitespace function, those 2 would be classified as whitespace (and rightfully so), so no space would be added in between, and we would see suggestions like "elseif" (no visible space on print, even though there is an invisible whitespace character). I don't think that is desirable. So it felt deliberate and intentional on the part of whoever implemented this. That's just an assumption on my end.
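The "all but 2" observation can be checked with a short sketch like the one below. LEXER_WHITESPACE is a hand-written list assumed to match what rustc_lexer::is_whitespace accepts (the Unicode Pattern_White_Space set); the constant and function names are illustrative only.

```rust
/// Assumed contents of the lexer's whitespace set (Pattern_White_Space).
const LEXER_WHITESPACE: [char; 11] = [
    '\u{0009}', // horizontal tab
    '\u{000A}', // line feed
    '\u{000B}', // vertical tab
    '\u{000C}', // form feed
    '\u{000D}', // carriage return
    '\u{0020}', // space
    '\u{0085}', // next line
    '\u{200E}', // left-to-right mark
    '\u{200F}', // right-to-left mark
    '\u{2028}', // line separator
    '\u{2029}', // paragraph separator
];

/// Lexer whitespace characters that `char::is_whitespace` rejects.
fn lexer_only_whitespace() -> Vec<char> {
    LEXER_WHITESPACE
        .iter()
        .copied()
        .filter(|c| !c.is_whitespace())
        .collect()
}
```

Under that assumption, lexer_only_whitespace() yields exactly \u{200E} and \u{200F}, the two zero-width marks discussed above.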

@teor2345
Contributor

I will just wait for a maintainer to come around when they are free so that they can address this. Is it okay to tag someone? I'm cautious of appearing like I am pressing them for response despite their schedule... but I honestly wish I could tag them though.

If you are feeling time pressure because final Outreachy applications are due in a few hours' time, you don't need to worry. You can put links to incomplete PRs in your final application.

What happens to your PR is up to the maintainers of the tool you're modifying or testing. Sometimes PRs (or parts of PRs) don't get accepted for good reasons, and that's ok. Sometimes there are delays because reviewers are busy, and that's also ok.

@iAmChidiebereh
Author

If you are feeling time pressure because final Outreachy applications are due in a few hours' time, you don't need to worry. You can put links to incomplete PRs in your final application.

What happens to your PR is up to the maintainers of the tool you're modifying or testing. Sometimes PRs (or parts of PRs) don't get accepted for good reasons, and that's ok. Sometimes there are delays because reviewers are busy, and that's also ok.

Thank you so much!


Labels

S-waiting-on-review Status: Awaiting review from the assignee but also interested parties

4 participants