Support non-ASCII Unicode in grammar rule names by traviscross · Pull Request #2196 · rust-lang/reference

traviscross · 2026-03-04T00:25:55Z

The grammar currently supports only ASCII rule names. We want to support non-ASCII Unicode symbols such as `⊥` (bottom) since we plan to add that rule.

In this commit, we add `is_name_start` and `is_name_continue` predicates that centralize the decision of what can appear in a rule name. `is_name_start` accepts alphabetic characters, underscores, and non-ASCII characters; `is_name_continue` accepts alphanumeric characters, underscores, and non-ASCII characters.
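As a rough illustration of the description above, the two predicates might look something like the following. This is a sketch, not the code from `tools/grammar/src/parser.rs`: the function bodies are assumed from the prose, and `is_valid_name` is a hypothetical helper added only to show the predicates working together. Because every non-ASCII character is accepted, it makes no difference here whether "alphabetic" is read as ASCII-only or Unicode-aware.

```rust
/// Sketch (assumed body): can `c` begin a rule name?
/// Accepts ASCII letters, underscores, and any non-ASCII character.
fn is_name_start(c: char) -> bool {
    c.is_ascii_alphabetic() || c == '_' || !c.is_ascii()
}

/// Sketch (assumed body): can `c` appear after the first character?
/// Same as `is_name_start`, plus ASCII digits.
fn is_name_continue(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '_' || !c.is_ascii()
}

/// Hypothetical helper, not part of the PR: checks a whole name by
/// applying `is_name_start` to the first character and
/// `is_name_continue` to the rest. Empty strings are rejected.
fn is_valid_name(s: &str) -> bool {
    let mut chars = s.chars();
    match chars.next() {
        Some(first) => is_name_start(first) && chars.all(is_name_continue),
        None => false,
    }
}
```

Under these assumed definitions, `is_valid_name("⊥")` and `is_valid_name("_Foo1")` hold, while `is_valid_name("1Foo")` does not.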

We use `is_name_start` in the `parse_expr1` condition that routes to `parse_nonterminal`. The previous condition (`is_alphanumeric`) was slightly misaligned with what `parse_name` actually accepts -- it included digits (which `parse_name` rejects) and excluded underscores (which `parse_name` accepts). Using `is_name_start` makes the dispatch condition match `parse_name` exactly.
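The mismatch is easiest to see by evaluating both conditions on a digit and on an underscore. The sketch below makes several assumptions: `old_dispatch` and `new_dispatch` are hypothetical names standing in for the condition inside `parse_expr1`, the old check is taken to be the standard library's `char::is_alphanumeric`, and `is_name_start` uses a body assumed from the description rather than copied from the PR.

```rust
/// Assumed body, following the PR description.
fn is_name_start(c: char) -> bool {
    c.is_ascii_alphabetic() || c == '_' || !c.is_ascii()
}

/// Hypothetical stand-in for the previous dispatch condition.
/// Accepts digits and rejects underscores.
fn old_dispatch(c: char) -> bool {
    c.is_alphanumeric()
}

/// Hypothetical stand-in for the new dispatch condition.
/// Rejects digits and accepts underscores.
fn new_dispatch(c: char) -> bool {
    is_name_start(c)
}
```

With these stand-ins, `old_dispatch('1')` is true and `old_dispatch('_')` is false, whereas `new_dispatch('1')` is false and `new_dispatch('_')` is true. The symbol `⊥` is also rejected by `old_dispatch`, since it is neither alphabetic nor numeric, and accepted by `new_dispatch`.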

The `NAMES_RE` regex in `mdbook-spec` encodes the same name-matching logic as a regex pattern, so let's add a comment tying it to the predicates.

cc @ehuss

tools/grammar/src/parser.rs

ehuss

Thanks!

rustbot added the S-waiting-on-review Status: The marked PR is awaiting review from a maintainer label Mar 4, 2026

ehuss reviewed Mar 4, 2026

tools/grammar/src/parser.rs (resolved)

ehuss approved these changes Mar 4, 2026

ehuss added this pull request to the merge queue Mar 4, 2026

Merged via the queue into master with commit caa4205 Mar 4, 2026
6 checks passed

rustbot removed the S-waiting-on-review Status: The marked PR is awaiting review from a maintainer label Mar 4, 2026

ehuss merged 1 commit into master from TC/support-nonascii-in-grammar-rule-names
