Support non-ASCII Unicode in grammar rule names #2196

Merged
ehuss merged 1 commit into master from TC/support-nonascii-in-grammar-rule-names on Mar 4, 2026
@traviscross
Contributor

The grammar currently supports only ASCII rule names. We want to support non-ASCII Unicode symbols such as `⊥` (bottom) since we plan to add that rule.

In this commit, we add `is_name_start` and `is_name_continue` predicates that centralize the decision of what can appear in a rule name. `is_name_start` accepts alphabetic characters, underscores, and non-ASCII characters; `is_name_continue` accepts alphanumeric characters, underscores, and non-ASCII characters.
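For readers who want to see the two predicates concretely, here is a minimal sketch of what they might look like. It is not the code from the commit: the `char`-based signatures are an assumption (the real parser may work on bytes), and `is_valid_name` is a hypothetical helper added only to show how the predicates compose.

```rust
/// Sketch: may `c` begin a rule name?
/// Assumption: "alphabetic" is checked on ASCII; every non-ASCII
/// character is accepted through the `!c.is_ascii()` arm.
fn is_name_start(c: char) -> bool {
    c.is_ascii_alphabetic() || c == '_' || !c.is_ascii()
}

/// Sketch: may `c` appear after the first character of a rule name?
/// Same as `is_name_start`, but ASCII digits are also allowed.
fn is_name_continue(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '_' || !c.is_ascii()
}

/// Hypothetical helper (not in the PR): a name is valid when its first
/// character satisfies `is_name_start` and the rest satisfy
/// `is_name_continue`. The empty string is not a name.
fn is_valid_name(s: &str) -> bool {
    let mut chars = s.chars();
    match chars.next() {
        Some(first) => is_name_start(first) && chars.all(is_name_continue),
        None => false,
    }
}
```

Under these assumptions, `is_valid_name("⊥")` and `is_valid_name("_Rule1")` hold, while `is_valid_name("1Rule")` does not, because a digit may continue a name but not start one.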

We use `is_name_start` in the `parse_expr1` condition that routes to `parse_nonterminal`. The previous condition (`is_alphanumeric`) was slightly misaligned with what `parse_name` actually accepts -- it included digits (which `parse_name` rejects) and excluded underscores (which `parse_name` accepts). Using `is_name_start` makes the dispatch condition match `parse_name` exactly.
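The mismatch described above can be illustrated with a small sketch that compares the two dispatch conditions character by character. This is an illustration only: `old_dispatch` assumes the previous check was equivalent to `char::is_alphanumeric`, `is_name_start` is a `char`-based approximation of the new predicate, and `disagreement` is a hypothetical helper that does not exist in the PR.

```rust
/// Assumed shape of the previous dispatch condition.
fn old_dispatch(c: char) -> bool {
    c.is_alphanumeric()
}

/// Sketch of the new dispatch condition (approximation of the PR's predicate).
fn is_name_start(c: char) -> bool {
    c.is_ascii_alphabetic() || c == '_' || !c.is_ascii()
}

/// Hypothetical helper: describes how the two conditions disagree on `c`,
/// or returns `None` when they agree.
fn disagreement(c: char) -> Option<&'static str> {
    match (old_dispatch(c), is_name_start(c)) {
        // The old check would route here, but a name cannot start with `c`.
        (true, false) => Some("old check accepts, but a name cannot start here"),
        // The old check would skip a character that can start a name.
        (false, true) => Some("old check rejects a valid name start"),
        _ => None,
    }
}
```

With these definitions, `disagreement('7')` reports the first case (a digit), `disagreement('_')` reports the second (an underscore), and ordinary letters such as `'a'` produce `None`.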

The `NAMES_RE` regex in `mdbook-spec` encodes the same name-matching logic as a regex pattern, so let's add a comment tying it to the predicates.

cc @ehuss

@rustbot added the S-waiting-on-review label (Status: The marked PR is awaiting review from a maintainer) on Mar 4, 2026
Contributor

@ehuss left a comment

@ehuss added this pull request to the merge queue on Mar 4, 2026
Merged via the queue into master with commit caa4205 on Mar 4, 2026
6 checks passed
@rustbot removed the S-waiting-on-review label (Status: The marked PR is awaiting review from a maintainer) on Mar 4, 2026
3 participants