feat: add StringTokenScannerSymbols for configurable multi-character delimiters (fixes #195) by jmoraleda · Pull Request #1303 · HubSpot/jinjava

jmoraleda · 2026-04-01T00:12:22Z


Title: feat: add StringTokenScannerSymbols for configurable multi-character delimiters (fixes #195)

Description:

Closes #195.

Python's Jinja2 lets users customize all six delimiter strings through its Environment constructor (block_start_string, block_end_string, variable_start_string, variable_end_string, comment_start_string, comment_end_string), and also supports line_statement_prefix and line_comment_prefix. Jinjava had no equivalent: its TokenScannerSymbols only supports char-based delimiters, so multi-character delimiter strings could not be configured. This made it hard to use Jinja-style templating wherever {{, {%, or {# appear as literal content (e.g. LaTeX documents, some JSON schemas, or Kubernetes YAML with Helm-style markers).

What this PR adds:

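A new StringTokenScannerSymbols class with a builder API that configures each of the six delimiter strings independently, with no constraint on their length and no requirement that they share prefix characters. The builder also accepts optional line-statement and line-comment prefixes: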

JinjavaConfig config = JinjavaConfig.newBuilder()
    .withTokenScannerSymbols(StringTokenScannerSymbols.builder()
        .withVariableStartString("\\VAR{")
        .withVariableEndString("}")
        .withBlockStartString("\\BLOCK{")
        .withBlockEndString("}")
        .withCommentStartString("\\#{")
        .withCommentEndString("}")
        .withLineStatementPrefix("%%")
        .withLineCommentPrefix("%#")
        .build())
    .build();
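To make string-based delimiter matching concrete, here is a minimal, self-contained sketch (Java 16+, for records). The names StringDelimiterScanner, Kind, Token, and scan are hypothetical; this is not the PR's TokenScanner. It ignores quoted strings, whitespace control, raw blocks, and line prefixes, and simply takes the first occurrence of the matching end string, so content containing the end delimiter (e.g. a "}" inside \VAR{...}) is not handled. It does show two ideas the PR relies on: picking the earliest start delimiter (preferring the longest on a tie), and stripping delimiters by their configured lengths rather than a hardcoded 2.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of scanning with arbitrary start/end delimiter strings.
final class StringDelimiterScanner {

    enum Kind { TEXT, EXPRESSION, BLOCK, COMMENT }

    record Token(Kind kind, String content) {}

    private static final Kind[] KINDS = { Kind.EXPRESSION, Kind.BLOCK, Kind.COMMENT };
    private final String[][] delimiters; // {start, end} for each entry in KINDS

    StringDelimiterScanner(String exprStart, String exprEnd,
                           String blockStart, String blockEnd,
                           String commentStart, String commentEnd) {
        this.delimiters = new String[][] {
            { exprStart, exprEnd }, { blockStart, blockEnd }, { commentStart, commentEnd }
        };
    }

    List<Token> scan(String input) {
        List<Token> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            // Find the earliest start delimiter; on a tie prefer the longest one,
            // so e.g. "<<%" wins over "<<" when both match at the same position.
            int bestIdx = -1;
            int bestAt = -1;
            for (int i = 0; i < delimiters.length; i++) {
                int at = input.indexOf(delimiters[i][0], pos);
                if (at < 0) {
                    continue;
                }
                if (bestIdx < 0 || at < bestAt
                        || (at == bestAt && delimiters[i][0].length() > delimiters[bestIdx][0].length())) {
                    bestIdx = i;
                    bestAt = at;
                }
            }
            if (bestIdx < 0) { // no more tags: the rest is plain text
                tokens.add(new Token(Kind.TEXT, input.substring(pos)));
                break;
            }
            if (bestAt > pos) {
                tokens.add(new Token(Kind.TEXT, input.substring(pos, bestAt)));
            }
            String start = delimiters[bestIdx][0];
            String end = delimiters[bestIdx][1];
            int contentStart = bestAt + start.length();
            int endAt = input.indexOf(end, contentStart);
            if (endAt < 0) {
                throw new IllegalArgumentException("Unclosed " + start + " at index " + bestAt);
            }
            // Strip delimiters using their configured lengths, not a literal 2.
            tokens.add(new Token(KINDS[bestIdx], input.substring(contentStart, endAt).strip()));
            pos = endAt + end.length();
        }
        return tokens;
    }
}
```

With the LaTeX delimiters from the example, scanning \section{\VAR{title}} yields a TEXT token "\section{", an EXPRESSION token "title", and a TEXT token "}".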

Changes:

StringTokenScannerSymbols (new): a builder-configured TokenScannerSymbols implementation. It uses Unicode Private Use Area sentinel characters as internal token-kind discriminators, so Token.newToken() dispatches to the right token type without any change to Token.
TokenScanner: adds a string-matching scan path (getNextTokenStringBased()) that is used when symbols.isStringBased() returns true; the original char-based path is unchanged. The new path also supports lineStatementPrefix and lineCommentPrefix with Python Jinja2 semantics, including prefixes preceded by indentation.
TokenScannerSymbols: adds isStringBased() (default false), six delimiter-length accessors (getTagStartLength() etc.), and two optional line-prefix accessors (getLineStatementPrefix(), getLineCommentPrefix()). All default implementations preserve the existing behaviour.
TagToken, ExpressionToken, NoteToken: hardcoded delimiter offsets are replaced with calls to the new length accessors on symbols. This correctness fix applies to every TokenScannerSymbols implementation, not just StringTokenScannerSymbols. Previously, ExpressionToken.parse() called WhitespaceUtils.unwrap(image, "{{", "}}") with literal strings regardless of the configured symbols, so any custom char-based subclass (such as the one in CustomTokenScannerSymbolsTest) silently failed to strip its expression delimiters. The fix passes symbols.getExpressionStart() and symbols.getExpressionEnd() instead.
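The Jinja2 line-prefix semantics referred to for TokenScanner (a line statement may be indented but nothing else may precede its prefix; a line comment runs from its prefix to the end of the line) can be illustrated with a small preprocessing sketch. LinePrefixes, expand, and expandLine are hypothetical names, and this is a different technique from the PR, which handles prefixes inside the scanner: here, prefixed lines are simply rewritten into ordinary block tags before scanning. It ignores quoted strings and multi-line statements, does not reproduce Jinja2's exact whitespace handling, and does not preserve a trailing newline.

```java
import java.util.stream.Collectors;

// Hypothetical, simplified model of Jinja2-style line prefixes.
final class LinePrefixes {

    static String expand(String template, String stmtPrefix, String commentPrefix,
                         String blockStart, String blockEnd) {
        // String.lines() drops the final line terminator, so a trailing newline is lost.
        return template.lines()
                .map(line -> expandLine(line, stmtPrefix, commentPrefix, blockStart, blockEnd))
                .collect(Collectors.joining("\n"));
    }

    static String expandLine(String line, String stmtPrefix, String commentPrefix,
                             String blockStart, String blockEnd) {
        // Line comment: drop everything from the prefix to the end of the line.
        if (commentPrefix != null && !commentPrefix.isEmpty()) {
            int c = line.indexOf(commentPrefix);
            if (c >= 0) {
                line = line.substring(0, c);
            }
        }
        // Line statement: only whitespace may precede the prefix.
        String trimmed = line.stripLeading();
        if (stmtPrefix != null && !stmtPrefix.isEmpty() && trimmed.startsWith(stmtPrefix)) {
            String body = trimmed.substring(stmtPrefix.length()).strip();
            return blockStart + body + blockEnd;
        }
        return line;
    }
}
```

With the prefixes from the builder example ("%%" and "%#"), an indented line such as "  %% for x in xs" becomes \BLOCK{for x in xs}, while "\VAR{x} %# note" keeps only "\VAR{x} ".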

Backward compatibility:

The char-based scan path and all existing TokenScannerSymbols subclasses are unaffected. Each new length accessor on TokenScannerSymbols defaults to the length of its corresponding delimiter string, which for DefaultTokenScannerSymbols is always 2. The existing test suite passes without modification.
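The backward-compatibility argument rests on a familiar Java pattern: new accessors get default implementations derived from the existing delimiters, so old subclasses inherit the old behaviour and only the new class overrides them. The following is a cut-down, hypothetical model of that pattern, not the real TokenScannerSymbols (which has more members). Apart from isStringBased() and getTagStartLength(), every name in it (SymbolsSketch, getTagStart, tagContent, and so on) is illustrative. It also shows Private Use Area characters (U+E000 onward) acting purely as kind markers in the string-based variant.

```java
// Hypothetical model of extending a symbols class with backward-compatible defaults.
abstract class SymbolsSketch {
    // Existing-style API: delimiters derived from single characters.
    abstract char getPrefixChar();
    abstract char getTagChar();
    abstract char getPostfixChar();

    String getTagStart() { return "" + getPrefixChar() + getTagChar(); } // e.g. "{%"
    String getTagEnd() { return "" + getTagChar() + getPostfixChar(); }  // e.g. "%}"

    // New API: defaults reproduce the old behaviour for existing subclasses.
    boolean isStringBased() { return false; }
    int getTagStartLength() { return getTagStart().length(); }
    int getTagEndLength() { return getTagEnd().length(); }

    // Token code strips delimiters via the accessors instead of a literal 2.
    String tagContent(String image) {
        return image.substring(getTagStartLength(), image.length() - getTagEndLength()).strip();
    }
}

final class DefaultSymbolsSketch extends SymbolsSketch {
    char getPrefixChar() { return '{'; }
    char getTagChar() { return '%'; }
    char getPostfixChar() { return '}'; }
}

final class StringSymbolsSketch extends SymbolsSketch {
    private final String tagStart;
    private final String tagEnd;

    StringSymbolsSketch(String tagStart, String tagEnd) {
        this.tagStart = tagStart;
        this.tagEnd = tagEnd;
    }

    // Private Use Area sentinels: never part of a real delimiter, only kind markers.
    // The prefix/postfix sentinels are unused because the start/end strings are overridden.
    char getPrefixChar() { return '\uE000'; }
    char getTagChar() { return '\uE001'; }
    char getPostfixChar() { return '\uE002'; }

    @Override String getTagStart() { return tagStart; }
    @Override String getTagEnd() { return tagEnd; }
    @Override boolean isStringBased() { return true; }
}
```

For the default symbols both lengths come out as 2, matching the old hardcoded offsets; for a "\BLOCK{" / "}" configuration they are 7 and 1.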

jmoraleda added 2 commits March 31, 2026 16:50

Support arbitrary multi-character delimiter strings

677aa06

Support single line logic for blocks and comments using a prefix

6127243

jmoraleda force-pushed the master branch from d6bb52b to 6127243 on April 1, 2026 01:57

jmoraleda mentioned this pull request on Apr 1, 2026 in #1304 (open): Scanner treats backslash as escape character outside quoted strings, diverging from Jinja2 behaviour

jmoraleda wants to merge 2 commits into HubSpot:master from jmoraleda:master
