Granular Control over Number Splitting/Merging #23

@arran4

Description

Feature Enhancement Request for strings2:

Title: Granular Control over Number Splitting/Merging

Problem:
Currently, WithNumberSplitting(true) treats any transition to/from a digit as a hard word boundary. This forces outputs like 123test -> 123-test. Standard programming identifiers often treat numbers as suffixes (file1) or prefixes (2fa) without separation.

Proposed Solution:
Introduce a NumberMode configuration:

  1. SplitAlways (Current behavior): word123 -> word-123
  2. MergeRecursive: Treat digits as compatible with both preceding and succeeding lowercase letters.
    • 123test -> 123test
    • test123test -> test123test
  3. TreatAsLowercase: Simply treat digits as if they were lowercase letters for boundary detection.
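The three modes can be compared with a small standalone sketch. This is not the strings2 implementation: `NumberMode`, `isBoundary`, and `toKebab` are hypothetical names, and the reading of MergeRecursive (digits join lowercase neighbours only) versus TreatAsLowercase (digits follow exactly the rules of a lowercase letter) is an assumption drawn from the descriptions above. Input delimiters such as spaces or underscores are not handled.

```go
package main

import (
	"strings"
	"unicode"
)

// NumberMode is a hypothetical enum mirroring the three proposed modes.
type NumberMode int

const (
	SplitAlways      NumberMode = iota // any letter/digit transition is a boundary
	MergeRecursive                     // digits join lowercase letters on either side
	TreatAsLowercase                   // digits follow the rules of lowercase letters
)

// isBoundary reports whether a word boundary falls between prev and cur.
func isBoundary(prev, cur rune, mode NumberMode) bool {
	pd, cd := unicode.IsDigit(prev), unicode.IsDigit(cur)
	switch mode {
	case SplitAlways:
		if pd != cd {
			return true
		}
	case MergeRecursive:
		if pd && !cd {
			return !unicode.IsLower(cur) // "3t" merges, "3I" splits
		}
		if !pd && cd {
			return !unicode.IsLower(prev) // "r1" merges, "D3" splits
		}
	}
	// Ordinary case rule: a lowercase rune followed by an uppercase one.
	// Outside SplitAlways, a digit counts as lowercase here.
	prevLower := unicode.IsLower(prev) || (pd && mode != SplitAlways)
	return prevLower && unicode.IsUpper(cur)
}

// toKebab lowercases s and inserts '-' at every detected boundary.
func toKebab(s string, mode NumberMode) string {
	var b strings.Builder
	var prev rune
	for i, r := range s {
		if i > 0 && isBoundary(prev, r, mode) {
			b.WriteByte('-')
		}
		b.WriteRune(unicode.ToLower(r))
		prev = r
	}
	return b.String()
}
```

Under these assumptions, `toKebab("User123ID", SplitAlways)` gives `user-123-id` and `toKebab("User123ID", MergeRecursive)` gives `user123-id`. The two merging modes differ only after an uppercase letter: `ID3` becomes `id-3` with MergeRecursive and `id3` with TreatAsLowercase.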

Example Prompt:
"Please add a NumberMode option to strings2. We need a mode that mimics typical variable naming where digits do not force a new word. For example, ToKebab('User123ID') should optionally result in user123-id (if '123' merges with 'User') or user-123-id (current). Specifically, we need to support 123test -> 123test (no hyphen) to match legacy behavior."

arran4/go-subcommand#166 (comment)

To prevent the split in StartWithDigit123 -> start-with-digit-123, strings2 needs to allow numbers to be attached to the preceding word.

Feature Request: Configurable Number Handling.

Details: We need a mode where digits are treated as a continuation of the previous character class (specifically lowercase letters) rather than a new word boundary.

Example: strings2.ToKebab("StartWithDigit123", strings2.WithNumberHandling(strings2.MergeWithPreceding)) should yield start-with-digit123.
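One way the requested option could be shaped is the functional-options pattern sketched below. Everything here is illustrative rather than the strings2 API: `NumberHandling`, `SplitNumbers`, `kebab`, and `splitBefore` are made-up names, and the local `WithNumberHandling` and `MergeWithPreceding` only borrow the names from the example above. The sketch also assumes that a letter following a digit run always starts a new word, which the request does not specify.

```go
package main

import (
	"strings"
	"unicode"
)

// NumberHandling is a hypothetical setting for how digit runs are grouped.
type NumberHandling int

const (
	SplitNumbers       NumberHandling = iota // digit runs form their own word
	MergeWithPreceding                       // digit runs attach to a preceding lowercase word
)

type config struct{ numbers NumberHandling }

// Option mutates the conversion config (functional-options pattern).
type Option func(*config)

// WithNumberHandling selects the digit-grouping behaviour.
func WithNumberHandling(h NumberHandling) Option {
	return func(c *config) { c.numbers = h }
}

// splitBefore reports whether a '-' belongs between prev and cur.
func splitBefore(prev, cur rune, h NumberHandling) bool {
	pd, cd := unicode.IsDigit(prev), unicode.IsDigit(cur)
	switch {
	case pd && cd:
		return false // stay inside a digit run
	case cd:
		// letter -> digit: merge only when requested and prev is lowercase
		return h == SplitNumbers || !unicode.IsLower(prev)
	case pd:
		return true // digit -> letter: assumed to start a new word in both modes
	default:
		return unicode.IsLower(prev) && unicode.IsUpper(cur)
	}
}

// kebab converts a camel/Pascal-case identifier to kebab-case.
func kebab(s string, opts ...Option) string {
	cfg := config{numbers: SplitNumbers}
	for _, o := range opts {
		o(&cfg)
	}
	var b strings.Builder
	var prev rune
	for i, r := range s {
		if i > 0 && splitBefore(prev, r, cfg.numbers) {
			b.WriteByte('-')
		}
		b.WriteRune(unicode.ToLower(r))
		prev = r
	}
	return b.String()
}
```

With no options, `kebab("StartWithDigit123")` returns `start-with-digit-123`; passing `WithNumberHandling(MergeWithPreceding)` returns `start-with-digit123`. Keeping the split as the default would leave existing callers unaffected.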

arran4/go-subcommand#166 (comment)

To replace the manual SanitizeToIdentifier logic completely, strings2 would need a way to strictly filter the input characters.

Feature Request: Add a WithAllowedCharacters(predicate func(rune) bool) option.

Details: Currently, strings2 splits on delimiters but may preserve other symbols. We need an option to explicitly drop unwanted characters (like @, #, emojis) from the output.

Example Prompt: "Add a WithAllowedCharacters option to ToPascal (and others). If provided, any rune where predicate(r) is false should be discarded during tokenization. This allows users to generate strict identifiers by filtering out !unicode.IsLetter(r) && !unicode.IsDigit(r)."
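A rough sketch of the filtering behaviour, independent of strings2: `pascalFiltered` and `lettersAndDigits` are hypothetical helpers, and the choice that spaces, hyphens, and underscores still end a word while every other rejected rune is dropped without creating a boundary is an assumption, since the request leaves that open.

```go
package main

import (
	"strings"
	"unicode"
)

// lettersAndDigits is the strict-identifier predicate from the request:
// it keeps a rune only if it is a letter or a digit.
func lettersAndDigits(r rune) bool {
	return unicode.IsLetter(r) || unicode.IsDigit(r)
}

// pascalFiltered builds a PascalCase string, discarding every rune the
// predicate rejects. Delimiters are checked first, so they always split.
func pascalFiltered(s string, allowed func(rune) bool) string {
	var b strings.Builder
	startOfWord := true
	for _, r := range s {
		switch {
		case r == ' ' || r == '-' || r == '_':
			startOfWord = true // delimiter: the next kept rune starts a word
		case !allowed(r):
			// rejected rune: dropped, no boundary created
		case startOfWord:
			b.WriteRune(unicode.ToUpper(r))
			startOfWord = false
		default:
			b.WriteRune(r)
		}
	}
	return b.String()
}
```

Under these assumptions, `pascalFiltered("hello_wor#ld 🎉test", lettersAndDigits)` yields `HelloWorldTest`, and `pascalFiltered("user@name", lettersAndDigits)` yields `Username` because `@` is removed without splitting the word.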

arran4/go-subcommand#166 (comment)
