model/labels: Add case-insensitive contains StringMatcher #1

Open
chencs wants to merge 2 commits into case-insensitive-prefix-matching from case-insensitive-contains-matcher

@chencs commented Apr 18, 2026

This change introduces a new containsCaseInsensitiveStringMatcher, which is similar to containsStringMatcher but matches substrings case-insensitively. For now, it is built on top of prometheus#18540 for convenience.
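
To make the intended matching semantics concrete, here is a minimal sketch of a case-insensitive substring check. It is not the implementation in this PR: the helper names equalFoldRune, hasPrefixFold, and containsFold are hypothetical, and the quadratic scan favors clarity over speed. It assumes the desired semantics are Unicode simple case folding, which is what Go's regexp package applies under (?i).

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// equalFoldRune reports whether a and b are equal under Unicode simple
// case folding. Hypothetical helper, not part of model/labels.
func equalFoldRune(a, b rune) bool {
	if a == b {
		return true
	}
	// Walk the SimpleFold orbit of a until it wraps back around to a.
	for r := unicode.SimpleFold(a); r != a; r = unicode.SimpleFold(r) {
		if r == b {
			return true
		}
	}
	return false
}

// hasPrefixFold reports whether s begins with prefix, ignoring case.
func hasPrefixFold(s, prefix string) bool {
	for len(prefix) > 0 {
		if len(s) == 0 {
			return false
		}
		pr, psize := utf8.DecodeRuneInString(prefix)
		sr, ssize := utf8.DecodeRuneInString(s)
		if !equalFoldRune(sr, pr) {
			return false
		}
		s, prefix = s[ssize:], prefix[psize:]
	}
	return true
}

// containsFold reports whether substr occurs anywhere in s, ignoring case.
// It tries every rune start offset in s, so it is O(len(s) * len(substr)).
func containsFold(s, substr string) bool {
	if substr == "" {
		return true
	}
	for i := range s { // i is the byte offset of each rune in s
		if hasPrefixFold(s[i:], substr) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(containsFold("a FOO label", "foo"))
	fmt.Println(containsFold("a bar label", "foo"))
	// The long s (U+017F) folds to s and S, as in the benchmark cases.
	fmt.Println(containsFold("xſy", "S"))
}
```

Comparing rune by rune avoids allocating a lowered copy of the input on every match, which is presumably why a dedicated matcher can beat the general regexp engine for patterns such as (?i).*foo.*.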

A few benchmark cases are notably slower (roughly 12% to 18% in the results below):

  • "(?i:foo)": parsed as a concat with only one subexpression (the literal)
  • "(?i:(foo|bar))": parsed as an alternate of two literals
  • "(?i:(foo1|foo2|bar))": parsed to an alternate of FOO[12] or bar
  • "(?i:(AAAAAAAAAAAAAAAAAAAAAAAA|BBBBBBBBBBBBBBBBBBBBBBBB|cccccccccccccccccccccccC|ſſſſſſſſſſſſſſſſſſſſſſſſS|SSSSSSSSSSSSSSSSSSSSSSSSſ))": case-insensitivity causes the ſs and the Ss to be simplified away, but the S alternate is now concatenated with an empty non-capturing group on the end.
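
The parse shapes described above can be inspected with Go's regexp/syntax package. The following is a small sketch, with a hypothetical describe helper, that parses a pattern, simplifies it, and prints the top-level operation. It assumes plain syntax.Perl flags and no wrapping of the pattern, so the output may differ from what the matcher in model/labels sees if it parses with other flags or anchors the expression first.

```go
package main

import (
	"fmt"
	"regexp/syntax"
)

// describe parses pattern, simplifies it, and summarizes the top-level node.
// Hypothetical helper for exploration only.
func describe(pattern string) (string, error) {
	re, err := syntax.Parse(pattern, syntax.Perl)
	if err != nil {
		return "", err
	}
	re = re.Simplify()
	// FoldCase is set on nodes that match case-insensitively.
	fold := re.Flags&syntax.FoldCase != 0
	return fmt.Sprintf("op=%v subs=%d foldCase=%t simplified=%s",
		re.Op, len(re.Sub), fold, re), nil
}

func main() {
	patterns := []string{
		"(?i:foo)",
		"(?i:(foo|bar))",
		"(?i:(foo1|foo2|bar))",
	}
	for _, p := range patterns {
		out, err := describe(p)
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		fmt.Printf("%-24s %s\n", p, out)
	}
}
```

Walking re.Sub recursively in the same way shows where literals get factored (for example a shared prefix pulled out of an alternation), which is the kind of rewrite that can keep a pattern from reaching a specialized matcher.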

In three of these four cases, I think we incur some extra overhead checking for case-insensitive concats and then end up unable to use the new matcher anyway. I'm not sure what's happening in the second case, (?i:(foo|bar)), but I'm also not sure these regressions are worth worrying about, given that all four cases still run in well under a microsecond.

On the flip side, note the performance improvement for these examples, all of which previously took several microseconds or more per operation:

  • (?i).*foo.*
  • "(?i).*/label/.*|.*/labels.*|.*/series.*"
  • the case-insensitive long alternation beginning with .*zQPbMkNO.*
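
As a quick check on the size of these improvements, the sketch below recomputes the relative change and the speedup factor from the sec/op values in the benchstat output that follows. The result type and the helper names are hypothetical; the timings are copied from the table and are in microseconds.

```go
package main

import "fmt"

// result holds one benchmark row; before and after are microseconds per op.
type result struct {
	name          string
	before, after float64
}

// percentChange returns the relative change from before to after, in percent.
func percentChange(before, after float64) float64 {
	return (after - before) / before * 100
}

// speedup returns how many times faster the new timing is.
func speedup(before, after float64) float64 {
	return before / after
}

func main() {
	results := []result{
		{"(?i).*foo.*", 7.654, 1.031},
		{"(?i).*/label/.*|.*/labels.*|.*/series.*", 19.235, 2.857},
		{"long alternation beginning with (?i).*zQPbMkNO.*", 1153.52, 96.66},
	}
	for _, r := range results {
		fmt.Printf("%-50s %8.2f%%  %5.1fx faster\n",
			r.name, percentChange(r.before, r.after), speedup(r.before, r.after))
	}
}
```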
BenchmarkFastRegexMatcher results
% benchstat original.txt case_insensitive_prefix_and_contains.txt                  
goos: darwin
goarch: arm64
pkg: github.com/prometheus/prometheus/model/labels
cpu: Apple M2 Pro
                                                        │  original.txt  │ case_insensitive_prefix_and_contains.txt │
                                                        │     sec/op     │      sec/op       vs base                │
FastRegexMatcher/#00-10                                     54.71n ±  0%       55.80n ±  2%   +2.01% (p=0.000 n=10)
FastRegexMatcher/foo-10                                     57.67n ±  0%       58.74n ±  8%   +1.86% (p=0.001 n=10)
FastRegexMatcher/^foo-10                                    66.55n ±  0%       66.94n ±  0%   +0.58% (p=0.000 n=10)
FastRegexMatcher/(foo|bar)-10                               81.71n ±  1%       82.29n ±  0%   +0.71% (p=0.001 n=10)
FastRegexMatcher/foo.*-10                                   95.87n ±  1%       96.46n ±  0%   +0.62% (p=0.012 n=10)
FastRegexMatcher/.*foo-10                                   112.0n ±  0%       111.5n ±  1%   -0.49% (p=0.001 n=10)
FastRegexMatcher/^.*foo$-10                                 111.9n ±  1%       110.6n ±  0%   -1.12% (p=0.000 n=10)
FastRegexMatcher/^.+foo$-10                                 112.5n ±  1%       111.2n ±  2%   -1.11% (p=0.033 n=10)
FastRegexMatcher/.?-10                                      78.31n ±  0%       78.40n ±  5%        ~ (p=0.725 n=10)
FastRegexMatcher/.*-10                                      53.75n ±  0%       55.09n ± 12%   +2.48% (p=0.000 n=10)
FastRegexMatcher/.+-10                                      56.03n ±  0%       55.55n ±  1%        ~ (p=0.078 n=10)
FastRegexMatcher/foo.+-10                                   96.27n ±  0%       97.17n ±  1%   +0.94% (p=0.001 n=10)
FastRegexMatcher/.+foo-10                                   113.4n ±  1%       111.8n ±  0%   -1.41% (p=0.000 n=10)
FastRegexMatcher/foo_.+-10                                  87.00n ±  0%       87.38n ±  0%   +0.43% (p=0.010 n=10)
FastRegexMatcher/foo_.*-10                                  87.03n ±  0%       87.64n ±  0%   +0.70% (p=0.000 n=10)
FastRegexMatcher/.*foo.*-10                                 172.8n ±  0%       164.9n ±  0%   -4.57% (p=0.000 n=10)
FastRegexMatcher/.+foo.+-10                                 205.2n ±  3%       194.4n ±  0%   -5.22% (p=0.000 n=10)
FastRegexMatcher/.*foo.*|-10                                244.3n ±  1%       243.0n ±  2%        ~ (p=0.516 n=10)
FastRegexMatcher/.*foo.*|bar.*-10                           281.9n ±  0%       280.0n ±  0%   -0.67% (p=0.001 n=10)
FastRegexMatcher/foo.*|.*bar.*-10                           280.2n ±  0%       279.7n ±  0%        ~ (p=0.196 n=10)
FastRegexMatcher/.*foo.*|.*bar.*-10                         297.3n ±  0%       296.5n ±  1%        ~ (p=0.100 n=10)
FastRegexMatcher/.*foo.*bar.*|.*hello.*-10                  13.77µ ±  0%       13.83µ ±  1%   +0.49% (p=0.000 n=10)
FastRegexMatcher/.*foo.*|.*bar.*|.*hello.*-10               486.6n ±  8%       483.0n ±  0%   -0.74% (p=0.001 n=10)
FastRegexMatcher/.+.*foo.*|.*bar.*-10                       16.06µ ±  1%       16.13µ ±  1%        ~ (p=0.089 n=10)
FastRegexMatcher/(?s:.*)-10                                 53.60n ±  1%       54.40n ±  0%   +1.49% (p=0.000 n=10)
FastRegexMatcher/(?s:.+)-10                                 56.15n ±  1%       55.29n ±  0%   -1.51% (p=0.000 n=10)
FastRegexMatcher/(?s:^.*foo$)-10                            113.5n ±  1%       111.0n ±  0%   -2.20% (p=0.000 n=10)
FastRegexMatcher/(?i:foo)-10                                81.41n ±  0%       90.92n ±  1%  +11.69% (p=0.000 n=10)
FastRegexMatcher/(?i:(foo|bar))-10                          169.6n ±  1%       193.2n ±  2%  +13.88% (p=0.000 n=10)
FastRegexMatcher/(?i:(foo1|foo2|bar))-10                    299.0n ±  1%       335.6n ±  1%  +12.22% (p=0.000 n=10)
FastRegexMatcher/^(?i:foo|oo)|(bar)$-10                     743.0n ± 14%       727.5n ±  2%        ~ (p=0.127 n=10)
FastRegexMatcher/(?i:(foo1|foo2|aaa|bbb|ccc|ddd|e-10        632.8n ±  4%       605.2n ±  2%   -4.35% (p=0.001 n=10)
FastRegexMatcher/((.*)(bar|b|buzz)(.+)|foo)$-10             469.5n ±  9%       459.1n ±  2%   -2.20% (p=0.003 n=10)
FastRegexMatcher/^$-10                                      54.77n ±  2%       55.32n ±  5%   +0.99% (p=0.023 n=10)
FastRegexMatcher/(prometheus|api_prom)_api_v1_.+-10         173.3n ±  0%       172.2n ±  2%        ~ (p=0.362 n=10)
FastRegexMatcher/10\.0\.(1|2)\.+-10                         87.20n ±  1%       87.57n ±  1%        ~ (p=0.055 n=10)
FastRegexMatcher/10\.0\.(1|2).+-10                          87.53n ±  3%       88.09n ±  2%   +0.64% (p=0.029 n=10)
FastRegexMatcher/((fo(bar))|.+foo)-10                       211.9n ±  0%       207.6n ±  2%   -2.05% (p=0.001 n=10)
FastRegexMatcher/zQPbMkNO|NNSPdvMi|iWuuSoAl|qbvKM-10        186.8n ± 15%       186.5n ± 10%        ~ (p=0.971 n=10)
FastRegexMatcher/jyyfj00j0061|jyyfj00j0062|jyyfj9-10        188.8n ±  9%       183.2n ±  5%        ~ (p=0.382 n=10)
FastRegexMatcher/.*zQPbMkNO.*|.*NNSPdvMi.*|.*iWuu-10        11.86µ ±  1%       11.82µ ±  1%        ~ (p=0.470 n=10)
FastRegexMatcher/(?i:(zQPbMkNO|NNSPdvMi|iWuuSoAl|-10        621.4n ±  4%       625.1n ±  7%        ~ (p=0.971 n=10)
FastRegexMatcher/(?i:(AAAAAAAAAAAAAAAAAAAAAAAA|BB-10        374.6n ±  0%       440.4n ±  7%  +17.58% (p=0.000 n=10)
FastRegexMatcher/(?i:(zQPbMkNO.*|NNSPdvMi.*|iWuuS-10        269.5n ±  0%       268.5n ±  3%        ~ (p=0.670 n=10)
FastRegexMatcher/(?i:(zQPbMkNO.*|NNSPdvMi.*|iWuuS#01-10     445.2n ±  1%       443.5n ±  1%        ~ (p=0.066 n=10)
FastRegexMatcher/(?i:(.*zQPbMkNO|.*NNSPdvMi|.*iWu-10        7.513µ ±  1%       7.536µ ±  0%        ~ (p=0.093 n=10)
FastRegexMatcher/fo.?-10                                    97.56n ±  0%       97.67n ±  2%        ~ (p=0.896 n=10)
FastRegexMatcher/foo.?-10                                   97.60n ±  0%       97.98n ±  1%        ~ (p=0.148 n=10)
FastRegexMatcher/f.?o-10                                    81.08n ±  1%       81.88n ±  8%        ~ (p=0.190 n=10)
FastRegexMatcher/.*foo.?-10                                 202.5n ±  0%       201.4n ±  5%        ~ (p=0.469 n=10)
FastRegexMatcher/.?foo.+-10                                 196.5n ±  0%       197.8n ±  4%        ~ (p=0.782 n=10)
FastRegexMatcher/foo.?|bar-10                               159.1n ±  4%       151.5n ±  3%   -4.75% (p=0.000 n=10)
FastRegexMatcher/ſſs-10                                     57.97n ±  0%       57.15n ±  0%   -1.41% (p=0.000 n=10)
FastRegexMatcher/.*-.*-.*-.*-.*-10                          200.3n ±  1%       183.0n ±  9%   -8.68% (p=0.004 n=10)
FastRegexMatcher/.+-.*-.*-.*-.+-10                          199.6n ±  1%       183.5n ±  1%   -8.07% (p=0.000 n=10)
FastRegexMatcher/-.*-.*-.*-.*-10                            98.13n ±  1%       96.47n ±  0%   -1.70% (p=0.000 n=10)
FastRegexMatcher/.*-.*-.*-.*--10                            115.1n ±  0%       114.0n ±  1%   -0.96% (p=0.007 n=10)
FastRegexMatcher/(.+)-(.+)-(.+)-(.+)-(.+)-10                199.5n ±  2%       187.8n ±  6%   -5.89% (p=0.001 n=10)
FastRegexMatcher/((.*))(?i:f)((.*))o((.*))o((.*))-10        4.335µ ±  0%       4.402µ ±  2%   +1.55% (p=0.000 n=10)
FastRegexMatcher/((.*))f((.*))(?i:o)((.*))o((.*))-10        3.548µ ±  0%       3.575µ ±  7%   +0.76% (p=0.007 n=10)
FastRegexMatcher/(.*0.*)-10                                 131.0n ±  1%       119.2n ± 10%   -9.00% (p=0.004 n=10)
FastRegexMatcher/(?i).*foo.*-10                             7.654µ ±  0%       1.031µ ±  1%  -86.53% (p=0.000 n=10)
FastRegexMatcher/(?i)report.scheduled.job_runsche-10       168.85n ±  0%       78.06n ±  1%  -53.77% (p=0.000 n=10)
FastRegexMatcher/report.scheduled.job_runschedule-10        87.39n ±  0%       88.45n ±  4%   +1.22% (p=0.020 n=10)
FastRegexMatcher/(?i).*zQPbMkNO.*|.*NNSPdvMi.*|.*-10      1153.52µ ±  0%       96.66µ ±  1%  -91.62% (p=0.000 n=10)
FastRegexMatcher/(?i).*/label/.*|.*/labels.*|.*/s-10       19.235µ ±  1%       2.857µ ±  4%  -85.15% (p=0.000 n=10)
FastRegexMatcher/.*/label/.*|.*/labels.*|.*/serie-10        386.8n ±  5%       370.4n ±  1%   -4.24% (p=0.000 n=10)
geomean                                                     273.9n             245.7n        -10.28%

Which issue(s) does the PR fix:

Does this PR introduce a user-facing change?

NONE

@chencs chencs force-pushed the case-insensitive-prefix-matching branch from 3a24433 to f0bdb9b Compare April 20, 2026 18:20
@chencs chencs force-pushed the case-insensitive-contains-matcher branch from e9a055f to 39c979e Compare April 20, 2026 18:21
@chencs chencs changed the title from "model/labels: Add case-sensitive contains StringMatcher" to "model/labels: Add case-insensitive contains StringMatcher" Apr 24, 2026