
MDEV-30124: Add format validation for JSON Schema #4750

Open
varundeepsaini wants to merge 1 commit into MariaDB:main from varundeepsaini:MDEV-30124-json-schema-format-validation


@varundeepsaini (Contributor)

Summary

Implements optional format validation for the JSON_SCHEMA_VALID() function per JSON Schema Draft 2020-12.

  • Adds a new session variable json_schema_format_validation (default OFF)
  • When OFF, the format keyword is treated as annotation only (existing behavior)
  • When ON, validates strings against 18 format types: date-time, date, time, duration, email, idn-email, hostname, idn-hostname, ipv4, ipv6, uri, uri-reference, iri, iri-reference, uuid, json-pointer, relative-json-pointer, regex
  • Unknown format values always pass validation
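
The rules in the list above can be sketched as a small dispatch function. This is an illustration only, not the code in the PR: `format_keyword_passes` and `is_ipv4` are hypothetical names, a plain `bool` stands in for the json_schema_format_validation session variable, and only one format is wired up.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical ipv4 check: four decimal octets in 0..255, no leading zeros.
static bool is_ipv4(std::string_view s)
{
  int octets= 0;
  std::size_t i= 0;
  while (i < s.size())
  {
    std::size_t start= i;
    int value= 0;
    while (i < s.size() && s[i] >= '0' && s[i] <= '9' && i - start < 3)
      value= value * 10 + (s[i++] - '0');
    std::size_t digits= i - start;
    if (digits == 0 || value > 255 || (digits > 1 && s[start] == '0'))
      return false;
    octets++;
    if (i == s.size())
      break;
    if (s[i] != '.' || octets == 4)
      return false;
    i++;                              // skip the dot
    if (i == s.size())
      return false;                   // trailing dot
  }
  return octets == 4;
}

// validation_on stands in for the session variable (assumption).
static bool format_keyword_passes(bool validation_on, bool value_is_string,
                                  std::string_view format,
                                  std::string_view value)
{
  if (!validation_on || !value_is_string)
    return true;                      // annotation only, or not a string
  if (format == "ipv4")
    return is_ipv4(value);
  return true;                        // unknown formats always pass
}
```

With the flag off, `format_keyword_passes(false, true, "ipv4", "999.1.1.1")` passes; with the flag on, the same input is rejected, while a non-string value or an unrecognized format name still passes.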

Test plan

  • Existing format annotation tests continue to pass unchanged
  • New tests cover all 18 formats with valid/invalid inputs when json_schema_format_validation=ON
  • Verified non-string types always pass regardless of format
  • Verified unknown format values are treated as annotation

@varundeepsaini marked this pull request as draft on March 7, 2026, 05:22
@varundeepsaini force-pushed the MDEV-30124-json-schema-format-validation branch 3 times, most recently from 293b984 to e5142f5, on March 7, 2026, 08:22
@varundeepsaini marked this pull request as ready for review on March 7, 2026, 09:34
@varundeepsaini force-pushed the MDEV-30124-json-schema-format-validation branch from e5142f5 to 5a6ab32 on March 7, 2026, 09:34
Implement optional format validation for the JSON_SCHEMA_VALID() function,
as specified by JSON Schema Draft 2020-12. The format keyword is treated as
an annotation by default (validation disabled). A new session variable
json_schema_format_validation enables actual format validation when set to ON.

Supported formats: date-time, date, time, duration, email, idn-email,
hostname, idn-hostname, ipv4, ipv6, uri, uri-reference, iri, iri-reference,
uuid, json-pointer, relative-json-pointer, regex.

Signed-off-by: Varun Deep Saini <varun.23bcs10048@ms.sst.scaler.com>
@varundeepsaini force-pushed the MDEV-30124-json-schema-format-validation branch from 5a6ab32 to 602a323 on March 7, 2026, 16:45
@gkodinov added the "External Contribution" label (All PRs from entities outside of MariaDB Foundation, Corporation, Codership agreements.) on Mar 9, 2026
@gkodinov (Member) left a comment

Thank you for your contribution! This is a preliminary review.

In general I would strive to reuse as much of the existing infrastructure in the server as possible: charset handling, parsers for various data types, third-party libraries, etc.

The declarations of the formats are far from simple: some involve quoting, some escaping, etc.

I've jotted down some of the issues that I found at first glance. It looks like this could also benefit from some sort of standardized testing of the parsers. I'm sure test sets exist for most of these parsers.
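
One way to make such testing repeatable is a table of cases per format, run against each validator. The sketch below is an assumption about how that could look, not part of the PR: `FormatCase`, `count_failures`, and `is_full_date` are hypothetical names, and the cases are hand-written here. The community JSON-Schema-Test-Suite project publishes optional per-format cases that could feed a table like this.

```cpp
#include <cstddef>
#include <cstdio>
#include <string_view>

// Hypothetical validator for the "date" format (RFC 3339 full-date).
static bool is_full_date(std::string_view s)
{
  if (s.size() != 10 || s[4] != '-' || s[7] != '-')
    return false;
  for (std::size_t i= 0; i < 10; i++)
    if (i != 4 && i != 7 && (s[i] < '0' || s[i] > '9'))
      return false;
  int year= (s[0] - '0') * 1000 + (s[1] - '0') * 100 +
            (s[2] - '0') * 10 + (s[3] - '0');
  int month= (s[5] - '0') * 10 + (s[6] - '0');
  int day= (s[8] - '0') * 10 + (s[9] - '0');
  static const int days_in_month[]=
    {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
  if (month < 1 || month > 12 || day < 1)
    return false;
  bool leap= (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
  int max_day= days_in_month[month - 1] + ((month == 2 && leap) ? 1 : 0);
  return day <= max_day;
}

struct FormatCase
{
  const char *description;
  const char *input;
  bool expected;
};

// Runs every case and returns how many disagree with the validator.
static int count_failures(bool (*validator)(std::string_view),
                          const FormatCase *cases, std::size_t n)
{
  int failures= 0;
  for (std::size_t i= 0; i < n; i++)
    if (validator(cases[i].input) != cases[i].expected)
    {
      std::fprintf(stderr, "FAIL: %s (%s)\n",
                   cases[i].description, cases[i].input);
      failures++;
    }
  return failures;
}

static const FormatCase date_cases[]=
{
  {"plain valid date", "2026-03-07", true},
  {"leap day in a leap year", "2024-02-29", true},
  {"leap day in a non-leap year", "2026-02-29", false},
  {"month out of range", "2026-13-01", false},
  {"missing zero padding", "2026-3-7", false},
};
```

Keeping the cases in data rather than in code makes it cheap to import an external test set later and to see which inputs a validator gets wrong.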

const char *val= (const char *) je->value;
int len= (int) je->value_len;

if (len == 9 && !memcmp(val, "date-time", 9))

Frankly, I'd use sizeof("date-time") - 1 instead of the literal 9 (sizeof on a string literal counts the terminating NUL).
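
A minimal sketch of that idea follows. One detail matters: `sizeof` applied to a string literal includes the terminating NUL, so the comparable length is one less. `equals_literal` is a hypothetical helper name, not something from the PR or the server.

```cpp
#include <cstddef>
#include <cstring>

// sizeof counts the trailing NUL: 9 visible characters plus one.
static_assert(sizeof("date-time") == 10, "literal size includes the NUL");

// True when the len bytes at val are exactly the given string literal.
// N is deduced from the literal, so no hand-counted lengths are needed.
template <std::size_t N>
static bool equals_literal(const char *val, std::size_t len,
                           const char (&literal)[N])
{
  return len == N - 1 && std::memcmp(val, literal, N - 1) == 0;
}
```

A call such as `equals_literal(val, len, "date-time")` then replaces the pair of hand-written 9s, and a later rename of the format string cannot leave a stale length behind.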

return false;
}

static inline bool is_digit(char c) { return c >= '0' && c <= '9'; }

Any specific reason you can't use the mysys versions of these?
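
For context, the server already carries shared character-class helpers. From memory these are macros such as my_isdigit() and my_isalpha() in m_ctype.h, which take a CHARSET_INFO and look the byte up in that charset's class table, so treat the exact names as an assumption. The standalone sketch below only illustrates the table-driven idea with hypothetical names. It is not the server's implementation.

```cpp
#include <array>
#include <cstdint>

enum : std::uint8_t { CLASS_DIGIT= 1, CLASS_ALPHA= 2 };

// Builds a 256-entry table of class bits for ASCII; all other bytes get 0.
static std::array<std::uint8_t, 256> build_ascii_classes()
{
  std::array<std::uint8_t, 256> table{};
  for (int c= '0'; c <= '9'; c++)
    table[c]|= CLASS_DIGIT;
  for (int c= 'a'; c <= 'z'; c++)
    table[c]|= CLASS_ALPHA;
  for (int c= 'A'; c <= 'Z'; c++)
    table[c]|= CLASS_ALPHA;
  return table;
}

static const std::array<std::uint8_t, 256> ascii_classes=
  build_ascii_classes();

// The cast to unsigned char keeps bytes >= 0x80 from indexing negatively.
static inline bool ascii_is_digit(char c)
{
  return (ascii_classes[(unsigned char) c] & CLASS_DIGIT) != 0;
}

static inline bool ascii_is_alpha(char c)
{
  return (ascii_classes[(unsigned char) c] & CLASS_ALPHA) != 0;
}
```

The point of sharing one table is that every caller agrees on what counts as a digit or a letter, independent of the process locale.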

static bool validate_format_email(const char *val, int len)
{
int at_pos= -1;
for (int i= 0; i < len; i++)

Any specific reason why you can't use strchr() here? Also, the @ sign can be a valid character in the local part if it is quoted!
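
To illustrate the quoting concern: in an address such as "a@b"@example.com the first @ sits inside a quoted local part, so a plain first-match search picks the wrong separator. The sketch below, with the hypothetical name `find_email_separator`, skips quoted sections and backslash escapes. It only locates the separator and does not validate either side.

```cpp
#include <cstddef>
#include <string_view>

// Returns the index of the first '@' outside double quotes, or -1 if none.
static int find_email_separator(std::string_view s)
{
  bool in_quotes= false;
  for (std::size_t i= 0; i < s.size(); i++)
  {
    char c= s[i];
    if (in_quotes && c == '\\')
    {
      i++;                      // skip the escaped character (quoted-pair)
      continue;
    }
    if (c == '"')
      in_quotes= !in_quotes;
    else if (c == '@' && !in_quotes)
      return (int) i;
  }
  return -1;                    // no separator, or the quote never closed
}
```

For `user@example.com` this returns 4. For `"a@b"@example.com` it returns 5, the @ after the closing quote.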

if (i == 0 || i == domain_len - 1) return false;
if (domain[i - 1] == '.') return false;
}
else if (!is_alpha(domain[i]) && !is_digit(domain[i]) && domain[i] != '-')

What about (e.g.) Cyrillic domains? You need a good email format parser here!
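
One possible shape for this, sketched under assumptions: apply the letters-digits-hyphen rule only to all-ASCII labels, and flag any label containing non-ASCII bytes for a real IDNA implementation instead of rejecting it. `LabelCheck`, `check_label`, and `check_domain` are hypothetical names. Libraries such as libidn2 and ICU implement IDNA, but the thread does not settle whether either is acceptable in the server tree.

```cpp
#include <cstddef>
#include <string_view>

enum class LabelCheck { Valid, Invalid, NeedsIdna };

// Classifies one dot-separated label of a domain name.
static LabelCheck check_label(std::string_view label)
{
  if (label.empty())
    return LabelCheck::Invalid;
  for (unsigned char c : label)
    if (c >= 0x80)
      return LabelCheck::NeedsIdna;   // defer to an IDNA library
  // The 63-octet limit is applied to the ASCII form only.
  if (label.size() > 63 || label.front() == '-' || label.back() == '-')
    return LabelCheck::Invalid;
  for (unsigned char c : label)
  {
    bool ldh= (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
              (c >= '0' && c <= '9') || c == '-';
    if (!ldh)
      return LabelCheck::Invalid;
  }
  return LabelCheck::Valid;
}

// Invalid if any label is invalid, NeedsIdna if any label is non-ASCII.
static LabelCheck check_domain(std::string_view domain)
{
  LabelCheck result= LabelCheck::Valid;
  std::size_t start= 0;
  while (true)
  {
    std::size_t dot= domain.find('.', start);
    std::size_t count= (dot == std::string_view::npos)
                         ? std::string_view::npos : dot - start;
    LabelCheck r= check_label(domain.substr(start, count));
    if (r == LabelCheck::Invalid)
      return LabelCheck::Invalid;
    if (r == LabelCheck::NeedsIdna)
      result= LabelCheck::NeedsIdna;
    if (dot == std::string_view::npos)
      break;
    start= dot + 1;
  }
  if (result == LabelCheck::Valid && domain.size() > 253)
    return LabelCheck::Invalid;       // total length limit, ASCII form
  return result;
}
```

The three-way result keeps the ASCII fast path cheap while making it explicit that a Cyrillic domain is undecided rather than wrong.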

if (len < 1 || !is_alpha(val[0])) return false;
int i= 1;
while (i < len && (is_alpha(val[i]) || is_digit(val[i]) ||
val[i] == '+' || val[i] == '-' || val[i] == '.'))

This is not a complete parser either.
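
To show how much lies beyond the scheme, here is a sketch that splits a URI reference into its five components using the regular expression given in RFC 3986, Appendix B, and then checks the scheme against the RFC's grammar (a letter followed by letters, digits, "+", "-", or "."). `UriParts`, `split_uri`, and `is_valid_scheme` are hypothetical names. The split is not a validation: each component would still need its own checks (percent-encoding, authority syntax, and so on), and a production parser would likely not rely on std::regex.

```cpp
#include <regex>
#include <string>

struct UriParts
{
  bool has_scheme= false;
  std::string scheme, authority, path, query, fragment;
};

// Splits a URI reference with the RFC 3986 Appendix B expression.
// Returns false only if the expression does not match the whole input.
static bool split_uri(const std::string &uri, UriParts &out)
{
  static const std::regex re(
    R"(^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?)");
  std::smatch m;
  if (!std::regex_match(uri, m, re))
    return false;
  out.has_scheme= m[2].matched;
  out.scheme= m[2].str();
  out.authority= m[4].str();
  out.path= m[5].str();
  out.query= m[7].str();
  out.fragment= m[9].str();
  return true;
}

// scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
static bool is_valid_scheme(const std::string &scheme)
{
  static const std::regex re("[A-Za-z][A-Za-z0-9+.-]*");
  return std::regex_match(scheme, re);
}
```

Splitting first and validating each piece separately keeps the per-component rules small enough to test one at a time.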



2 participants