MDEV-30124: Add format validation for JSON Schema #4750
varundeepsaini wants to merge 1 commit into MariaDB:main
293b984 to e5142f5
e5142f5 to 5a6ab32
Implement optional format validation for the JSON_SCHEMA_VALID() function, as specified by JSON Schema Draft 2020-12.

The format keyword is treated as an annotation by default (validation disabled). A new session variable json_schema_format_validation enables actual format validation when set to ON.

Supported formats: date-time, date, time, duration, email, idn-email, hostname, idn-hostname, ipv4, ipv6, uri, uri-reference, iri, iri-reference, uuid, json-pointer, relative-json-pointer, regex.

Signed-off-by: Varun Deep Saini <varun.23bcs10048@ms.sst.scaler.com>
5a6ab32 to 602a323
gkodinov left a comment
Thank you for your contribution! This is a preliminary review.
In general I would strive to reuse as much of the existing infrastructure in the server as possible: charset handling, parsers for various data types, third-party libraries, etc.
The declarations of the formats are far from simple: some have quoting, some have escaping, etc.
I've jotted down some of the issues that I found at first glance. It looks like this could benefit from some sort of standardized testing of the parsers too. I'm sure test sets exist for most of these formats.
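One way to read the testing suggestion is a table-driven check, where each validator is run against a list of inputs with known verdicts. The sketch below is only an illustration: `check_uuid`, `struct format_case`, and `count_failures` are hypothetical names that do not appear in the patch, and the four cases are hand-written rather than taken from a published test set.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical validator under test: 8-4-4-4-12 hexadecimal UUID text form. */
static bool check_uuid(const char *val, size_t len)
{
  if (len != 36)
    return false;
  for (size_t i= 0; i < len; i++)
  {
    bool dash_pos= (i == 8 || i == 13 || i == 18 || i == 23);
    if (dash_pos ? val[i] != '-' : !isxdigit((unsigned char) val[i]))
      return false;
  }
  return true;
}

/* One test case: an input string and the verdict the validator must return. */
struct format_case { const char *input; bool expected; };

/* Illustrative cases only; a real run would load an existing test set. */
static const struct format_case uuid_cases[]= {
  {"2eb8aa08-aa98-11ea-b4aa-73b441d16380", true},
  {"2eb8aa08-aa98-11ea-b4aa-73b441d1638",  false},  /* one character short */
  {"2eb8aa08aaa98-11ea-b4aa-73b441d16380", false},  /* first dash missing */
  {"2eb8aa08-aa98-11ea-b4aa-73b441d1638g", false},  /* non-hex character */
};

/* Returns the number of cases whose result differs from the expectation. */
static size_t count_failures(bool (*check)(const char *, size_t),
                             const struct format_case *cases, size_t n)
{
  size_t failures= 0;
  for (size_t i= 0; i < n; i++)
    if (check(cases[i].input, strlen(cases[i].input)) != cases[i].expected)
      failures++;
  return failures;
}
```

The same `count_failures` driver could be pointed at any other validator with the same signature, which keeps the case tables separate from the parsing code.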
const char *val= (const char *) je->value;
int len= (int) je->value_len;

if (len == 9 && !memcmp(val, "date-time", 9))
Frankly, I'd use sizeof("date-time") - 1 instead of "9".
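A minimal sketch of that suggestion follows. Note that `sizeof` applied to a string literal counts the terminating NUL, so the comparison length is `sizeof(lit) - 1`. The macro names `LITERAL_LEN` and `FORMAT_IS` and the function `is_date_time_format` are hypothetical and are not taken from the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of a string literal without its NUL byte. */
#define LITERAL_LEN(lit) (sizeof(lit) - 1)

/* True when the buffer val[0..len) is exactly the string literal lit. */
#define FORMAT_IS(val, len, lit) \
  ((size_t) (len) == LITERAL_LEN(lit) && \
   !memcmp((val), (lit), LITERAL_LEN(lit)))

/* Example use: the literal appears once, so length and text cannot drift. */
static bool is_date_time_format(const char *val, int len)
{
  return FORMAT_IS(val, len, "date-time");
}
```

Writing the literal once removes the risk of the hard-coded length and the string going out of sync when a format name is edited.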
return false;
}

static inline bool is_digit(char c) { return c >= '0' && c <= '9'; }
any specific reason you can't use the mysys versions of these?
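To illustrate the idea of leaning on existing classifiers instead of hand-written range checks, here is a standalone sketch that uses the standard `<ctype.h>` functions as a stand-in. This is an assumption made so the example compiles on its own: inside the server the reviewer presumably means the server's own character-class helpers, which are not used here. The wrapper names `is_ascii_digit` and `is_ascii_alpha` are hypothetical.

```c
#include <ctype.h>
#include <stdbool.h>

/* Stand-in wrappers around <ctype.h>. The cast to unsigned char avoids
   undefined behaviour when char is signed and the byte value is >= 0x80. */
static inline bool is_ascii_digit(char c)
{
  return isdigit((unsigned char) c) != 0;
}

/* Restricted to 7-bit input so the result does not depend on the locale. */
static inline bool is_ascii_alpha(char c)
{
  return (unsigned char) c < 128 && isalpha((unsigned char) c) != 0;
}
```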
static bool validate_format_email(const char *val, int len)
{
  int at_pos= -1;
  for (int i= 0; i < len; i++)
Any specific reason why you can't use strchr() here? Note that the @ sign can be a valid character in the local part if quoted!
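The quoting concern can be sketched as follows. This is not a full email parser: it assumes the local part is either entirely quoted or entirely unquoted, and it only locates the separating `@`. Because the value is length-delimited rather than NUL-terminated, the sketch uses `memchr()` in place of `strchr()`. The function name `find_email_separator` is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Returns the index of the '@' that separates local part and domain,
   or -1 if none is found. Handles a fully quoted local part. */
static int find_email_separator(const char *val, int len)
{
  if (len <= 0)
    return -1;

  if (val[0] == '"')
  {
    int i;
    /* Skip to the closing quote, stepping over backslash-escaped bytes. */
    for (i= 1; i < len && val[i] != '"'; i++)
      if (val[i] == '\\' && ++i >= len)
        return -1;                      /* escape at end of input */
    if (i >= len)
      return -1;                        /* unterminated quoted string */
    i++;                                /* step past the closing quote */
    return (i < len && val[i] == '@') ? i : -1;
  }

  /* Unquoted local part: the first '@' is the separator. */
  const char *at= memchr(val, '@', (size_t) len);
  return at ? (int) (at - val) : -1;
}
```

With this approach an address such as `"a@b"@example.com` yields the index of the second `@`, whereas a plain scan for the first `@` would split it inside the quoted local part.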
  if (i == 0 || i == domain_len - 1) return false;
  if (domain[i - 1] == '.') return false;
}
else if (!is_alpha(domain[i]) && !is_digit(domain[i]) && domain[i] != '-')
What about (e.g.) Cyrillic domains? You need a good email format parser here!
if (len < 1 || !is_alpha(val[0])) return false;
int i= 1;
while (i < len && (is_alpha(val[i]) || is_digit(val[i]) ||
                   val[i] == '+' || val[i] == '-' || val[i] == '.'))
This is not a complete parser either.
Summary
Implements optional format validation for the JSON_SCHEMA_VALID() function per JSON Schema Draft 2020-12.
- Session variable json_schema_format_validation (default OFF)
- When OFF, the format keyword is treated as annotation only (existing behavior)
- When ON, validates strings against 18 format types: date-time, date, time, duration, email, idn-email, hostname, idn-hostname, ipv4, ipv6, uri, uri-reference, iri, iri-reference, uuid, json-pointer, relative-json-pointer, regex
Test plan
json_schema_format_validation=ON