Conversation
I'm keeping `c_parser.mly` and `c_lexer.mll` unchanged for now to avoid destroying `c_parser_error.messages` and to keep CN in a working state.
This does not duplicate the C grammar, instead when the CN lexer sees matching `<` ... `>` it attempts to parse a C type-name between the angles (using a new entry point to the C parser) and produces a single `LT_CTYPE_GT` token for the CN parser. This may fail because the opening `<` may in fact be a relational operator. In this case the CN lexer recovers by re-lexing from the top. This first attempt avoids changing the existing CN syntax. If we instead change it to have dedicated delimiters around C type-names, we won't need to do the re-lexing recovery. Most of the CN ci tests now work (the failures are caused by the current lack of pretty error messages in the CN parser), but this needs proper testing...
|
Thanks very much! AFAICS |
|
Regarding the question of parsing C-types within CN: maybe the lexer hack is less problematic than it seems: there's a small set of language constructs only that can take C-types ( |
dc-mak
left a comment
There was a problem hiding this comment.
Lgtm, though this ctype issue is gonna be annoying.
| let hexadecimal_constant = hexadecimal_prefix hexadecimal_digit+ | ||
|
|
||
| (* C23 binary constant, but omitting ' separators for now *) | ||
| (* TODO(Christopher/Dhruv): add support for ' separators to be in line with C23 ? *) |
| (* TODO(Christopher/Dhruv): do you care about following C11 closely for this | ||
| (e.g. with respect to the universal character stuff), or should we simplify | ||
| like we already do for uppercase names. *) |
There was a problem hiding this comment.
I vote simplify - we can always add it later if requested.
| | '<' ([^'='][^'<' '>']+ as str) '>' | ||
| { | ||
| try | ||
| LT_CTYPE_GT (C_parser.type_name_eof C_lexer.lexer (Lexing.from_string str)) | ||
| with | ||
| | _ -> | ||
| relexbuf_opt := Some (Lexing.from_string (str ^ ">")); | ||
| LT | ||
| } | ||
|
|
There was a problem hiding this comment.
This is an amazing hack. But I do worry it might be a bit too hacky. Can it be more conservative? Specifically, I wanted to have <- available as a token. We do need to discuss this properly though @cp526
There was a problem hiding this comment.
The idea of ...<...> was to have a C++-like syntax for type parameters, which would be good to keep if we can. What's the use of <-?
There was a problem hiding this comment.
I think types always start with a letter (or maybe _), so you could probably change it to only try for a type, if the < is followed by one of these. I think should allow you to have <- later if you wanted it.
Otoh, I don't think this would work if you ever wanted types that have instantiations in them, but we probably don't have those at the moment.
| "boolean", BOOL; (* TODO(Christopher/Dhruv): are all these variants of BOOL needed? *) | ||
| "CN_bool", BOOL; (* TODO(Christopher/Dhruv): are all these variants of BOOL needed? *) |
We're still using |
Yes, that's true. |
| "bool", BOOL; (* shared with C23 *) | ||
| "boolean", BOOL; (* TODO(Christopher/Dhruv): are all these variants of BOOL needed? *) | ||
| "CN_bool", BOOL; (* TODO(Christopher/Dhruv): are all these variants of BOOL needed? *) | ||
| "cn_function", FUNCTION; (* TODO(Christopher/Dhruv): is this variant still needed? *) |
This creates a separate lexer and parser for CN.
There are a few questions for @cp526 and @dc-mak marked with
TODO(Christopher/Dhruv)in the code.And there is a hack for dealing with the parsing of C type-names discussed in the commit message of c96a1ad.
This is still work-in-progress, the CN parser doesn't produce pretty messages so some CI and tutorial tests fail.