libcpp: cpp_check_xid_property reports XID_Continue-only code points as XID_Start [bug]
- ARABIC-INDIC DIGIT FOUR (U+0664) is accepted as the first character of an identifier
// test.rs
fn main() {
    let ٤ = 42; // U+0664 ARABIC-INDIC DIGIT FOUR; it should not be able to start an identifier
    println!("{}", ٤);
}
nerdtook@PC:~/Temporary/nccp$ gcc test.rs -frust-incomplete-and-experimental-compiler-do-not-use
test.rs:3:9: warning: unused name ‘٤’ [-Wunused-variable]
3 | let ٤ = 42; // U+0664 ARABIC-INDIC DIGIT FOUR
but gccrs accepts it as a valid identifier.
- the faulty code is in libcpp/charset.cc
/* Returns flags representing the XID properties of the given codepoint. */
unsigned int
cpp_check_xid_property (cppchar_t c)
{
  ...
  if (flags & CXX23)
    return CPP_XID_START | CPP_XID_CONTINUE;
  if (flags & NXX23)
    return CPP_XID_CONTINUE;
  return 0;
}
- swap the two if-statements to fix it; otherwise the second branch is never reached.
-  if (flags & CXX23)
-    return CPP_XID_START | CPP_XID_CONTINUE;
   if (flags & NXX23)
     return CPP_XID_CONTINUE;
+  if (flags & CXX23)
+    return CPP_XID_START | CPP_XID_CONTINUE;
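The swapped order can be checked in isolation. The following is only a minimal sketch, not the libcpp source: the bit values assigned to CXX23, NXX23, CPP_XID_START and CPP_XID_CONTINUE are placeholders, and xid_property_from_flags is a hypothetical helper that receives the flags word directly instead of looking the code point up in ucnranges.

```c
/* Placeholder bit values (assumption); the real constants are defined
   in libcpp and may differ.  */
enum { CXX23 = 1 << 0, NXX23 = 1 << 1 };
enum { CPP_XID_START = 1 << 0, CPP_XID_CONTINUE = 1 << 1 };

/* Hypothetical helper: the tail of cpp_check_xid_property with the two
   tests swapped, applied to an already looked-up flags word.  */
static unsigned int
xid_property_from_flags (unsigned int flags)
{
  /* NXX23 is tested first: it marks "may continue, must not start".  */
  if (flags & NXX23)
    return CPP_XID_CONTINUE;
  /* CXX23 without NXX23 means the code point may also start.  */
  if (flags & CXX23)
    return CPP_XID_START | CPP_XID_CONTINUE;
  return 0;
}
```

With this order, a flags word of CXX23 | NXX23 (the state U+0664 is in) yields CPP_XID_CONTINUE alone, while CXX23 by itself still yields both properties.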
- why does this happen?
In libcpp/makeucnid.cc:
static void
read_derivedcore (char *fname)
{
  ...
  if (strncmp (l, "XID_Start ", 10) == 0)
    {
      for (; codepoint_start <= codepoint_end; codepoint_start++)
        flags[codepoint_start]
          = (flags[codepoint_start] | CXX23) & ~NXX23;
    }
  else if (strncmp (l, "XID_Continue ", 13) == 0)
    {
      for (; codepoint_start <= codepoint_end; codepoint_start++)
        if ((flags[codepoint_start] & CXX23) == 0)
          flags[codepoint_start] |= CXX23 | NXX23;
    }
  ...
}
This is the only place where the CXX23 and NXX23 flags are set, which means that, depending on its DerivedCoreProperties entry, every code point ends up in one of three states:
XID_Start -> CXX23 = 1, NXX23 = 0
XID_Continue only -> CXX23 = 1, NXX23 = 1
neither -> CXX23 = 0, NXX23 = 0
The combination "CXX23 = 0, NXX23 = 1" never occurs, so if the XID start/continue property is tested with:
  if (flags & CXX23)
    return CPP_XID_START | CPP_XID_CONTINUE;
  if (flags & NXX23)
    return CPP_XID_CONTINUE;
the first test already matches every code point that has NXX23 set, and an XID_Continue-only character is mistakenly reported as CPP_XID_START. Therefore
    let ٤ = 42; // U+0664 ARABIC-INDIC DIGIT FOUR; it should not be able to start an identifier
is accepted as declaring a valid identifier.
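The three states can be reproduced with a small simulation of the generator's flag logic. This is a sketch under stated assumptions: the bit values are placeholders, apply_property is a hypothetical stand-in for one already parsed DerivedCoreProperties.txt line, and count_nxx23_without_cxx23 is a hypothetical checker; neither exists in makeucnid.cc.

```c
#include <string.h>

/* Placeholder bit values (assumption); the real ones are in libcpp.  */
enum { CXX23 = 1 << 0, NXX23 = 1 << 1 };

#define NUM_CODEPOINTS 0x110000
static unsigned int flags[NUM_CODEPOINTS];

/* Hypothetical stand-in for one parsed line: applies PROP to the
   inclusive range [first, last] the way read_derivedcore does.  */
static void
apply_property (const char *prop, unsigned int first, unsigned int last)
{
  if (strcmp (prop, "XID_Start") == 0)
    for (unsigned int c = first; c <= last; c++)
      flags[c] = (flags[c] | CXX23) & ~NXX23;
  else if (strcmp (prop, "XID_Continue") == 0)
    for (unsigned int c = first; c <= last; c++)
      if ((flags[c] & CXX23) == 0)
        flags[c] |= CXX23 | NXX23;
}

/* Hypothetical checker: counts code points in the state
   "CXX23 = 0, NXX23 = 1", which the logic above should never produce.  */
static unsigned int
count_nxx23_without_cxx23 (void)
{
  unsigned int n = 0;
  for (unsigned int c = 0; c < NUM_CODEPOINTS; c++)
    if ((flags[c] & NXX23) != 0 && (flags[c] & CXX23) == 0)
      n++;
  return n;
}
```

Feeding it a hand-picked subset of ranges, for example XID_Start for U+0061..U+007A and XID_Continue for U+0030..U+0039, U+0061..U+007A and U+0660..U+0669, leaves the letters with CXX23 only and the digits, including U+0664, with CXX23 | NXX23. The order in which the two properties are applied does not change the result.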
static const struct ucnrange ucnranges[] = {
...
{ 0| 0| 0|C11| 0|CXX23|NXX23|CID|NFC|NKC| 0, 220, 0x065f },
{ C99|N99| 0|C11| 0|CXX23|NXX23|CID|NFC|NKC| 0, 0, 0x0669 }, // <--- U+0664 belongs to the range [0x0660, 0x0669]; both CXX23 and NXX23 are set, and NXX23 means it must not start an identifier.
...
}
I am also trying to report this to GCC, but I am still waiting for my account request to be approved.
- scope
This affects only rust-lex.cc. Other parts of GCC do not depend on the function "cpp_check_xid_property"; instead they test the NXX23 bit directly, after making sure CXX23 is set.
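The direct style of test described above might look roughly like the following. This is an illustrative sketch only: may_start_identifier and may_continue_identifier are hypothetical names, not functions in GCC, and the bit values are placeholders.

```c
#include <stdbool.h>

/* Placeholder bit values (assumption); the real ones are in libcpp.  */
enum { CXX23 = 1 << 0, NXX23 = 1 << 1 };

/* Hypothetical predicate: any code point with CXX23 set may appear
   after the first character of an identifier.  */
static bool
may_continue_identifier (unsigned int flags)
{
  return (flags & CXX23) != 0;
}

/* Hypothetical predicate: first make sure CXX23 is set, then reject
   code points that are additionally marked NXX23.  */
static bool
may_start_identifier (unsigned int flags)
{
  return (flags & CXX23) != 0 && (flags & NXX23) == 0;
}
```

Because such callers look at the NXX23 bit themselves, they are not affected by the order of the tests inside cpp_check_xid_property.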