Hi, we are security researchers from SunLab focusing on Rust. We discovered that the parsing functionality can lead to undefined behavior (UB) when processing Unicode numeric characters.
    if c.is_numeric() {
        if index > BUF_LEN {
            return Err(Error::parse_number(&s));
        }
        buf[index] = c as u8;
        index += 1;
    }
}

if index == 0 {
    return Err(Error::parse_number(&s));
}

let s2 = unsafe { str::from_utf8_unchecked(&buf[..index]) };
The vulnerability is in the parsing.rs excerpt above, where the code mishandles non-ASCII numeric characters and builds an invalid UTF-8 string through the unsafe function str::from_utf8_unchecked.
The code at line 94 uses c.is_numeric(), which accepts every Unicode numeric character, not just the ASCII digits 0-9. At line 98, each accepted character is cast with c as u8, which keeps only the low 8 bits of the code point and discards the rest. At line 107, the buffer, which may now contain bytes that are not valid UTF-8, is turned into a &str via str::from_utf8_unchecked(), which assumes valid UTF-8 and performs no verification. This violates the function's safety contract and constitutes UB.
The safety section of the standard library documentation for str::from_utf8_unchecked states: "The bytes passed in must be valid UTF-8."
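To make the cast concrete, the following is a minimal sketch of what ends up in the buffer for a few characters that is_numeric() accepts. The helper truncate_to_byte is hypothetical and only mirrors the c as u8 expression from the excerpt; it is not part of num-format.

```rust
/// Mirrors the `c as u8` cast from the excerpt: keeps only the low 8 bits
/// of the code point. (Hypothetical helper, not part of num-format.)
fn truncate_to_byte(c: char) -> u8 {
    c as u8
}

fn main() {
    for c in ['7', '①', '½', '𝟘'] {
        let byte = truncate_to_byte(c);
        println!(
            "U+{:04X}: is_numeric={} is_ascii_digit={} low byte=0x{:02X} valid alone={}",
            c as u32,
            c.is_numeric(),
            c.is_ascii_digit(),
            byte,
            // Checked validation of the single truncated byte.
            std::str::from_utf8(&[byte]).is_ok()
        );
    }
}
```

Only '7' passes is_ascii_digit(). The low byte of '①' (U+2460) is 0x60, which is valid ASCII but is the backtick character rather than a digit, while the low bytes of '½' (0xBD) and '𝟘' (0xD8) are not valid UTF-8 on their own.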
Proof of Concept on Invalid UTF-8 Generation
use num_format::Locale;
use num_format::parsing::ParseFormatted;

fn main() {
    let test_cases = vec![
        ("𝟘", "U+1D7D8", "MATHEMATICAL DOUBLE-STRUCK DIGIT ZERO"),
        ("①", "U+2460", "CIRCLED DIGIT ONE"),
        ("½", "U+00BD", "VULGAR FRACTION ONE HALF"),
    ];
    for (input, unicode, description) in test_cases {
        println!("Testing: {} ({}, {})", input, unicode, description);
        let c = input.chars().next().unwrap();
        let truncated = c as u8;
        println!(" Codepoint: U+{:04X}", c as u32);
        println!(" Truncated to: 0x{:02X}", truncated);
        match std::str::from_utf8(&[truncated]) {
            Ok(_) => println!(" Valid UTF-8"),
            Err(_) => println!(" INVALID UTF-8 - Will cause UB!"),
        }
        match input.parse_formatted::<_, u32>(&Locale::en) {
            Ok(n) => println!(" Parsed: {}", n),
            Err(e) => println!(" Error: {}", e),
        }
        println!();
    }
}
Output:
Testing: 𝟘 (U+1D7D8, MATHEMATICAL DOUBLE-STRUCK DIGIT ZERO)
Codepoint: U+1D7D8
Truncated to: 0xD8
INVALID UTF-8 - Will cause UB!
Error: Failed to parse 𝟘 into a valid locale.
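To make this sound, the parser can restrict the accepted characters to ASCII digits, for example by checking c.is_ascii_digit() instead of c.is_numeric(), so that the c as u8 cast is lossless and the buffer always holds valid UTF-8.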
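One possible shape of that fix is sketched below. It is an illustration under stated assumptions, not the crate's actual code: collect_ascii_digits is a hypothetical function, the value of BUF_LEN is made up because the real constant is not shown in this report, and the sketch assumes a buffer of exactly BUF_LEN bytes (hence the >= bound check, which differs from the excerpt).

```rust
/// Hypothetical stand-in for the buffer size used in parsing.rs;
/// the real constant's value is not shown in this report.
const BUF_LEN: usize = 64;

/// Copies only ASCII digits from `s` into a fixed buffer and returns them
/// as an owned String, or None if there are no digits or too many.
fn collect_ascii_digits(s: &str) -> Option<String> {
    let mut buf = [0u8; BUF_LEN];
    let mut index = 0;
    for c in s.chars() {
        if c.is_ascii_digit() {
            if index >= BUF_LEN {
                return None;
            }
            // Lossless cast: ASCII digits are the bytes 0x30..=0x39.
            buf[index] = c as u8;
            index += 1;
        }
    }
    if index == 0 {
        return None;
    }
    // Checked conversion; it cannot fail here, but it avoids `unsafe`.
    std::str::from_utf8(&buf[..index]).ok().map(str::to_owned)
}

fn main() {
    println!("{:?}", collect_ascii_digits("1,234"));
    println!("{:?}", collect_ascii_digits("𝟘①½"));
}
```

In this sketch, non-ASCII numeric characters are skipped in the same way as separators. Whether they should instead cause a parse error is a design decision for the maintainers.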
Thanks for reading. Let us know if you have any questions about this report.
Code excerpt referenced above: num-format/num-format/src/parsing.rs, lines 94 to 107 in c217371.