When an invalid regex is provided, boost::regex throws an exception whose error message contains the regex pattern. When the pattern is not stored as UTF-8, the exception's std::string error message can end up containing invalid UTF-8.
The problem is on this line:

    message += std::string(m_base + start_pos, m_base + position);

If you use, say, boost::u32regex, then m_base is a char32_t *. Constructing a std::string from two char32_t * compiles, but each 32-bit character is narrowed to a single char, so any non-ASCII character is truncated and the result is invalid UTF-8.
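
To illustrate the truncation, here is a minimal sketch that mirrors the pattern of the line above outside of boost::regex. The function name `narrow_range` is made up for this example and does not exist in the library.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the problematic expression: build a
// std::string from a pair of char32_t pointers. The iterator-range
// constructor converts each 32-bit element to a single char, so
// anything above 0x7F loses its upper bits.
std::string narrow_range(const char32_t* first, const char32_t* last)
{
    return std::string(first, last);
}
```

For example, passing the two code points U+00E9 and `(` yields a two-byte string whose first byte is 0xE9. In UTF-8, 0xE9 starts a three-byte sequence, so following it directly with `(` is not valid UTF-8.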
I'm not sure what the right solution is here. In the meantime, I've added a function to the character traits for converting a std::basic_string<charT> to std::string.
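
As a rough idea of what such a conversion could look like for char32_t, here is a sketch of a UTF-32 to UTF-8 encoder. This is not the function added to the traits in this change; the name `to_utf8`, its signature, and the choice to replace invalid code points with U+FFFD are all assumptions made for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: encode a range of UTF-32 code points as UTF-8.
std::string to_utf8(const char32_t* first, const char32_t* last)
{
    std::string out;
    for (; first != last; ++first) {
        char32_t c = *first;
        // Assumption: surrogates and out-of-range values become U+FFFD.
        if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
            c = 0xFFFD;
        if (c < 0x80) {                       // 1 byte: 0xxxxxxx
            out += static_cast<char>(c);
        } else if (c < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        } else if (c < 0x10000) {             // 3 bytes
            out += static_cast<char>(0xE0 | (c >> 12));
            out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (c & 0x3F));
        } else {                              // 4 bytes
            out += static_cast<char>(0xF0 | (c >> 18));
            out += static_cast<char>(0x80 | ((c >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}
```

With a helper along these lines, the error message would be built from the encoded bytes of the pattern fragment rather than from narrowed code points.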
The line in question is regex/include/boost/regex/v5/basic_regex_parser.hpp, line 236 in ed6ebbd.