Normalizing unencoded utf8 #39

@markov2

For instance, on [some Chinese page](https://www.meadjohnson.com.cn/) I find horrible HTML with raw, unencoded UTF-8 in the `href` attributes, like

<ul class="search-hot"><li><a data-id="dhl_yqhl" href="/tag/孕期护理">孕期护理</a></li>
<li><a data-id="dhl_yqzz" href="/tag/孕期症状">孕期症状</a></li>
<li><a data-id="dhl_yqyy" href="/tag/孕期营养">孕期营养</a></li>
<li><a data-id="dhl_dnfy" href="/tag/大脑发育">大脑发育</a></li>
<li><a data-id="dhl_cj" href="/tag/产检">产检</a></li>
<li><a data-id="dhl_fstj" href="/tag/辅食添加">辅食添加</a></li>
</ul>

The URI module normalizes these paths into percent-encoded (hex) form. Should it? (open question)
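
To make the question concrete, here is a minimal sketch of what such a normalization does to one of the links above. It uses Python's standard `urllib.parse` purely as an illustration; it is not the URI module discussed in this issue, and the helper name `percent_encode_path` is hypothetical.

```python
from urllib.parse import quote, unquote


def percent_encode_path(path: str) -> str:
    """Percent-encode the non-ASCII characters of a URL path.

    Hypothetical helper for illustration only. Each character is
    encoded as UTF-8 and every resulting byte becomes %XX; the "/"
    separators are left untouched.
    """
    return quote(path, safe="/", encoding="utf-8")


raw = "/tag/产检"                      # href as found in the page
encoded = percent_encode_path(raw)    # "/tag/%E4%BA%A7%E6%A3%80"
restored = unquote(encoded)           # decodes back to the raw form

print(encoded)
print(restored == raw)
```

The encoded and raw forms refer to the same resource once decoded as UTF-8, so the open question is mostly about which form should be treated as the canonical one when comparing or storing links.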

Labels: enhancement (new feature or request), question (further information is requested)