Skip to content

replace bleach with nh3 for HTML sanitization#14442

Open
valentijnscholten wants to merge 5 commits intoDefectDojo:devfrom
valentijnscholten:bleach-to-nh3
Open

replace bleach with nh3 for HTML sanitization#14442
valentijnscholten wants to merge 5 commits intoDefectDojo:devfrom
valentijnscholten:bleach-to-nh3

Conversation

@valentijnscholten
Copy link
Member

@valentijnscholten valentijnscholten commented Mar 4, 2026

Bleach has been deprecated since Feb '23. Although it still works, a security product like Defect Dojo needs to switch to the goto replacement: nh3 which is also faster.

Summary

  • Replaces the deprecated (archived) bleach library with nh3, its actively maintained, Rust-backed successor
  • Drops bleach[css] from requirements.txt and adds nh3
  • Updates all call sites in dojo/utils.py, dojo/templatetags/display_tags.py, and dojo/templatetags/get_banner.py
  • Replaces bleach.ALLOWED_TAGS / bleach.ALLOWED_ATTRIBUTES with explicit constants (_NH3_ALLOWED_TAGS, _NH3_ALLOWED_ATTRIBUTES) shared between the two template tag modules

Note on style attribute: bleach supported CSS property-level filtering via CSSSanitizer (e.g. allowing only color and font-weight). nh3 has no equivalent — it cannot filter individual CSS properties, only entire attributes. To avoid allowing arbitrary CSS injection, the style attribute is no longer permitted on any element. In practice this means inline styling (e.g. colored text in the login banner, background-color on images in markdown) will be stripped rather than sanitized. This is the safer trade-off.

Note on disallowed tag handling: bleach escaped disallowed tags (e.g. <script>&lt;script&gt;), making them visible as literal text. nh3 strips them entirely, keeping only the text content. This is the correct behavior for a sanitizer.

bleach is deprecated and archived. nh3 is its Rust-backed successor,
actively maintained and significantly faster.
@github-actions github-actions bot added the ui label Mar 4, 2026
@valentijnscholten valentijnscholten added the affects_pro PRs that affect Pro and need a coordinated release/merge moment. label Mar 4, 2026
@dryrunsecurity
Copy link

dryrunsecurity bot commented Mar 4, 2026

DryRun Security

🟡 Please give this pull request extra attention during review.

This pull request introduces mark_safe around nh3.clean() output in multiple template tags (banner and markdown/bleach filters), relying on small custom allowlists for tags and attributes so Django auto-escaping is bypassed; if nh3.clean is misconfigured or the allowlists don't prevent dangerous attribute values (e.g. javascript: URIs or event handlers), attacker- or admin-controlled input could lead to XSS.

🟡 Potential Cross-Site Scripting in dojo/templatetags/display_tags.py (drs_49e90666)
Vulnerability Potential Cross-Site Scripting
Description The code marks nh3.clean() output as safe with mark_safe before rendering. mark_safe disables Django's auto-escaping, so the security of the rendered HTML entirely depends on nh3.clean() and the provided allowlists. The PR introduces small, custom allowlists (_NH3_ALLOWED_TAGS and _NH3_ALLOWED_ATTRIBUTES) which permit and other tags and attributes (e.g., href, target). If nh3.clean is misconfigured, incomplete, or allows dangerous attribute values (like javascript: URIs in href) or misses event handler attributes, an attacker who can control the input could inject executable content. The banner code comment notes only admins can edit the login banner, but other uses (markdown_render and bleach_with_a_tags filter) apply mark_safe to cleaned user content; unless nh3.clean enforces context-appropriate encoding (and validates attribute schemes like href), this pattern is risky and can lead to XSS.

"markdown.extensions.fenced_code",
"markdown.extensions.toc",
"markdown.extensions.tables"])
return mark_safe(nh3.clean(markdown_text, tags=markdown_tags, attributes=markdown_attrs))
return None
@register.filter
def bleach_with_a_tags(message):
return mark_safe(nh3.clean(message, tags=_NH3_ALLOWED_TAGS, attributes=_NH3_ALLOWED_ATTRIBUTES))
def text_shortener(value, length):

🟡 Potential Cross-Site Scripting in dojo/templatetags/get_banner.py (drs_c675ef32)
Vulnerability Potential Cross-Site Scripting
Description The code returns mark_safe(nh3.clean(value, tags=_NH3_ALLOWED_TAGS, attributes=_NH3_ALLOWED_ATTRIBUTES)) for banner_message which bypasses Django template escaping. Although nh3.clean is used, the patch defines a small explicit allowlist (_NH3_ALLOWED_TAGS and _NH3_ALLOWED_ATTRIBUTES) and then uses mark_safe on the cleaned value. If nh3.clean is not equivalent to a proven HTML sanitizer (e.g., DOMPurify/bleach) for all contexts, or its configuration allows dangerous attribute values (for example href="javascript:..." or event handlers) or the allowed attribute set or tag set is incomplete, user-controllable content could reach the rendered HTML unsafely resulting in XSS. The comment indicates the field is editable by admins only, but using mark_safe still exposes rendered pages to any admin-supplied HTML. No further context-appropriate encoding (e.g., attribute or JS context escaping) is applied.

return mark_safe(nh3.clean(value, tags=_NH3_ALLOWED_TAGS, attributes=_NH3_ALLOWED_ATTRIBUTES))
return value
except Exception:
return False


Comment to provide feedback on these findings.

Report false positive: @dryrunsecurity fp [FINDING ID] [FEEDBACK]
Report low-impact: @dryrunsecurity nit [FINDING ID] [FEEDBACK]

Example: @dryrunsecurity fp drs_90eda195 This code is not user-facing

All finding details can be found in the DryRun Security Dashboard.

@valentijnscholten valentijnscholten changed the title chore(deps): replace bleach with nh3 for HTML sanitization replace bleach with nh3 for HTML sanitization Mar 4, 2026
@github-actions github-actions bot added the docs label Mar 4, 2026
…link

- Use escape() when building HTML in create_bleached_link so attribute
  values are properly encoded before nh3 parses them (prevents raw tags
  in href/title when user-supplied content contains HTML)
- Add rel="noopener noreferrer" to all expected link strings in tests
  (nh3 automatically injects this on target="_blank" links)
- Replace exact-output XSS assertion with semantic safety checks
nh3/ammonia does not re-escape < in attribute values when re-serializing,
so passing escape()'d HTML through nh3.clean() still produced raw angle
brackets in href/title. The function constructs trusted HTML itself, so
nh3 is redundant here — escape() is sufficient and correct.

Also adds rel="noopener noreferrer" explicitly and updates tests to
match the new output including the exact XSS-escaped form.
@valentijnscholten valentijnscholten added this to the 2.57.0 milestone Mar 6, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

affects_pro PRs that affect Pro and need a coordinated release/merge moment. docs integration_tests ui unittests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant