Unicode confusable characters can bypass exec policy matching #13095

@paultendo

Description

What variant of Codex are you using?

CLI (v0.106.0)

What feature would you like to see?

Summary

Codex's exec policy engine (codex-rs/execpolicy/) evaluates shell commands against allow/deny rules using pattern matching and Starlark scripting. The shell-escalation crate handles privilege escalation detection, and the sandbox layer enforces filesystem access boundaries. None of these layers normalise Unicode before comparison, which means a crafted command or file path containing visually identical confusable characters could bypass policy checks.

For example:

  • A command containing Cyrillic с (U+0441) instead of Latin c (U+0063) in a binary name would not match a deny rule targeting that binary, but would render identically in the terminal.
  • A file path using confusable characters could evade filesystem sandbox boundaries that rely on string prefix matching.
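To make the two examples concrete, here is a minimal sketch of how byte-exact matching treats confusables. The functions `is_denied`, `inside_protected_prefix`, and `non_ascii_code_points` are hypothetical stand-ins written for this illustration; they are not taken from codex-rs, and the real matching logic may differ.

```rust
/// Hypothetical deny check: exact comparison on the program name with no
/// Unicode normalisation. Illustration only, not Codex's actual code.
fn is_denied(program: &str, deny_list: &[&str]) -> bool {
    deny_list.iter().any(|denied| *denied == program)
}

/// Hypothetical boundary check: plain string-prefix match on the path.
fn inside_protected_prefix(path: &str, protected: &str) -> bool {
    path.starts_with(protected)
}

/// Lists the non-ASCII code points in `s` (for example "U+0441"), which is
/// one way to surface characters that render like ASCII but are not.
fn non_ascii_code_points(s: &str) -> Vec<String> {
    s.chars()
        .filter(|c| !c.is_ascii())
        .map(|c| format!("U+{:04X}", c as u32))
        .collect()
}
```

With these definitions, `is_denied("curl", &["curl"])` is true, while `is_denied("\u{0441}url", &["curl"])` is false even though both names look the same in most terminal fonts. Likewise `inside_protected_prefix("/\u{0435}tc/passwd", "/etc/")` is false because the first path component starts with Cyrillic е (U+0435).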

This is a known attack class in package registries (typosquatting via homoglyphs) and identity systems (IDN homograph attacks). The same principle applies wherever security rules match against user-visible strings.

Proposed solution

Canonicalise both the policy rule and the command/path argument before comparison. Three levels:

  1. Unicode NFKC normalisation as a baseline (collapses compatibility equivalents). Rust's unicode-normalization crate handles this.
  2. Skeleton matching (Unicode TR39) to catch cross-script confusables that NFKC misses.
  3. Mixed-script detection to flag or reject inputs combining characters from multiple Unicode scripts.
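A rough sketch of level 2, comparing skeletons rather than raw strings. This is a simplification under stated assumptions: the `CONFUSABLES` table is a tiny hand-picked sample rather than the TR39 data, the names `skeleton` and `matches_rule` are hypothetical, and the normalisation passes that the full TR39 skeleton algorithm performs are left out so the example needs only the standard library. In practice the NFKC baseline from level 1 would run first.

```rust
/// Tiny illustrative table of (confusable, Latin prototype) pairs.
/// Assumption: a real implementation would load the full TR39 confusables
/// data instead of this hand-picked sample.
const CONFUSABLES: &[(char, char)] = &[
    ('\u{0430}', 'a'), // Cyrillic small a
    ('\u{0441}', 'c'), // Cyrillic small es
    ('\u{0435}', 'e'), // Cyrillic small ie
    ('\u{043E}', 'o'), // Cyrillic small o
    ('\u{0440}', 'p'), // Cyrillic small er
    ('\u{03BF}', 'o'), // Greek small omicron
];

/// Maps every character with a known prototype to that prototype and
/// leaves all other characters unchanged.
fn skeleton(input: &str) -> String {
    input
        .chars()
        .map(|c| {
            CONFUSABLES
                .iter()
                .find(|pair| pair.0 == c)
                .map(|pair| pair.1)
                .unwrap_or(c)
        })
        .collect()
}

/// Hypothetical rule check: canonicalise both sides, then compare.
fn matches_rule(rule: &str, candidate: &str) -> bool {
    skeleton(rule) == skeleton(candidate)
}
```

Here `matches_rule("curl", "\u{0441}url")` returns true because both sides reduce to the same skeleton, while legitimate non-ASCII input such as `"caf\u{00E9}"` passes through `skeleton` unchanged. Canonicalising the rule as well as the candidate keeps the comparison symmetric, so a rule written with a confusable would also match.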

For the confusable detection data, confusable-vision provides weight data covering 1,397 confusable pairs (793 not in TR39), discovered by rendering 26.5M character pairs across 230 fonts and measuring structural similarity. The companion library namespace-guard (MIT, TypeScript) demonstrates the detection logic, though a Rust implementation would be needed for codex-rs.

Affected components

  • execpolicy crate: command pattern matching against allow/deny rules
  • shell-escalation crate: privilege escalation keyword detection
  • shell-command crate: command parsing and argument extraction
  • Filesystem sandbox path matching (prefix-based boundary enforcement)
  • secrets crate: if it matches against known secret patterns in command output

Alternatives considered

  • NFC/NFD normalisation only: insufficient; cross-script confusables such as Cyrillic а and Latin a are distinct code points that are already in NFC, so normalisation leaves them unchanged.
  • Rejecting all non-ASCII in commands: too aggressive; legitimate use cases include file paths with accented characters, CJK filenames, and internationalised tool names.
  • Warning instead of blocking: a reasonable first step; log a warning when mixed-script content appears in security-critical inputs.
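As a sketch of the warning-first alternative, the following flags tokens that mix Latin with Cyrillic or Greek letters while leaving single-script non-ASCII input alone. Everything here is an assumption made for illustration: the `Script` enum, the code point ranges in `script_of` (a coarse approximation of the Unicode Script property), the choice to split tokens on whitespace and `/`, and the function names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Script {
    Latin,
    Greek,
    Cyrillic,
}

/// Coarse classification by code point range. Assumption: a real
/// implementation would use Unicode Script property data. Characters
/// outside these three scripts (digits, punctuation, CJK, ...) return None
/// and are ignored by the check below.
fn script_of(c: char) -> Option<Script> {
    if !c.is_alphabetic() {
        return None;
    }
    match c as u32 {
        0x0041..=0x005A | 0x0061..=0x007A | 0x00C0..=0x024F => Some(Script::Latin),
        0x0370..=0x03FF => Some(Script::Greek),
        0x0400..=0x04FF => Some(Script::Cyrillic),
        _ => None,
    }
}

/// Returns the tokens (split on whitespace and '/') that contain letters
/// from more than one of the tracked scripts.
fn mixed_script_tokens(command: &str) -> Vec<&str> {
    command
        .split(|c: char| c.is_whitespace() || c == '/')
        .filter(|token| {
            let mut seen: Vec<Script> = Vec::new();
            for script in token.chars().filter_map(script_of) {
                if !seen.contains(&script) {
                    seen.push(script);
                }
            }
            seen.len() > 1
        })
        .collect()
}

/// Produces a warning message instead of rejecting the command.
fn mixed_script_warning(command: &str) -> Option<String> {
    let suspicious = mixed_script_tokens(command);
    if suspicious.is_empty() {
        None
    } else {
        Some(format!(
            "mixed-script token(s) in command: {}",
            suspicious.join(", ")
        ))
    }
}
```

Checking per token rather than per command means `grep привет notes.txt` and `cat café.txt` produce no warning, while `ls /еtc/passwd` (with Cyrillic е) does. The caller decides whether to log the message or escalate to a prompt.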

Priority and scope

This is defence-in-depth. Exploiting it requires either a malicious prompt injection that crafts confusable commands/paths, or pre-existing files with confusable names. The attack surface is real but narrow. A warning-based approach would be proportionate as a first step.

Additional information

No existing issues on this repo cover Unicode confusable attacks on the exec policy system. Searched for "unicode confusable", "homoglyph", and "policy bypass" on 2026-02-28; existing issues relate to general Unicode rendering/encoding, not security-critical string matching.

Analysis performed on the open source codex-rs/ directory. Relevant crates: execpolicy, shell-escalation, shell-command, secrets, linux-sandbox.

Environment: macOS Darwin 25.2.0 arm64, Codex CLI v0.106.0

Metadata

Labels: CLI, bug, sandbox
