Description
What variant of Codex are you using?
CLI (v0.106.0)
What feature would you like to see?
Summary
Codex's exec policy engine (codex-rs/execpolicy/) evaluates shell commands against allow/deny rules using pattern matching and Starlark scripting. The shell-escalation crate handles privilege escalation detection, and the sandbox layer enforces filesystem access boundaries. None of these layers normalise Unicode before comparison, which means a crafted command or file path containing visually identical confusable characters could bypass policy checks.
For example:
- A command containing Cyrillic с (U+0441) instead of Latin c (U+0063) in a binary name would not match a deny rule targeting that binary, but would render identically in the terminal.
- A file path using confusable characters could evade filesystem sandbox boundaries that rely on string prefix matching.
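
The first example can be reproduced with a few lines of Rust. This is a minimal sketch: `is_denied` and `non_ascii_code_points` are hypothetical helpers written for illustration, not functions from codex-rs, and the exact-match rule stands in for whatever matching execpolicy actually performs.

```rust
/// Hypothetical deny check: exact string match on the binary name.
/// Assumption: this stands in for the real rule matching in execpolicy.
fn is_denied(binary: &str, deny_list: &[&str]) -> bool {
    deny_list.iter().any(|rule| *rule == binary)
}

/// Lists every non-ASCII character in `s` as a code point label, which
/// makes an otherwise invisible substitution visible in logs.
fn non_ascii_code_points(s: &str) -> Vec<String> {
    s.chars()
        .filter(|c| !c.is_ascii())
        .map(|c| format!("U+{:04X}", c as u32))
        .collect()
}
```

With a deny list of `["curl"]`, `is_denied("curl", ...)` is true, while `is_denied("\u{0441}url", ...)` is false because the first character is Cyrillic с rather than Latin c. `non_ascii_code_points("\u{0441}url")` reports the single entry `U+0441`.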
This is a known attack class in package registries (typosquatting via homoglyphs) and identity systems (IDN homograph attacks). The same principle applies wherever security rules match against user-visible strings.
Proposed solution
Canonicalise both the policy rule and the command/path argument before comparison. Three levels:
- Unicode NFKC normalisation as a baseline (collapses compatibility equivalents). Rust's unicode-normalization crate handles this.
- Skeleton matching (Unicode TR39) to catch cross-script confusables that NFKC misses.
- Mixed-script detection to flag or reject inputs combining characters from multiple Unicode scripts.
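
A rough, stdlib-only sketch of the skeleton and mixed-script steps might look like the following. Everything here is an assumption made for illustration: the `prototype` table is a tiny hand-picked subset rather than the TR39 confusables data, `script_of` approximates scripts by code-point block and only knows Latin, Cyrillic, and Greek, and the NFKC step is omitted (in practice it would come from the unicode-normalization crate). None of these names exist in codex-rs.

```rust
/// Maps a character to a Latin look-alike, if one is listed.
/// Assumption: illustrative subset only; a real implementation would
/// load the full TR39 confusables data.
fn prototype(c: char) -> char {
    match c {
        '\u{0430}' => 'a', // Cyrillic а
        '\u{0441}' => 'c', // Cyrillic с
        '\u{0435}' => 'e', // Cyrillic е
        '\u{043E}' => 'o', // Cyrillic о
        '\u{0440}' => 'p', // Cyrillic р
        '\u{0445}' => 'x', // Cyrillic х
        '\u{03BF}' => 'o', // Greek ο
        other => other,
    }
}

/// Simplified skeleton: per-character mapping, no normalisation pass.
fn skeleton(s: &str) -> String {
    s.chars().map(prototype).collect()
}

/// Two strings are treated as confusable if their skeletons are equal.
fn confusable_match(rule: &str, input: &str) -> bool {
    skeleton(rule) == skeleton(input)
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Script {
    Latin,
    Cyrillic,
    Greek,
}

/// Coarse script lookup by code-point block. Digits, punctuation, and
/// every script not listed here return None and are ignored.
fn script_of(c: char) -> Option<Script> {
    match c as u32 {
        0x0041..=0x005A | 0x0061..=0x007A | 0x00C0..=0x024F => Some(Script::Latin),
        0x0370..=0x03FF => Some(Script::Greek),
        0x0400..=0x04FF => Some(Script::Cyrillic),
        _ => None,
    }
}

/// True if the input combines at least two of the tracked scripts.
fn is_mixed_script(s: &str) -> bool {
    let mut seen: Option<Script> = None;
    for script in s.chars().filter_map(script_of) {
        match seen {
            None => seen = Some(script),
            Some(prev) if prev != script => return true,
            _ => {}
        }
    }
    false
}
```

Under these assumptions, `confusable_match("curl", "\u{0441}url")` is true and `is_mixed_script("\u{0441}url")` is true, while plain `curl` and accented Latin such as `café` are not flagged. A check this coarse would also flag a Cyrillic filename with a Latin extension, which is one reason to scope it to security-critical tokens such as binary names.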
For the confusable detection data, confusable-vision provides weight data covering 1,397 confusable pairs (793 not in TR39), discovered by rendering 26.5M character pairs across 230 fonts and measuring structural similarity. The companion library namespace-guard (MIT, TypeScript) demonstrates the detection logic, though a Rust implementation would be needed for codex-rs.
Affected components
- execpolicy crate: command pattern matching against allow/deny rules
- shell-escalation crate: privilege escalation keyword detection
- shell-command crate: command parsing and argument extraction
- Filesystem sandbox path matching (prefix-based boundary enforcement)
- secrets crate: if it matches against known secret patterns in command output
Alternatives considered
- NFC/NFD normalisation only: insufficient; cross-script confusables such as Cyrillic а vs Latin a are distinct code points that are already in NFC, so normalisation leaves them unchanged.
- Rejecting all non-ASCII in commands: too aggressive; legitimate use cases include file paths with accented characters, CJK filenames, and internationalised tool names.
- Warning instead of blocking: a reasonable first step; log a warning when mixed-script content appears in security-critical inputs.
Priority and scope
This is defence-in-depth. Exploiting it requires either a malicious prompt injection that crafts confusable commands/paths, or pre-existing files with confusable names. The attack surface is real but narrow. A warning-based approach would be proportionate as a first step.
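
One way to stage the rollout is to keep the detector separate from the enforcement mode, so that the same check can warn first and block later. The sketch below is hypothetical: `Mode`, `Decision`, and `decide` are illustrative names, not part of codex-rs, and the detector is passed in as a parameter so any confusable or mixed-script check can be plugged in.

```rust
/// Hypothetical enforcement mode for confusable findings.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Mode {
    WarnOnly,
    Block,
}

/// Hypothetical outcome of checking one security-critical input.
#[derive(Debug, PartialEq, Eq)]
enum Decision {
    Allow,
    AllowWithWarning(String),
    Reject(String),
}

/// Runs the supplied detector and maps a finding to a decision based on
/// the configured mode. Clean inputs are always allowed.
fn decide(input: &str, mode: Mode, is_suspicious: impl Fn(&str) -> bool) -> Decision {
    if !is_suspicious(input) {
        return Decision::Allow;
    }
    let reason = format!("possible confusable characters in {:?}", input);
    match mode {
        Mode::WarnOnly => Decision::AllowWithWarning(reason),
        Mode::Block => Decision::Reject(reason),
    }
}
```

With a placeholder detector such as `|s: &str| !s.is_ascii()`, a plain `curl` yields `Decision::Allow` in either mode, while an input starting with Cyrillic с yields a warning under `Mode::WarnOnly` and a rejection under `Mode::Block`.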
Additional information
No existing issues on this repo cover Unicode confusable attacks on the exec policy system. Searched for "unicode confusable", "homoglyph", and "policy bypass" on 2026-02-28; existing issues relate to general Unicode rendering/encoding, not security-critical string matching.
Analysis performed on the open source codex-rs/ directory. Relevant crates: execpolicy, shell-escalation, shell-command, secrets, linux-sandbox.
Environment: macOS Darwin 25.2.0 arm64, Codex CLI v0.106.0