Unicode confusable characters can bypass exec policy matching #13095

## What variant of Codex are you using?

CLI (v0.106.0)

## What feature would you like to see?

### Summary

Codex's exec policy engine (`codex-rs/execpolicy/`) evaluates shell commands against allow/deny rules using pattern matching and Starlark scripting. The `shell-escalation` crate handles privilege escalation detection, and the sandbox layer enforces filesystem access boundaries. None of these layers normalise Unicode before comparison, which means a crafted command or file path containing visually identical confusable characters could bypass policy checks.

For example:
- A command containing Cyrillic `с` (U+0441) instead of Latin `c` (U+0063) in a binary name would not match a deny rule targeting that binary, but would render identically in the terminal.
- A file path using confusable characters could evade filesystem sandbox boundaries that rely on string prefix matching.
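
To make the first case concrete, here is a minimal sketch of a naive deny check. The `DENIED_PROGRAMS` list and `is_denied` function are hypothetical and are not taken from the `execpolicy` crate; they only illustrate how an exact string comparison treats the two spellings.

```rust
/// Hypothetical deny list; the real execpolicy rule format is not shown here.
const DENIED_PROGRAMS: &[&str] = &["curl", "rm"];

/// Naive check: exact string comparison with no Unicode normalisation.
fn is_denied(program: &str) -> bool {
    DENIED_PROGRAMS.iter().any(|denied| *denied == program)
}

fn main() {
    let latin = "curl";
    // The first letter is Cyrillic U+0441, not Latin U+0063.
    let spoofed = "\u{0441}url";

    println!("{latin}: denied = {}", is_denied(latin));
    println!("{spoofed}: denied = {}", is_denied(spoofed));
    // The two strings render alike but differ at the byte level.
    println!("{:?} vs {:?}", latin.as_bytes(), spoofed.as_bytes());
}
```

The spoofed name fails the comparison because its first character encodes to different bytes, even though both names look the same in most terminal fonts.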

This is a known attack class in package registries (typosquatting via homoglyphs) and identity systems (IDN homograph attacks). The same principle applies wherever security rules match against user-visible strings.

### Proposed solution

Canonicalise both the policy rule and the command/path argument before comparison. Three layers:

1. **Unicode NFKC normalisation** as a baseline (collapses compatibility equivalents). Rust's `unicode-normalization` crate handles this.
2. **Skeleton matching** (Unicode TR39) to catch cross-script confusables that NFKC misses.
3. **Mixed-script detection** to flag or reject inputs combining characters from multiple Unicode scripts.
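
A rough, std-only sketch of layers 2 and 3 follows. It makes several assumptions: the confusable table is a tiny hand-picked subset rather than the full TR39 data, the skeleton function skips the NFD steps that TR39 specifies, and script classification is a coarse code-point range check covering only Latin, Cyrillic, and Greek. All function and type names are hypothetical. NFKC (layer 1) is omitted because it needs an external crate such as `unicode-normalization`.

```rust
use std::collections::BTreeSet;

/// Illustrative subset of confusable mappings (assumption: a real
/// implementation would load the full TR39 confusables data).
fn confusable_prototype(c: char) -> char {
    match c {
        '\u{0430}' => 'a', // Cyrillic small a
        '\u{0441}' => 'c', // Cyrillic small es
        '\u{0435}' => 'e', // Cyrillic small ie
        '\u{043E}' => 'o', // Cyrillic small o
        '\u{0440}' => 'p', // Cyrillic small er
        '\u{03BF}' => 'o', // Greek small omicron
        other => other,
    }
}

/// Simplified skeleton: map every character to its prototype.
/// TR39 also applies NFD before and after the mapping; omitted here.
fn skeleton(s: &str) -> String {
    s.chars().map(confusable_prototype).collect()
}

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Script {
    Latin,
    Cyrillic,
    Greek,
}

/// Coarse classification by code-point range. Digits, punctuation, and
/// every script not listed here are ignored in this sketch.
fn script_of(c: char) -> Option<Script> {
    match c {
        'a'..='z' | 'A'..='Z' | '\u{00C0}'..='\u{024F}' => Some(Script::Latin),
        '\u{0400}'..='\u{04FF}' => Some(Script::Cyrillic),
        '\u{0370}'..='\u{03FF}' => Some(Script::Greek),
        _ => None,
    }
}

/// True when the input contains letters from more than one known script.
fn is_mixed_script(s: &str) -> bool {
    let scripts: BTreeSet<Script> = s.chars().filter_map(script_of).collect();
    scripts.len() > 1
}

fn main() {
    let rule = "curl";
    let input = "\u{0441}url"; // Cyrillic first letter

    // Compare skeletons instead of raw strings.
    println!("skeleton match: {}", skeleton(input) == skeleton(rule));
    println!("mixed script: {}", is_mixed_script(input));
}
```

Skeleton comparison catches the lookalike even when the whole token is written in one foreign script, while the mixed-script check is a cheap signal that can drive a warning without any confusable table.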

For the confusable detection data, [confusable-vision](https://github.com/paultendo/confusable-vision) provides weight data covering 1,397 confusable pairs (793 not in TR39), discovered by rendering 26.5M character pairs across 230 fonts and measuring structural similarity. The companion library [namespace-guard](https://github.com/paultendo/namespace-guard) (MIT, TypeScript) demonstrates the detection logic, though a Rust implementation would be needed for codex-rs.

### Affected components

- `execpolicy` crate: command pattern matching against allow/deny rules
- `shell-escalation` crate: privilege escalation keyword detection
- `shell-command` crate: command parsing and argument extraction
- Filesystem sandbox path matching (prefix-based boundary enforcement)
- `secrets` crate: if it matches against known secret patterns in command output
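
For the path-matching item, the sketch below shows how a prefix rule treats a lookalike path. `PROTECTED_PREFIX` and `is_protected` are hypothetical and do not reflect the actual sandbox code. Note that the lookalike names a different directory on disk, so the exposure is to rules and human review that depend on how the path reads.

```rust
use std::path::Path;

/// Hypothetical protected prefix; not taken from the codex-rs sandbox code.
const PROTECTED_PREFIX: &str = "/home/user/secrets";

/// Naive check: component-wise prefix match on the raw, unnormalised path.
fn is_protected(path: &str) -> bool {
    Path::new(path).starts_with(PROTECTED_PREFIX)
}

fn main() {
    let genuine = "/home/user/secrets/token.txt";
    // The directory name uses Cyrillic U+0435 in place of Latin "e".
    let lookalike = "/home/user/s\u{0435}crets/token.txt";

    println!("{genuine}: protected = {}", is_protected(genuine));
    println!("{lookalike}: protected = {}", is_protected(lookalike));
}
```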

### Alternatives considered

- **NFC/NFD normalisation only**: insufficient; cross-script confusables like Cyrillic а vs Latin a are distinct code points that are already in normalised form, so NFC/NFD leaves them unchanged.
- **Rejecting all non-ASCII in commands**: too aggressive; legitimate use cases include file paths with accented characters, CJK filenames, and internationalised tool names.
- **Warning instead of blocking**: a reasonable first step; log a warning when mixed-script content appears in security-critical inputs.

### Priority and scope

This is defence-in-depth. Exploiting it requires either a malicious prompt injection that crafts confusable commands/paths, or pre-existing files with confusable names. The attack surface is real but narrow. A warning-based approach would be proportionate as a first step.

## Additional information

No existing issues on this repo cover Unicode confusable attacks on the exec policy system. Searched for "unicode confusable", "homoglyph", and "policy bypass" on 2026-02-28; existing issues relate to general Unicode rendering/encoding, not security-critical string matching.

Analysis performed on the open source `codex-rs/` directory. Relevant crates: `execpolicy`, `shell-escalation`, `shell-command`, `secrets`, `linux-sandbox`.

**Environment:** macOS Darwin 25.2.0 arm64, Codex CLI v0.106.0
