Commit ce2ecbe

feat: record messages from user in ~/.codex/history.jsonl (#939)
This is a large change to support a "history" feature like you would expect in a shell like Bash.

History events are recorded in `$CODEX_HOME/history.jsonl`. Because it is a JSONL file, it is straightforward to append new entries (as opposed to the TypeScript CLI, which uses `$CODEX_HOME/history.json`, so to remain valid JSON, each new entry entails rewriting the entire file). Because multiple instances of Codex CLI may write to `history.jsonl` at once, we use advisory file locking when working with `history.jsonl` in `codex-rs/core/src/message_history.rs`.

Because we believe history is a sufficiently useful feature, we enable it by default. To provide some safety, though, we set the file permissions of `history.jsonl` to `0600` so that other users on the system cannot read the user's history.

We do not yet support a default list of `SENSITIVE_PATTERNS` as the TypeScript CLI does:

https://github.com/openai/codex/blob/3fdf9df1335ac9501e3fb0e61715359145711e8b/codex-cli/src/utils/storage/command-history.ts#L10-L17

We are going to take a more conservative approach to this list in the Rust CLI. For example, while `/\b[A-Za-z0-9-_]{20,}\b/` might exclude sensitive information like API tokens, it would also exclude valuable information such as references to Git commits.

As noted in the updated documentation, users can opt out of history by adding the following to `config.toml`:

```toml
[history]
persistence = "none"
```

Because `history.jsonl` could, in theory, be quite large, we take an arguably overly pedantic approach to reading history entries into memory. Specifically, we start by telling the client the current number of entries in the history file (`history_entry_count`) as well as the inode (`history_log_id`) of `history.jsonl` (see the new fields on `SessionConfiguredEvent`).

The client is responsible for keeping new entries in memory to create a "local history," but if the user hits up enough times to go "past" the end of local history, then the client should use the new `GetHistoryEntryRequest` in the protocol to fetch older entries. Specifically, it should pass the `history_log_id` it was given originally and work backwards from `history_entry_count`. (It should really fetch history in batches rather than one at a time, but that is something we can improve upon in subsequent PRs.)

The motivation behind this crazy scheme is that it is designed to defend against:

* The `history.jsonl` file being truncated during the session such that the index into the history is no longer consistent with what had been read up to that point. We do not yet have logic to enforce a `max_bytes` for `history.jsonl`, but once we do, we will aspire to implement it in a way that should result in a new inode for the file on most systems.
* New items from concurrent Codex CLI sessions being appended to the history. Because, in the absence of truncation, `history.jsonl` is an append-only log, so long as the client reads backwards from `history_entry_count`, it should always get a consistent view of history. (That said, it will not be able to read _new_ commands from concurrent sessions, but perhaps we will introduce a `/` command to reload the latest history or something down the road.)

Admittedly, my testing of this feature thus far has been fairly light. I expect we will find bugs and introduce enhancements/fixes going forward.
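
The client-side navigation described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the actual client code: `HistoryCursor`, `Step`, and their methods are hypothetical names, and the sketch assumes offsets are zero-based, so the newest entry persisted before the session started sits at `history_entry_count - 1`.

```rust
/// What the client should do after the user presses "up" once.
#[derive(Debug, PartialEq)]
enum Step {
    /// Show an entry the client already holds in memory.
    Local(String),
    /// Ask the core for a persisted entry (a `GetHistoryEntryRequest`).
    Fetch { log_id: u64, offset: usize },
    /// Nothing older is available.
    Exhausted,
}

/// Hypothetical client-side cursor over local and persisted history.
struct HistoryCursor {
    log_id: u64,        // `history_log_id` from SessionConfiguredEvent
    entry_count: usize, // `history_entry_count` from SessionConfiguredEvent
    local: Vec<String>, // messages sent during this session, oldest first
    steps_back: usize,  // how many times "up" has been pressed in a row
}

impl HistoryCursor {
    fn new(log_id: u64, entry_count: usize) -> Self {
        HistoryCursor { log_id, entry_count, local: Vec::new(), steps_back: 0 }
    }

    /// Remember a message sent in this session and reset navigation.
    fn record(&mut self, text: &str) {
        self.local.push(text.to_string());
        self.steps_back = 0;
    }

    /// Move one entry further into the past.
    fn up(&mut self) -> Step {
        let total = self.local.len() + self.entry_count;
        if self.steps_back >= total {
            return Step::Exhausted;
        }
        self.steps_back += 1;
        if self.steps_back <= self.local.len() {
            // Still inside local history: walk the in-memory list backwards.
            let idx = self.local.len() - self.steps_back;
            Step::Local(self.local[idx].clone())
        } else {
            // Past local history: count backwards from the entry count
            // reported at session start (assumed zero-based offsets).
            let past_local = self.steps_back - self.local.len();
            Step::Fetch { log_id: self.log_id, offset: self.entry_count - past_local }
        }
    }
}

fn main() {
    let mut cursor = HistoryCursor::new(42, 2);
    cursor.record("first message of this session");
    for _ in 0..4 {
        println!("{:?}", cursor.up());
    }
}
```

Because the cursor only ever requests offsets below the count it was given at session start, entries appended later by concurrent sessions cannot shift the indices it relies on.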
1 parent 3fdf9df commit ce2ecbe

15 files changed: +873 -15 lines changed

codex-rs/Cargo.lock

Lines changed: 11 additions & 0 deletions
Some generated files are not rendered by default.

codex-rs/README.md

Lines changed: 14 additions & 1 deletion
@@ -23,7 +23,9 @@ This folder is the root of a Cargo workspace. It contains quite a bit of experim
 
 ## Config
 
-The CLI can be configured via `~/.codex/config.toml`. It supports the following options:
+The CLI can be configured via a file named `config.toml`. By default, configuration is read from `~/.codex/config.toml`, though the `CODEX_HOME` environment variable can be used to specify a directory other than `~/.codex`.
+
+The `config.toml` file supports the following options:
 
 ### model
 
@@ -297,6 +299,17 @@ To have Codex use this script for notifications, you would configure it via `not
 notify = ["python3", "/Users/mbolin/.codex/notify.py"]
 ```
 
+### history
+
+By default, Codex CLI records messages sent to the model in `$CODEX_HOME/history.jsonl`. Note that on UNIX, the file permissions are set to `o600`, so it should only be readable and writable by the owner.
+
+To disable this behavior, configure `[history]` as follows:
+
+```toml
+[history]
+persistence = "none"  # "save-all" is the default value
+```
+
 ### project_doc_max_bytes
 
 Maximum number of bytes to read from an `AGENTS.md` file to include in the instructions sent with the first turn of a session. Defaults to 32 KiB.
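
The `history` section added in this README diff says the file is owner-only on UNIX (mode `0600`, written `o600` in the README text). One way to get that behavior with the standard library alone might look like the sketch below. It is an assumption-laden illustration, not the contents of `message_history.rs`: `append_line` and `mode_bits` are hypothetical helpers, the JSON records are made up, the code is UNIX-only, and the advisory locking that the commit adds via `fs2` is left out.

```rust
use std::fs::{self, OpenOptions};
use std::io::Write;
use std::os::unix::fs::{OpenOptionsExt, PermissionsExt};
use std::path::Path;

/// Append one line to `path`, creating the file with mode 0600 if needed.
/// Advisory locking is intentionally omitted from this sketch.
fn append_line(path: &Path, line: &str) -> std::io::Result<()> {
    let mut file = OpenOptions::new()
        .append(true)
        .create(true)
        .mode(0o600) // only takes effect when the file is newly created
        .open(path)?;
    // Hand the line and its trailing newline to the OS in one call.
    file.write_all(format!("{}\n", line).as_bytes())
}

/// Permission bits of `path`, for example 0o600.
fn mode_bits(path: &Path) -> std::io::Result<u32> {
    Ok(fs::metadata(path)?.permissions().mode() & 0o777)
}

fn main() -> std::io::Result<()> {
    // Hypothetical scratch location; the real file lives under $CODEX_HOME.
    let path = std::env::temp_dir().join(format!("history-demo-{}.jsonl", std::process::id()));
    let _ = fs::remove_file(&path);
    append_line(&path, r#"{"text":"first message"}"#)?;
    append_line(&path, r#"{"text":"second message"}"#)?;
    println!("mode = {:o}", mode_bits(&path)?);
    print!("{}", fs::read_to_string(&path)?);
    fs::remove_file(&path)
}
```

Note that `mode` is applied only at creation time and is still filtered by the process umask, so a file that already exists with looser permissions would need an explicit `set_permissions` call.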

codex-rs/core/Cargo.toml

Lines changed: 1 addition & 0 deletions
@@ -20,6 +20,7 @@ codex-mcp-client = { path = "../mcp-client" }
 dirs = "6"
 env-flags = "0.1.1"
 eventsource-stream = "0.2.3"
+fs2 = "0.4.3"
 fs-err = "3.1.0"
 futures = "0.3"
 mcp-types = { path = "../mcp-types" }

codex-rs/core/src/codex.rs

Lines changed: 58 additions & 3 deletions
@@ -110,6 +110,7 @@ impl Codex {
             cwd: config.cwd.clone(),
         };
 
+        let config = Arc::new(config);
         tokio::spawn(submission_loop(config, rx_sub, tx_event, ctrl_c));
         let codex = Codex {
             next_id: AtomicU64::new(0),
@@ -483,11 +484,14 @@ impl AgentTask {
 }
 
 async fn submission_loop(
-    config: Config,
+    config: Arc<Config>,
     rx_sub: Receiver<Submission>,
     tx_event: Sender<Event>,
     ctrl_c: Arc<Notify>,
 ) {
+    // Generate a unique ID for the lifetime of this Codex session.
+    let session_id = Uuid::new_v4();
+
     let mut sess: Option<Arc<Session>> = None;
     // shorthand - send an event when there is no active session
     let send_no_session_event = |sub_id: String| async {
@@ -608,7 +612,9 @@ async fn submission_loop(
 
                 // Attempt to create a RolloutRecorder *before* moving the
                 // `instructions` value into the Session struct.
-                let session_id = Uuid::new_v4();
+                // TODO: if ConfigureSession is sent twice, we will create an
+                // overlapping rollout file. Consider passing RolloutRecorder
+                // from above.
                 let rollout_recorder =
                     match RolloutRecorder::new(&config, session_id, instructions.clone()).await {
                         Ok(r) => Some(r),
@@ -633,10 +639,19 @@ async fn submission_loop(
                     rollout: Mutex::new(rollout_recorder),
                 }));
 
+                // Gather history metadata for SessionConfiguredEvent.
+                let (history_log_id, history_entry_count) =
+                    crate::message_history::history_metadata(&config).await;
+
                 // ack
                 let events = std::iter::once(Event {
                     id: sub.id.clone(),
-                    msg: EventMsg::SessionConfigured(SessionConfiguredEvent { session_id, model }),
+                    msg: EventMsg::SessionConfigured(SessionConfiguredEvent {
+                        session_id,
+                        model,
+                        history_log_id,
+                        history_entry_count,
+                    }),
                 })
                 .chain(mcp_connection_errors.into_iter());
                 for event in events {
@@ -691,6 +706,46 @@ async fn submission_loop(
                     other => sess.notify_approval(&id, other),
                 }
             }
+            Op::AddToHistory { text } => {
+                let id = session_id;
+                let config = config.clone();
+                tokio::spawn(async move {
+                    if let Err(e) = crate::message_history::append_entry(&text, &id, &config).await
+                    {
+                        tracing::warn!("failed to append to message history: {e}");
+                    }
+                });
+            }
+
+            Op::GetHistoryEntryRequest { offset, log_id } => {
+                let config = config.clone();
+                let tx_event = tx_event.clone();
+                let sub_id = sub.id.clone();
+
+                tokio::spawn(async move {
+                    // Run lookup in blocking thread because it does file IO + locking.
+                    let entry_opt = tokio::task::spawn_blocking(move || {
+                        crate::message_history::lookup(log_id, offset, &config)
+                    })
+                    .await
+                    .unwrap_or(None);
+
+                    let event = Event {
+                        id: sub_id,
+                        msg: EventMsg::GetHistoryEntryResponse(
+                            crate::protocol::GetHistoryEntryResponseEvent {
+                                offset,
+                                log_id,
+                                entry: entry_opt,
+                            },
+                        ),
+                    };
+
+                    if let Err(e) = tx_event.send(event).await {
+                        tracing::warn!("failed to send GetHistoryEntryResponse event: {e}");
+                    }
+                });
+            }
         }
     }
     debug!("Agent loop exited");
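
The `history_metadata` and `lookup` calls in this diff pair an inode (`log_id`) with a line offset. What such a pair of functions could do is sketched below using only the standard library. This is a guess at the shape, not the code in `message_history.rs`: `log_metadata` and `lookup_line` are hypothetical names, the sketch returns raw lines rather than parsed entries, it falls back to `(0, 0)` on error by its own choice, it is UNIX-only, and it skips the advisory locking mentioned in the diff's comment.

```rust
use std::fs::File;
use std::io::{BufRead, BufReader};
use std::os::unix::fs::MetadataExt;
use std::path::Path;

/// Inode of `path` plus its current line count, or (0, 0) if unreadable.
fn log_metadata(path: &Path) -> (u64, usize) {
    let file = match File::open(path) {
        Ok(f) => f,
        Err(_) => return (0, 0),
    };
    let inode = match file.metadata() {
        Ok(m) => m.ino(),
        Err(_) => return (0, 0),
    };
    (inode, BufReader::new(file).lines().count())
}

/// Line at zero-based `offset`, but only if the file still has inode `log_id`.
fn lookup_line(path: &Path, log_id: u64, offset: usize) -> Option<String> {
    let file = File::open(path).ok()?;
    if file.metadata().ok()?.ino() != log_id {
        // The file was replaced since the session started, so the
        // caller's offsets no longer describe this file.
        return None;
    }
    BufReader::new(file).lines().nth(offset)?.ok()
}

fn main() {
    // Hypothetical scratch file standing in for history.jsonl.
    let path = std::env::temp_dir().join(format!("lookup-demo-{}.jsonl", std::process::id()));
    std::fs::write(&path, "one\ntwo\nthree\n").expect("write demo file");
    let (log_id, count) = log_metadata(&path);
    println!("newest entry: {:?}", lookup_line(&path, log_id, count - 1));
    println!("stale log id: {:?}", lookup_line(&path, log_id.wrapping_add(1), 0));
    let _ = std::fs::remove_file(&path);
}
```

Checking the inode before reading is what lets a stale client fail cleanly with `None` instead of receiving an entry from a file that has since been rewritten.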

codex-rs/core/src/config.rs

Lines changed: 68 additions & 0 deletions
@@ -81,6 +81,30 @@ pub struct Config {
     /// Directory containing all Codex state (defaults to `~/.codex` but can be
     /// overridden by the `CODEX_HOME` environment variable).
     pub codex_home: PathBuf,
+
+    /// Settings that govern if and what will be written to `~/.codex/history.jsonl`.
+    pub history: History,
+}
+
+/// Settings that govern if and what will be written to `~/.codex/history.jsonl`.
+#[derive(Deserialize, Debug, Clone, PartialEq, Default)]
+pub struct History {
+    /// If true, history entries will not be written to disk.
+    pub persistence: HistoryPersistence,
+
+    /// If set, the maximum size of the history file in bytes.
+    /// TODO(mbolin): Not currently honored.
+    pub max_bytes: Option<usize>,
+}
+
+#[derive(Deserialize, Debug, Clone, PartialEq, Default)]
+#[serde(rename_all = "kebab-case")]
+pub enum HistoryPersistence {
+    /// Save all history entries to disk.
+    #[default]
+    SaveAll,
+    /// Do not write history to disk.
+    None,
 }
 
 /// Base config deserialized from ~/.codex/config.toml.
@@ -130,6 +154,10 @@ pub struct ConfigToml {
     /// Named profiles to facilitate switching between different configurations.
     #[serde(default)]
     pub profiles: HashMap<String, ConfigProfile>,
+
+    /// Settings that govern if and what will be written to `~/.codex/history.jsonl`.
+    #[serde(default)]
+    pub history: Option<History>,
 }
 
 impl ConfigToml {
@@ -297,6 +325,8 @@ impl Config {
             }
         };
 
+        let history = cfg.history.unwrap_or_default();
+
         let config = Self {
             model: model
                 .or(config_profile.model)
@@ -320,6 +350,7 @@ impl Config {
             model_providers,
             project_doc_max_bytes: cfg.project_doc_max_bytes.unwrap_or(PROJECT_DOC_MAX_BYTES),
             codex_home,
+            history,
         };
         Ok(config)
     }
@@ -468,6 +499,40 @@ mod tests {
         );
     }
 
+    #[test]
+    fn test_toml_parsing() {
+        let history_with_persistence = r#"
+[history]
+persistence = "save-all"
+"#;
+        let history_with_persistence_cfg: ConfigToml =
+            toml::from_str::<ConfigToml>(history_with_persistence)
+                .expect("TOML deserialization should succeed");
+        assert_eq!(
+            Some(History {
+                persistence: HistoryPersistence::SaveAll,
+                max_bytes: None,
+            }),
+            history_with_persistence_cfg.history
+        );
+
+        let history_no_persistence = r#"
+[history]
+persistence = "none"
+"#;
+
+        let history_no_persistence_cfg: ConfigToml =
+            toml::from_str::<ConfigToml>(history_no_persistence)
+                .expect("TOML deserialization should succeed");
+        assert_eq!(
+            Some(History {
+                persistence: HistoryPersistence::None,
+                max_bytes: None,
+            }),
+            history_no_persistence_cfg.history
+        );
+    }
+
     /// Deserializing a TOML string containing an *invalid* permission should
     /// fail with a helpful error rather than silently defaulting or
     /// succeeding.
@@ -620,6 +685,7 @@ disable_response_storage = true
                 model_providers: fixture.model_provider_map.clone(),
                 project_doc_max_bytes: PROJECT_DOC_MAX_BYTES,
                 codex_home: fixture.codex_home(),
+                history: History::default(),
             },
             o3_profile_config
         );
@@ -654,6 +720,7 @@ disable_response_storage = true
             model_providers: fixture.model_provider_map.clone(),
             project_doc_max_bytes: PROJECT_DOC_MAX_BYTES,
             codex_home: fixture.codex_home(),
+            history: History::default(),
         };
 
         assert_eq!(expected_gpt3_profile_config, gpt3_profile_config);
@@ -703,6 +770,7 @@ disable_response_storage = true
             model_providers: fixture.model_provider_map.clone(),
             project_doc_max_bytes: PROJECT_DOC_MAX_BYTES,
             codex_home: fixture.codex_home(),
+            history: History::default(),
         };
 
         assert_eq!(expected_zdr_profile_config, zdr_profile_config);
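
The tests in this diff cover an explicit `[history]` table, but not the case where the table is absent. How `cfg.history.unwrap_or_default()` resolves that case can be illustrated with a dependency-free sketch. The types below copy the shapes from the diff minus the serde derives, and `resolve` and `should_persist` are hypothetical helpers introduced only for this example.

```rust
/// Same variants as the enum in the diff, without the serde attributes.
#[derive(Debug, Clone, PartialEq, Default)]
enum HistoryPersistence {
    /// Save all history entries to disk.
    #[default]
    SaveAll,
    /// Do not write history to disk.
    None,
}

#[derive(Debug, Clone, PartialEq, Default)]
struct History {
    persistence: HistoryPersistence,
    max_bytes: Option<usize>,
}

/// Mirrors `cfg.history.unwrap_or_default()`: a missing `[history]` table
/// falls back to `History::default()`.
fn resolve(history_from_toml: Option<History>) -> History {
    history_from_toml.unwrap_or_default()
}

/// Hypothetical helper: whether entries would be written to disk.
fn should_persist(history: &History) -> bool {
    history.persistence == HistoryPersistence::SaveAll
}

fn main() {
    // No `[history]` table in config.toml: persistence defaults to SaveAll.
    let defaulted = resolve(None);
    println!("{:?} -> persist: {}", defaulted, should_persist(&defaulted));

    // `persistence = "none"` in config.toml: nothing is written.
    let opted_out = resolve(Some(History {
        persistence: HistoryPersistence::None,
        max_bytes: None,
    }));
    println!("{:?} -> persist: {}", opted_out, should_persist(&opted_out));
}
```

Marking `SaveAll` with `#[default]` is what makes history opt-out rather than opt-in: both the derived `Default` for `History` and the `unwrap_or_default` call land on it.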

codex-rs/core/src/lib.rs

Lines changed: 1 addition & 0 deletions
@@ -24,6 +24,7 @@ pub mod landlock;
 mod mcp_connection_manager;
 pub mod mcp_server_config;
 mod mcp_tool_call;
+mod message_history;
 mod model_provider_info;
 pub use model_provider_info::ModelProviderInfo;
 pub use model_provider_info::WireApi;
