Conversation
- …r reporting
- Add configurable `connection_timeout` (default 120s, 0 to disable) to `DatabaseConfig`
- Fix `ConnectionTimeout` error message to display milliseconds (was showing seconds)
- Map `ConnectionTimeout` to PostgreSQL error code 57P05 instead of generic 58000
- Send `ErrorResponse` to client on timeout via channel writer
- Add `pull_policy: never` to docker-compose proxy services to ensure local builds
- Add `connect_timeout` to integration test client config
- Add unit tests for config defaults, error message format, and error code mapping
- Add integration tests for connection isolation under load
- Document local build requirement in DEVELOPMENT.md
- Pass actual timeout duration in `ErrorResponse` message to client
- Document rationale for using 57P05 over 08006 error code
- Document `yield_now()` as best-effort flush with known limitation
- Document advisory lock test race condition timing margin
- Replace magic number 12345 with named `ADVISORY_LOCK_ID` constant
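The timeout surfacing described in the commits above can be sketched roughly as follows. This is a minimal illustration, not the proxy's actual API: `ErrorResponse`, `connection_timeout_error`, and the exact message wording are stand-ins; the real change maps the error to SQLSTATE 57P05 (idle_session_timeout) rather than the generic 58000 (system_error), and reports the configured duration in milliseconds.

```rust
use std::time::Duration;

/// Illustrative stand-in for the proxy's ErrorResponse message type.
struct ErrorResponse {
    code: &'static str,
    message: String,
}

/// Build the error surfaced to the client when the backend connection
/// times out. 57P05 is PostgreSQL's idle_session_timeout SQLSTATE.
fn connection_timeout_error(timeout: Duration) -> ErrorResponse {
    ErrorResponse {
        code: "57P05",
        // Report milliseconds, not seconds, so sub-second timeouts are visible.
        message: format!("connection timed out after {}ms", timeout.as_millis()),
    }
}

fn main() {
    let err = connection_timeout_error(Duration::from_secs(120));
    assert_eq!(err.code, "57P05");
    assert_eq!(err.message, "connection timed out after 120000ms");
    println!("{}: {}", err.code, err.message);
}
```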
No actionable comments were generated in the recent review. 🎉
📝 Walkthrough

Adds connection-timeout logic throughout the proxy (default 120s; 0 disables), surfaces timeout errors as PostgreSQL 57P05 with millisecond messages, introduces connection-resilience integration tests, and updates build/test docs and Docker build tagging to require locally built proxy images for testing.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant Proxy
    participant DB
    Client->>Proxy: Open connection / send startup
    Proxy->>DB: Attempt connect/read (with connection timeout)
    alt DB responds before timeout
        DB->>Proxy: Response
        Proxy->>Client: Continue protocol
    else ConnectionTimeout occurs
        DB--xProxy: no response (timeout)
        Proxy->>Proxy: Build 57P05 ErrorResponse (msg in ms)
        Proxy->>Client: Send connection timeout ErrorResponse
        Proxy->>Proxy: Flush and teardown connection
    end
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/cipherstash-proxy-integration/src/connection_resilience.rs`:
- Around line 145-168: The readiness Notify (a_ready/a_ready_tx) can fire before
client_b is actually blocked, making the test flaky; replace the notify_one() +
fixed sleep logic with a deterministic wait that polls Postgres (e.g. pg_locks
or pg_stat_activity) on the direct connection until the proxy connection is
observed waiting for the advisory lock. Concretely: stop using
a_ready/a_ready_tx as the sole signal; after spawning b_handle (which uses
connect_with_tls(PROXY) and executes b_lock_query), poll from the direct
Postgres client (the direct connection used earlier) by running a query against
pg_locks/pg_stat_activity and check for client_b’s PID/status showing it is
waiting for the advisory lock, and only then proceed to measure connection C;
ensure the polling loop has a sensible timeout and fails the test if the wait
state never appears.
In `@packages/cipherstash-proxy/src/postgresql/handler.rs`:
- Around line 262-276: The ConnectionTimeout path must emit the ErrorResponse
not only after tokio::try_join! but also for early returns before the stream
split (the timeout occurrences at the earlier startup/auth branches referenced
in the review). Add a small helper (e.g.,
emit_connection_timeout(timeout_sender, err)) that builds
ErrorResponse::connection_timeout(err.to_string()), attempts
BytesMut::try_from(...), sends via timeout_sender, and does a
tokio::task::yield_now().await; then call that helper in the pre-split timeout
return sites (the branches that currently return Error::ConnectionTimeout around
the startup/password-exchange logic) instead of returning immediately, or move
the timeout-emission into a centralized early-return path so all
Error::ConnectionTimeouts use the same emission logic.
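A rough shape for the centralized helper the prompt above describes, using `std::sync::mpsc` and `Vec<u8>` as hedged stand-ins for the proxy's channel writer and `BytesMut` (the real code would build `ErrorResponse::connection_timeout(...)` and `yield_now().await` to give the writer task a chance to flush; the wire encoding below is illustrative only):

```rust
use std::sync::mpsc;

/// Best-effort emission of a connection-timeout ErrorResponse.
/// Called from every early-return site that hits Error::ConnectionTimeout,
/// so pre-split and post-split paths share one emission point.
fn emit_connection_timeout(sender: &mpsc::Sender<Vec<u8>>, detail: &str) {
    // Sketch of an ErrorResponse payload: severity FATAL, SQLSTATE 57P05,
    // millisecond-precision message. Not the actual protocol framing.
    let payload = format!("SFATAL\0C57P05\0M{detail}\0\0").into_bytes();
    // Ignore send failures: the connection is being torn down anyway.
    let _ = sender.send(payload);
}

fn main() {
    let (tx, rx) = mpsc::channel();
    emit_connection_timeout(&tx, "connection timed out after 120000ms");
    let frame = rx.recv().unwrap();
    let text = String::from_utf8_lossy(&frame).into_owned();
    assert!(text.contains("57P05"));
    assert!(text.contains("120000ms"));
    println!("emitted {} bytes", frame.len());
}
```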
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: d8410d5b-183a-44fe-9ca1-63f8f6ea7dda
📒 Files selected for processing (11)
- DEVELOPMENT.md
- mise.toml
- packages/cipherstash-proxy-integration/src/common.rs
- packages/cipherstash-proxy-integration/src/connection_resilience.rs
- packages/cipherstash-proxy-integration/src/lib.rs
- packages/cipherstash-proxy/src/config/database.rs
- packages/cipherstash-proxy/src/error.rs
- packages/cipherstash-proxy/src/postgresql/error_handler.rs
- packages/cipherstash-proxy/src/postgresql/handler.rs
- packages/cipherstash-proxy/src/postgresql/messages/error_response.rs
- tests/docker-compose.yml
metrics 0.24.1 has a lifetime issue (rust-lang/rust#141402) that causes compilation failures on recent Rust versions.
468f4f2 to eec033f
Replace Notify + fixed sleep in advisory lock test with pg_locks polling for deterministic synchronization. Send ErrorResponse to clients for pre-split connection timeouts instead of silent disconnect.
♻️ Duplicate comments (2)
packages/cipherstash-proxy-integration/src/connection_resilience.rs (2)
16-18: ⚠️ Potential issue | 🟠 Major

Tie the advisory-lock poll to connection B, not just the lock ID.

Right now `poll_query` only proves that some backend is waiting on `ADVISORY_LOCK_ID`. Because the key is hard-coded at Line 18 and `client_b` is created inside the task, another concurrent suite can satisfy this poll before this test's session is actually blocked. Create `client_b` before `tokio::spawn`, capture its backend PID, and filter `pg_locks` by that PID (or generate a per-test lock key) so the readiness gate stays specific to this test.

Also applies to: 141-158
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/cipherstash-proxy-integration/src/connection_resilience.rs` around lines 16 - 18, The readiness poll currently checks pg_locks only by the hard-coded ADVISORY_LOCK_ID (const ADVISORY_LOCK_ID) which can match a different test's session; create the client used to acquire the lock (client_b) before calling tokio::spawn so you can obtain its backend PID, then update poll_query to filter pg_locks by that PID (or alternatively generate a unique per-test ADVISORY_LOCK_ID) so the poll verifies that the specific client_b session is blocked on pg_advisory_lock rather than any session using the same numeric key; adjust the code paths that call poll_query (and the spawned task that runs the lock) to use the captured PID or unique key accordingly.
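The PID-scoped readiness check suggested above might look roughly like this. The function name is hypothetical; the `classid`/`objid` split reflects how PostgreSQL documents `pg_advisory_lock(bigint)` keys in `pg_locks` (high 32 bits in `classid`, low 32 bits in `objid`):

```rust
/// Build a pg_locks query scoped to client_b's backend PID, so a
/// concurrent suite holding the same numeric key can't satisfy the gate.
fn waiting_query(pid: i32, lock_id: i64) -> String {
    // For pg_advisory_lock(bigint), the low 32 bits of the key land in objid.
    let objid = lock_id as u32;
    format!(
        "SELECT count(*) FROM pg_locks WHERE locktype = 'advisory' \
         AND granted = false AND pid = {pid} AND objid = {objid}"
    )
}

fn main() {
    // 12345 mirrors the lock key the test file names ADVISORY_LOCK_ID.
    let q = waiting_query(4242, 12345);
    assert!(q.contains("pid = 4242"));
    assert!(q.contains("objid = 12345"));
    println!("{q}");
}
```

The test would run this over the direct (non-proxy) connection after capturing the proxy session's PID via `SELECT pg_backend_pid()`.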
32-33: ⚠️ Potential issue | 🟠 Major

Replace these sleep-based readiness gates with deterministic checks.

Line 33, Line 66, and Line 93 are still using fixed delays to guess when the previous state change has happened. That means these tests can pass before the slow workload is actually active, or fail only because teardown/startup took longer on CI. Please gate these assertions on an observable condition instead, similar to the `pg_locks` polling used later in this file, or remove the grace period entirely in the disconnect test so it exercises the immediate reconnect path.

Also applies to: 65-66, 92-93
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/cipherstash-proxy-integration/src/connection_resilience.rs` around lines 32 - 33, The fixed-duration tokio::time::sleep(Duration::from_millis(200)).await calls are nondeterministic readiness gates; replace them with deterministic polling of an observable condition (e.g., query the same pg_locks check used later in this file or poll the proxy/connection state) so the test waits until the expected lock/worker state is actually present before proceeding; alternatively, for the disconnect test remove the grace period entirely and assert the immediate reconnect path by checking the connection became active/inactive via the observable (use the same pg_locks or connection-status query code path already in this file rather than sleeping).
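A minimal sketch of the deterministic gate this comment asks for: a bounded polling helper, plus the kind of `pg_locks` query the test could run over the direct connection. All names here are illustrative, and the closure stands in for the actual database check:

```rust
use std::thread::sleep;
use std::time::Duration;

// The condition the test would poll on the direct Postgres connection:
// a row appears once some backend is blocked waiting on an advisory lock.
const WAITING_QUERY: &str =
    "SELECT count(*) FROM pg_locks WHERE locktype = 'advisory' AND granted = false";

/// Poll `check` until it returns true, giving up after `attempts` tries.
/// Returning false lets the test fail loudly instead of racing a sleep.
fn poll_until(mut check: impl FnMut() -> bool, attempts: u32, interval: Duration) -> bool {
    for _ in 0..attempts {
        if check() {
            return true;
        }
        sleep(interval);
    }
    false
}

fn main() {
    // Stand-in for the database check: becomes true on the third attempt.
    let mut calls = 0;
    let ready = poll_until(
        || {
            calls += 1;
            calls >= 3
        },
        10,
        Duration::from_millis(1),
    );
    assert!(ready);
    assert_eq!(calls, 3);
    println!("observed condition after {calls} polls of: {WAITING_QUERY}");
}
```

In the real test the closure would execute `WAITING_QUERY` (ideally PID-scoped) and check the count, replacing the fixed `sleep(200ms)` gates entirely.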
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 0b6eefc7-81f1-422b-bca9-856c9391b822
⛔ Files ignored due to path filters (1)
`Cargo.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (5)
- packages/cipherstash-proxy-integration/src/connection_resilience.rs
- packages/cipherstash-proxy/Cargo.toml
- packages/cipherstash-proxy/src/config/database.rs
- packages/cipherstash-proxy/src/postgresql/error_handler.rs
- packages/cipherstash-proxy/src/postgresql/handler.rs
freshtonic left a comment:

Looks good but the tests are failing. I've added a comment with my hypothesis as to why.
```rust
client_b
    .simple_query(&b_lock_query)
    .await
    .unwrap();
```
This test is failing, and I think it's because there should be a sleep between acquiring and releasing the lock, so that the has_waiting check on line 160 has a chance of observing client_b holding it.
client_a was hardcoded to PG_PORT (5532) while the proxy may connect to a different PostgreSQL instance (e.g. postgres-tls:5617 in TLS CI). This meant the advisory lock was held on a different server than client_b was targeting, so the lock never blocked and the test was flaky in CI. Use get_database_port() to ensure client_a connects to the same PostgreSQL instance the proxy uses.
Acknowledgment
By submitting this pull request, I confirm that CipherStash can use, modify, copy, and redistribute this contribution, under the terms of CipherStash's choice.