feat: export iroh relay metrics via OpenTelemetry#489
feat: export iroh relay metrics via OpenTelemetry#489ThetaSinner wants to merge 1 commit intomainfrom
Conversation
Deploying kitsune2 with
|
| Latest commit: |
a2ca824
|
| Status: | ✅ Deploy successful! |
| Preview URL: | https://80531d6b.kitsune2.pages.dev |
| Branch Preview URL: | https://iroh-relay-otel-metrics.kitsune2.pages.dev |
|
The following will be added to the changelog [0.5.0-dev.1] - 2026-04-16Features
|
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughWorkspace OpenTelemetry dependencies were consolidated and expanded. The bootstrap server gained a public Suggested reviewers
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/bootstrap_srv/src/http.rs (1)
300-326:⚠️ Potential issue | 🔴 CriticalKeep the relay metrics guard alive for the server lifetime.
register()returns a guard whose callbacks stay active only while that value is alive. This binding is scoped to the setup block and is dropped at Line 327, before any server future is awaited, so the relay observers are unregistered during startup. Please lift the guard to the outer async scope or store it onServer/Ready.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/bootstrap_srv/src/http.rs` around lines 300 - 326, The relay metrics guard returned by crate::iroh_relay_metrics::register(relay_state.metrics.clone()) is created in a short-lived block and assigned to the local relay_otel_metrics binding which is dropped at the end of that block, causing observers to unregister during startup; move the guard into a longer-lived scope by returning or storing it on a longer-lived object (for example add a field to your Server or Ready struct like relay_otel_metrics or push it into an outer Vec of guards) so the value returned by register() remains alive for the server lifetime; update the setup code that currently defines relay_otel_metrics inside the inner block (and any uses of create_relay_state/create_relay_state_with_allowlist) to instead assign the guard to the new Server/Ready storage or an outer-scope variable so the guard is not dropped on function exit.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/bootstrap_srv/src/metrics.rs`:
- Line 20: Replace the current tracing::info log that emits the raw OTLP
endpoint (tracing::info!("Enabling OpenTelemetry metrics export to {endpoint}"))
so it no longer logs operator-supplied secrets; either log a generic message
like "Enabling OpenTelemetry metrics export" or redact the endpoint to only
scheme/host before including it. Update the logging call in the metrics
initialization code (the tracing::info invocation that references endpoint) to
avoid outputting the full URL with credentials or query tokens.
In `@crates/bootstrap_srv/tests/relay_metrics.rs`:
- Around line 37-47: The test currently bypasses production wiring by calling
iroh_relay_axum::create_relay_state and iroh_relay_metrics::register and
hand-building Router with relay_handler; instead, construct the server through
the production bootstrap HTTP wiring (the module that builds/starts the real
HTTP app) so the metrics guard is installed and retained exactly as in prod;
replace the manual Router/metrics/register calls with a call to the bootstrap
HTTP entrypoint that returns or starts the App/Server used in production (the
same code that lives in the bootstrap HTTP module) and exercise that server in
the test so the real installation/lifetime of the metrics guard is covered.
---
Outside diff comments:
In `@crates/bootstrap_srv/src/http.rs`:
- Around line 300-326: The relay metrics guard returned by
crate::iroh_relay_metrics::register(relay_state.metrics.clone()) is created in a
short-lived block and assigned to the local relay_otel_metrics binding which is
dropped at the end of that block, causing observers to unregister during
startup; move the guard into a longer-lived scope by returning or storing it on
a longer-lived object (for example add a field to your Server or Ready struct
like relay_otel_metrics or push it into an outer Vec of guards) so the value
returned by register() remains alive for the server lifetime; update the setup
code that currently defines relay_otel_metrics inside the inner block (and any
uses of create_relay_state/create_relay_state_with_allowlist) to instead assign
the guard to the new Server/Ready storage or an outer-scope variable so the
guard is not dropped on function exit.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: f796de26-013a-4b63-aa60-7c707880f031
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (8)
Cargo.tomlcrates/bootstrap_srv/Cargo.tomlcrates/bootstrap_srv/src/bin/kitsune2-bootstrap-srv.rscrates/bootstrap_srv/src/http.rscrates/bootstrap_srv/src/iroh_relay_metrics.rscrates/bootstrap_srv/src/lib.rscrates/bootstrap_srv/src/metrics.rscrates/bootstrap_srv/tests/relay_metrics.rs
3c104bc to
42386bd
Compare
There was a problem hiding this comment.
♻️ Duplicate comments (1)
crates/bootstrap_srv/tests/relay_metrics.rs (1)
37-47:⚠️ Potential issue | 🟠 MajorRoute this test through the production bootstrap HTTP wiring.
Line 37 through Line 47 still build relay state/router manually, so this test can pass even if production wiring in
crates/bootstrap_srv/src/http.rsregresses (especially metrics guard install/retention). Please instantiate the server through the same bootstrap HTTP entrypoint used in production.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/bootstrap_srv/tests/relay_metrics.rs` around lines 37 - 47, The test currently builds the relay state and Router manually via create_relay_state, iroh_relay_metrics::register and Router::new/route/with_state, which bypasses the production wiring in crates/bootstrap_srv/src/http.rs; replace that manual setup by calling the production HTTP entrypoint provided in the bootstrap_srv http module (the function that constructs/returns the production Router or starts the server) so the test uses the same wiring (including metrics guard installation/retention) as production—remove the manual create_relay_state/iroh_relay_metrics::register/Router::new logic and instantiate the server/router through the HTTP entrypoint from the bootstrap_srv::http module instead.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@crates/bootstrap_srv/tests/relay_metrics.rs`:
- Around line 37-47: The test currently builds the relay state and Router
manually via create_relay_state, iroh_relay_metrics::register and
Router::new/route/with_state, which bypasses the production wiring in
crates/bootstrap_srv/src/http.rs; replace that manual setup by calling the
production HTTP entrypoint provided in the bootstrap_srv http module (the
function that constructs/returns the production Router or starts the server) so
the test uses the same wiring (including metrics guard installation/retention)
as production—remove the manual
create_relay_state/iroh_relay_metrics::register/Router::new logic and
instantiate the server/router through the HTTP entrypoint from the
bootstrap_srv::http module instead.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 0caf8b45-572b-40f3-ba44-0c8bc595e9dc
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (8)
Cargo.tomlcrates/bootstrap_srv/Cargo.tomlcrates/bootstrap_srv/src/bin/kitsune2-bootstrap-srv.rscrates/bootstrap_srv/src/http.rscrates/bootstrap_srv/src/iroh_relay_metrics.rscrates/bootstrap_srv/src/lib.rscrates/bootstrap_srv/src/metrics.rscrates/bootstrap_srv/tests/relay_metrics.rs
✅ Files skipped from review due to trivial changes (1)
- crates/bootstrap_srv/Cargo.toml
🚧 Files skipped from review as they are similar to previous changes (5)
- crates/bootstrap_srv/src/lib.rs
- Cargo.toml
- crates/bootstrap_srv/src/metrics.rs
- crates/bootstrap_srv/src/http.rs
- crates/bootstrap_srv/src/iroh_relay_metrics.rs
e9035fb to
37206ef
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
crates/bootstrap_srv/tests/relay_metrics.rs (1)
1-150: Move this end-to-end test up tokitsune2.This is a real product-wiring test: it boots
BootstrapSrv, opens actual relay connections, and validates exported OTEL metrics end to end. That coverage is better owned by the top-levelkitsune2crate; if you want to keep crate-local coverage here, keep it unit-scoped around the relay-metrics bridge and move the full-stack flow upward.Based on learnings, Place genuine integration tests using real production modules in the
kitsune2crate, not in lower-level crates. As per coding guidelines, "Prioritize unit tests as the default testing layer, covering as much as possible at this level".🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/bootstrap_srv/tests/relay_metrics.rs` around lines 1 - 150, The end-to-end test relay_metrics_exported_via_otel boots the production BootstrapSrv and validates OTEL metrics and should be moved from this lower-level crate into the top-level kitsune2 integration tests; remove this test from crates/bootstrap_srv/tests/relay_metrics.rs and add an equivalent integration test under the kitsune2 crate (e.g., tests/relay_metrics.rs) that imports BootstrapSrv, Config, RelayUrl, RelayMap, Endpoint, etc., preserving the same logic and feature gating (#![cfg(feature = "iroh-relay")]) and ensuring the test uses the production wiring (BootstrapSrv::new) and the same OTEL setup; if you want to keep crate-local coverage here, replace the removed test with a smaller unit-scoped test that targets only the relay-metrics bridge functions (call sites/types: BootstrapSrv, Config, RelayMap, RelayUrl, relay_metrics_exported_via_otel) rather than full end-to-end server startup.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/bootstrap_srv/src/metrics.rs`:
- Around line 33-40: The enable_otlp_metrics() helper currently installs the
SdkMeterProvider globally but returns Result<()> so callers cannot flush pending
metric batches; change enable_otlp_metrics() to return the created
opentelemetry_sdk::metrics::SdkMeterProvider (or a guard/handle) instead of
unit, keep calling global::set_meter_provider(meter_provider.clone()) or set the
provider then return the provider/guard, and update the caller to accept and
store that returned provider so it can call provider.force_flush() (or
provider.shutdown()/force_flush as appropriate) during graceful shutdown to
flush periodic exporter batches before process exit; refer to
enable_otlp_metrics, SdkMeterProvider, global::set_meter_provider, and
force_flush/shutdown in your changes.
---
Nitpick comments:
In `@crates/bootstrap_srv/tests/relay_metrics.rs`:
- Around line 1-150: The end-to-end test relay_metrics_exported_via_otel boots
the production BootstrapSrv and validates OTEL metrics and should be moved from
this lower-level crate into the top-level kitsune2 integration tests; remove
this test from crates/bootstrap_srv/tests/relay_metrics.rs and add an equivalent
integration test under the kitsune2 crate (e.g., tests/relay_metrics.rs) that
imports BootstrapSrv, Config, RelayUrl, RelayMap, Endpoint, etc., preserving the
same logic and feature gating (#![cfg(feature = "iroh-relay")]) and ensuring the
test uses the production wiring (BootstrapSrv::new) and the same OTEL setup; if
you want to keep crate-local coverage here, replace the removed test with a
smaller unit-scoped test that targets only the relay-metrics bridge functions
(call sites/types: BootstrapSrv, Config, RelayMap, RelayUrl,
relay_metrics_exported_via_otel) rather than full end-to-end server startup.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 8d3a0cd8-14be-43d1-ae4d-d1d7251d9e63
📒 Files selected for processing (3)
crates/bootstrap_srv/src/http.rscrates/bootstrap_srv/src/metrics.rscrates/bootstrap_srv/tests/relay_metrics.rs
2ebfab0 to
2bb1498
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/bootstrap_srv/src/metrics.rs`:
- Around line 15-17: The rustdoc intra-doc links to SdkMeterProvider and
SdkMeterProvider::force_flush are unresolved; update the doc comment to use
either fully-qualified paths (e.g. opentelemetry_sdk::SdkMeterProvider and
opentelemetry_sdk::SdkMeterProvider::force_flush) or convert the links to plain
code spans (`SdkMeterProvider`, `SdkMeterProvider::force_flush`) so rustdoc can
resolve them and the docs build without warnings.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 4cef3817-af94-4fd3-af0c-18b135bc2ebe
📒 Files selected for processing (4)
crates/bootstrap_srv/src/bin/kitsune2-bootstrap-srv.rscrates/bootstrap_srv/src/http.rscrates/bootstrap_srv/src/metrics.rscrates/bootstrap_srv/tests/relay_metrics.rs
✅ Files skipped from review due to trivial changes (1)
- crates/bootstrap_srv/src/http.rs
🚧 Files skipped from review as they are similar to previous changes (2)
- crates/bootstrap_srv/src/bin/kitsune2-bootstrap-srv.rs
- crates/bootstrap_srv/tests/relay_metrics.rs
2bb1498 to
ea2c890
Compare
Bridge iroh-relay-holochain metrics (bytes_sent, bytes_recv, accepts, disconnects, unique_client_keys, packets dropped, rate limiting) to OTEL using observable counters that read the iroh-metrics atomics at export time. Move OTEL meter provider initialization out of the SBD feature gate so metrics export works regardless of which relay backend is active. Add integration test that verifies the full chain: relay handles client traffic, iroh-metrics atomics increment, OTEL exporter captures the values.
17983f2 to
a2ca824
Compare
|
✔️ a2ca824 - Conventional commits check succeeded. |
I notice that we're actually very short on metrics for the bootstrap service. When metrics were first integrated here, it was really only for SBD. So we now expose the Iroh metrics using opentelemetry but we could do with following up to add more bootstrap metrics.
Summary
bytes_sent,bytes_recv,accepts,disconnects,unique_client_keys,send_packets_dropped,other_packets_dropped,bytes_rx_ratelimited,conns_rx_ratelimitedsbdfeature gate into a feature-independentmetricsmodule, so--otlp_endpointworks with the iroh-relay backendTest plan
cargo make staticpasses (bothiroh-relayandsbdfeatures)relay_metrics_exported_via_otelpasses--otlp_endpointand verify metrics appear in the collector