feat: add telemetry component #10

Open
pzanella wants to merge 1 commit into qualabs:feat/observability from pzanella:feat/observability

Conversation

@pzanella

feat(observability): client-side metrics via telemetry bridge

Adds a full client-side observability layer to @moq/hang. The design keeps the library's core completely dependency-free by routing all metric calls through a thin bridge that backends can implement independently.

Architecture

hang/watch → bridge.ts (zero-dep) → <hang-telemetry> → OpenTelemetry → OTLP Collector
  • telemetry/bridge.ts: zero-dependency interface (TelemetryProvider) and module-level recording helpers. Events that arrive before a backend is registered are buffered and replayed on setProvider(). High-frequency events (frames, bytes) are aggregated in Maps to prevent unbounded growth.
  • telemetry/element.ts: the <hang-telemetry> web component, backed by OpenTelemetry. The SDK is dynamically imported only when a valid endpoint attribute is present (zero bundle cost when omitted). Uses queueMicrotask + a boolean flag to coalesce rapid attribute changes into a single setup cycle. Calls meterProvider.getMeter() directly, bypassing the global singleton that would break on hot-reload reconnects.
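
To make the buffering and replay behaviour of the bridge concrete, here is a minimal sketch. It is not the code from this PR: only TelemetryProvider, setProvider, recordConnection and recordStartupTime are names taken from the description and the review, while `dispatch`, `pending`, the cap value and the drop-when-full policy are assumptions made for illustration.

```typescript
// Illustrative sketch of a zero-dependency telemetry bridge (not the PR's code).
type Attrs = Record<string, string>;
type Transport = "webtransport" | "websocket";

interface TelemetryProvider {
  recordConnection(transport: Transport, attrs?: Attrs): void;
  recordStartupTime(ms: number, attrs?: Attrs): void;
}

// Assumed cap on buffered calls, mirroring the MAX_PENDING idea in the PR.
const MAX_PENDING = 50;

let provider: TelemetryProvider | undefined;

// Calls made before a provider exists are kept as closures and replayed later.
const pending: Array<(p: TelemetryProvider) => void> = [];

function dispatch(call: (p: TelemetryProvider) => void): void {
  if (provider) {
    call(provider);
  } else if (pending.length < MAX_PENDING) {
    pending.push(call);
  }
  // Once the cap is reached, further events are dropped (assumed policy).
}

function recordConnection(transport: Transport, attrs?: Attrs): void {
  dispatch((p) => p.recordConnection(transport, attrs));
}

function recordStartupTime(ms: number, attrs?: Attrs): void {
  dispatch((p) => p.recordStartupTime(ms, attrs));
}

function setProvider(next: TelemetryProvider): void {
  provider = next;
  // Replay buffered events in arrival order and empty the buffer.
  for (const call of pending.splice(0)) call(next);
}
```

Because the helpers only ever touch the TelemetryProvider interface, the library core never imports a metrics SDK; a backend such as <hang-telemetry> opts in by calling setProvider().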

Metrics emitted

| Metric | Type | Description |
| --- | --- | --- |
| moq_client_connections_total | Counter | Total sessions opened, labelled by transport |
| moq_client_active_connections | UpDownCounter | Currently live sessions |
| moq_client_startup_time_seconds | Histogram | Time to first decoded frame (custom buckets: 50ms–60s) |
| moq_client_frames_decoded_total | Counter | Decoded frames, labelled by codec and track type |
| moq_client_bytes_received_total | Counter | Encoded bytes received, labelled by track type |
| moq_client_bitrate_bytes_per_second | ObservableGauge | EMA-smoothed bitrate per track (α=0.3) |
| moq_client_stall_total | Counter | Video stall entries (fired once per stall, not per frame) |
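
For readers unfamiliar with the EMA smoothing mentioned for the bitrate gauge, the following sketch shows the usual exponential moving average update with α=0.3. The class name, the method names and the choice to seed the average with the first sample are assumptions; the PR's actual implementation may differ.

```typescript
// Illustrative EMA bitrate estimator (hypothetical names, not the PR's code).
const ALPHA = 0.3; // smoothing factor from the metrics table

class BitrateEstimator {
  private ema: number | undefined;

  // Feed the bytes received during the last `elapsedSeconds`;
  // returns the smoothed rate in bytes per second.
  update(bytes: number, elapsedSeconds: number): number {
    if (elapsedSeconds <= 0) return this.value; // ignore degenerate intervals
    const sample = bytes / elapsedSeconds;
    // The first sample seeds the average (assumption); later samples are blended in.
    this.ema = this.ema === undefined ? sample : ALPHA * sample + (1 - ALPHA) * this.ema;
    return this.ema;
  }

  // Current smoothed value, e.g. what an observable gauge callback would read.
  get value(): number {
    return this.ema ?? 0;
  }
}

// One estimator per track, keyed by an assumed track identifier.
const estimators = new Map<string, BitrateEstimator>();

function observeBytes(track: string, bytes: number, elapsedSeconds: number): number {
  let estimator = estimators.get(track);
  if (!estimator) {
    estimator = new BitrateEstimator();
    estimators.set(track, estimator);
  }
  return estimator.update(bytes, elapsedSeconds);
}
```

With α=0.3 each new sample contributes 30% of the reported value, so a single burst moves the gauge without dominating it.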

@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the client-side observability of the @moq/hang library by integrating a new, flexible telemetry system. The design prioritizes modularity and performance, routing all metric events through a dependency-free bridge. This allows for independent backend implementations, such as the provided OpenTelemetry web component, which dynamically loads its SDK to prevent unnecessary bundle size increases. The changes provide granular insights into connection status, media decoding performance, and network characteristics, enabling better monitoring and debugging of the application.

Highlights

  • New Telemetry System: Introduced a comprehensive client-side observability layer (@moq/hang/telemetry) to capture metrics, replacing the previous minimal observability module.
  • Dependency-Free Bridge: Implemented a thin, zero-dependency telemetry/bridge.ts module that acts as an interface for metric calls, allowing backends to implement independently.
  • OpenTelemetry Web Component: Added a new <hang-telemetry> web component (telemetry/element.ts) that dynamically imports the OpenTelemetry SDK only when an endpoint attribute is present, ensuring zero bundle cost when telemetry is not used.
  • Detailed Metrics: The new system emits a variety of client-side metrics including total and active connections, startup time (time to first frame), decoded frames, received bytes, estimated bitrate, and video playback stalls.
  • Connection Transport Type: The Established connection interface and its implementations (Ietf.Connection, Lite.Connection) now explicitly store the transport type (webtransport or websocket), which is used by the telemetry system.
  • CORS Configuration Update: Updated the OpenTelemetry collector configuration to specify allowed CORS origins and headers for the OTLP HTTP receiver.
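
The <hang-telemetry> highlight depends on the element reacting sensibly when its attributes change several times in a row; the PR description says this is handled with queueMicrotask plus a boolean flag. A rough sketch of that coalescing pattern follows. The class is hypothetical and deliberately does not extend HTMLElement, and the scheduler is injectable so the sketch also runs outside a browser.

```typescript
// Illustrative coalescing of rapid attribute changes (hypothetical class).
type Schedule = (task: () => void) => void;

class CoalescedSetup {
  private scheduled = false;
  private readonly schedule: Schedule;
  endpoint: string | null = null;
  setupCount = 0;

  // Defaults to the microtask queue; wrapped in an arrow function so that
  // queueMicrotask is never invoked with this instance as its receiver.
  constructor(schedule: Schedule = (task) => queueMicrotask(task)) {
    this.schedule = schedule;
  }

  // Stand-in for attributeChangedCallback("endpoint", oldValue, newValue).
  setEndpoint(value: string | null): void {
    this.endpoint = value;
    if (this.scheduled) return; // a setup cycle is already queued
    this.scheduled = true;
    this.schedule(() => {
      this.scheduled = false;
      this.setup();
    });
  }

  private setup(): void {
    if (!this.endpoint) return; // no endpoint: the SDK is never loaded
    this.setupCount += 1;
    // A real element would dynamically import the OpenTelemetry SDK here.
  }
}
```

Only the last attribute value is seen by setup(), so three synchronous changes cost one setup cycle rather than three.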

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a new telemetry component for client-side metrics, leveraging OpenTelemetry. The changes involve refactoring the existing observability logic into a new zero-dependency bridge and a web component, enhancing modularity and flexibility. The integration seems well-thought-out, with event buffering and dynamic SDK loading to minimize overhead. Additionally, the CORS configuration for the OTLP collector has been tightened, improving security. Overall, this is a significant and positive enhancement to the project's observability capabilities.

type BytesEntry = { bytes: number; trackType: "video" | "audio"; attrs: Record<string, string> | undefined };

// Safety cap: avoid unbounded growth if many connections happen before a provider loads.
const MAX_PENDING = 50;

medium

The MAX_PENDING constant is a magic number. While its purpose is explained in the comments, consider making this value configurable, perhaps through an environment variable or a parameter, if there's a scenario where different buffering limits might be desired. This would improve flexibility without hardcoding a potentially arbitrary limit.
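
One way to act on this suggestion, sketched under assumptions: wrap the pending queue in a small class whose limit is a constructor parameter with a default of 50. The class name, the `dropped` counter and the drop-newest policy are all hypothetical; the PR does not state what happens once the cap is reached.

```typescript
// Illustrative bounded buffer with a configurable cap (hypothetical helper).
class PendingBuffer<T> {
  private readonly items: T[] = [];
  private readonly limit: number;
  dropped = 0; // number of items rejected because the buffer was full

  constructor(limit = 50) {
    if (!Number.isInteger(limit) || limit < 0) {
      throw new RangeError("limit must be a non-negative integer");
    }
    this.limit = limit;
  }

  // Returns false when the item was dropped because the cap was reached.
  push(item: T): boolean {
    if (this.items.length >= this.limit) {
      this.dropped += 1;
      return false;
    }
    this.items.push(item);
    return true;
  }

  // Removes and returns everything buffered so far, in arrival order.
  drain(): T[] {
    return this.items.splice(0);
  }
}
```

Tracking `dropped` is optional, but it would let a provider report how many early events were lost once it registers.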

recordConnection(transport: "webtransport" | "websocket", attrs?: Record<string, string>): void;

/** Called when the first frame is decoded. `ms` is time since playback started. */
recordStartupTime(ms: number, attrs?: Record<string, string>): void;

medium

The recordStartupTime in the TelemetryProvider interface and its implementation in bridge.ts expects ms (milliseconds). However, the corresponding metric moq_client_startup_time_seconds (defined in element.ts) implies a unit of seconds, and the element.ts implementation converts milliseconds to seconds before recording. To maintain consistency and clarity, it would be better if the TelemetryProvider interface and the recordStartupTime function in bridge.ts also expected seconds, aligning with the metric's unit.

Alternatively, if milliseconds are truly intended for the bridge, the metric name should reflect that (e.g., moq_client_startup_time_milliseconds). Given the current metric name, expecting seconds seems more appropriate.

Suggested change
- recordStartupTime(ms: number, attrs?: Record<string, string>): void;
+ recordStartupTime(seconds: number, attrs?: Record<string, string>): void;
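
If the unit mismatch is a recurring concern, branded number types are one possible way to let the compiler catch it. This is only a sketch of that idea and not something the PR or the review proposes; every name below (`Seconds`, `Milliseconds`, `ms`, `toSeconds`, `recordStartupTimeSeconds`) is hypothetical.

```typescript
// Illustrative unit-branded types (hypothetical, not part of the PR).
type Seconds = number & { readonly __unit: "seconds" };
type Milliseconds = number & { readonly __unit: "milliseconds" };

// Tag a raw number (e.g. a performance.now() difference) as milliseconds.
const ms = (value: number): Milliseconds => value as Milliseconds;

// The single place where the unit conversion happens.
const toSeconds = (value: Milliseconds): Seconds => (value / 1000) as Seconds;

const recorded: number[] = [];

// Accepts seconds only; passing a Milliseconds value is a compile-time error.
function recordStartupTimeSeconds(value: Seconds): void {
  recorded.push(value);
}

// Call site: measure in milliseconds, convert once, record in seconds.
const elapsed = ms(250);
recordStartupTimeSeconds(toSeconds(elapsed));
```

The brands exist only at the type level, so the values remain plain numbers at runtime and add no bundle cost.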

}

/** Record time-to-first-frame. `ms` is milliseconds from subscription start to first output. */
export function recordStartupTime(ms: number, attrs?: Record<string, string>): void {

medium

Following the feedback for the TelemetryProvider interface, this function should also accept seconds instead of ms for consistency with the metric unit. The conversion from milliseconds to seconds should ideally happen at the call site in audio/source.ts and video/source.ts if the source provides milliseconds, or the source should directly provide seconds.

Suggested change
- export function recordStartupTime(ms: number, attrs?: Record<string, string>): void {
+ export function recordStartupTime(seconds: number, attrs?: Record<string, string>): void {

tryFlush();
},
recordStartupTime(ms, attrs) {
startupTime.record(ms / 1000, attrs);

medium

Given the moq_client_startup_time_seconds metric name, it's appropriate to record in seconds. However, if the recordStartupTime function in bridge.ts is updated to accept seconds directly (as suggested in bridge.ts), then this division by 1000 would no longer be necessary here. The ms parameter would already be in seconds.

Suggested change
- startupTime.record(ms / 1000, attrs);
+ startupTime.record(ms, attrs);

  firstFrameDecoded = true;
  const ttfaSeconds = (performance.now() - trackStartTime) / 1000;
- recordMetric((m) => m.recordStartupTime(ttfaSeconds, { codec: config.codec, track_type: "audio" }));
+ recordStartupTime(ttfaSeconds * 1000, { codec: config.codec, track_type: "audio" });

medium

If recordStartupTime in telemetry/bridge.ts is updated to accept seconds (as suggested), then ttfaSeconds should be passed directly here, without multiplying by 1000. This ensures consistency across the telemetry system where the metric unit is seconds.

Suggested change
- recordStartupTime(ttfaSeconds * 1000, { codec: config.codec, track_type: "audio" });
+ recordStartupTime(ttfaSeconds, { codec: config.codec, track_type: "audio" });

  firstFrameRendered = true;
  const ttffSeconds = (performance.now() - trackStartTime) / 1000;
- recordMetric((m) => m.recordStartupTime(ttffSeconds, { codec: config.codec, track_type: "video" }));
+ recordStartupTime(ttffSeconds * 1000, { codec: config.codec, track_type: "video" });

medium

Similar to the audio source, if recordStartupTime in telemetry/bridge.ts is updated to accept seconds, then ttffSeconds should be passed directly here, without multiplying by 1000. This maintains consistency with the metric's unit.

Suggested change
- recordStartupTime(ttffSeconds * 1000, { codec: config.codec, track_type: "video" });
+ recordStartupTime(ttffSeconds, { codec: config.codec, track_type: "video" });

Comment on lines +8 to +11
- "http://localhost:5173"
allowed_headers:
- "*"
- "Content-Type"
- "Accept"

medium

Changing allowed_origins and allowed_headers from "*" to specific values ("http://localhost:5173", "Content-Type", "Accept") is a good security practice. This follows the principle of least privilege and reduces the attack surface by preventing unauthorized cross-origin requests and header manipulation.
