Hypernova/webmon by api-Hypernova · Pull Request #4 · xmtp/protocol-e2e-tests

api-Hypernova · 2025-11-15T10:58:12Z

Add web health check metrics to Hypernova/webmon by installing `curl`, copying `web_healthcheck.sh` to `/usr/local/bin/web_healthcheck.sh`, and invoking it each loop in `newrunner.sh` with `MONITOR_LOOP_PAUSE` control

Introduce a web_healthcheck.sh script that runs curl checks on configured endpoints and pushes Prometheus metrics, add curl to the runtime image, and switch loop control to MONITOR_LOOP_PAUSE in docker/Dockerfile and xmtp_debug/newrunner.sh.

📍Where to Start

Start with the health check flow in xmtp_debug/web_healthcheck.sh, then review its invocation inside the loop in xmtp_debug/newrunner.sh.

📊 Macroscope summarized f89c4e3. 2 files reviewed, 9 issues evaluated, 5 issues filtered, 3 comments posted

🗂️ Filtered Issues

newrunner.sh — 0 comments posted, 3 evaluated, 2 filtered

line 3: Contract/regression: the environment variable controlling loop pauses was renamed from XDBG_LOOP_PAUSE to MONITOR_LOOP_PAUSE (line 3) and the child-process overrides were changed accordingly at lines 22, 23, 24, 29, 33, and 37. If callers or the xdbg tool expect XDBG_LOOP_PAUSE, the new script will ignore that input and xdbg may also ignore the MONITOR_LOOP_PAUSE override, changing behavior (e.g., failing to suppress internal pauses). The visible contract for controlling pause duration and the per-command override has changed without guard or compatibility. [ Low confidence ]
line 22: No error handling for external commands: all xdbg invocations (lines 22–25, 29, 33, 37) and the health check bash "$(dirname "$0")/web_healthcheck.sh" (line 39) are executed without checking exit status. Failures silently proceed to subsequent steps, potentially leaving the environment partially reset or tests running against a bad state. Use guards such as set -e, || exit, or && chaining to ensure a defined terminal outcome on failure. [ Low confidence ]

web_healthcheck.sh — 3 comments posted, 6 evaluated, 3 filtered

line 35: Health status treats only HTTP 200 as success. Healthy endpoints frequently return other 2xx codes (e.g., 204), or require following redirects (e.g., 301/302). Without -L, many endpoints will be marked unhealthy despite being reachable. Consider treating any 2xx as healthy and using curl -L when appropriate or making this configurable. [ Low confidence ]
line 54: PUSHGATEWAY_URL is not validated. If empty or malformed, push_url becomes invalid (e.g., starting with /metrics/...), and curl will fail or target a local file path. Add a guard to ensure PUSHGATEWAY_URL is a non-empty valid URL before pushing metrics. [ Low confidence ]
line 58: push_http_code extraction is fragile: grep "HTTP_CODE:" | cut -d: -f2 will return multiple lines if the response body contains HTTP_CODE: anywhere, causing ambiguous comparison and potential false negatives in success detection. Safer to read only the last line or use tail -n1 to select the appended status line. [ Low confidence ]

macroscopeapp · 2025-11-15T11:01:20Z

web_healthcheck.sh

+log "Checking ${#ENDPOINTS[@]} endpoint(s)..."
+
+for endpoint in "${ENDPOINTS[@]}"; do
+    endpoint=$(echo "$endpoint" | xargs)


Empty endpoints aren’t skipped after trim. This can call curl with no URL and create a blank endpoint_id (e.g., .../instance/). Consider continuing the loop when endpoint is empty.

Suggested change

endpoint=$(echo "$endpoint" | xargs)

endpoint=$(echo "$endpoint" | xargs)

if [ -z "$endpoint" ]; then

continue

fi

🚀 Reply to ask Macroscope to explain or update this suggestion.

👍 Helpful? React to give us feedback.

macroscopeapp · 2025-11-15T11:01:20Z

web_healthcheck.sh

+
+    metrics_payload="# HELP ${metric_name} Health status of web endpoints (1 = healthy, 0 = unhealthy)
+# TYPE ${metric_name} gauge
+${metric_name}{endpoint=\"${endpoint}\",endpoint_id=\"${endpoint_id}\",http_code=\"${http_code}\"} ${health_status}


Label values in metrics_payload are not escaped, so an endpoint containing \, " or newlines can produce invalid Prometheus exposition and be rejected by Pushgateway. Consider escaping backslashes, double-quotes and newlines for endpoint (and endpoint_id) before building the metric label string.

- metrics_payload="# HELP ${metric_name} Health status of web endpoints (1 = healthy, 0 = unhealthy) -# TYPE ${metric_name} gauge -${metric_name}{endpoint=\"${endpoint}\",endpoint_id=\"${endpoint_id}\",http_code=\"${http_code}\"} ${health_status} -" + # escape label values per Prometheus exposition format + escaped_endpoint=$(printf '%s' "$endpoint" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' -e ':a;N;s/\n/\\n/;ta') + escaped_endpoint_id=$(printf '%s' "$endpoint_id" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' -e ':a;N;s/\n/\\n/;ta') + metrics_payload="# HELP ${metric_name} Health status of web endpoints (1 = healthy, 0 = unhealthy) +# TYPE ${metric_name} gauge +${metric_name}{endpoint=\"${escaped_endpoint}\",endpoint_id=\"${escaped_endpoint_id}\",http_code=\"${http_code}\"} ${health_status} +"

🚀 Reply to ask Macroscope to explain or update this suggestion.

👍 Helpful? React to give us feedback.

macroscopeapp · 2025-11-15T11:01:20Z

web_healthcheck.sh

+
+: "${PUSHGATEWAY_URL:=http://localhost:9091}"
+: "${HEALTHCHECK_TIMEOUT:=10}"
+


HEALTHCHECK_TIMEOUT isn’t validated. If it’s non‑numeric or out of range, curl -m fails and health is misreported. Consider validating it as a positive integer within a sane range and defaulting when invalid.

+if ! [[ "$HEALTHCHECK_TIMEOUT" =~ ^[0-9]+$ ]] || [ "$HEALTHCHECK_TIMEOUT" -le 0 ] || [ "$HEALTHCHECK_TIMEOUT" -gt 300 ]; then + log "Invalid HEALTHCHECK_TIMEOUT: $HEALTHCHECK_TIMEOUT; using 10" + HEALTHCHECK_TIMEOUT=10 +fi

🚀 Reply to ask Macroscope to explain or update this suggestion.

👍 Helpful? React to give us feedback.

api-Hypernova added 8 commits November 12, 2025 03:33

Add web healthcheck

7dfbab3

Copy the healthcheck script properly

29e38c3

Debug - why isn't the metric showing up in pgw??

999f041

More debug

4f4932f

Try this!?

da82355

More fixes

b0bb3ab

Remove timestamp

611eb77

Working - cleanup for CR. Hopefully this doesnt break it

f89c4e3

api-Hypernova self-assigned this Nov 15, 2025

api-Hypernova merged commit fc3cd43 into main Nov 15, 2025
3 checks passed

macroscopeapp bot reviewed Nov 15, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Hypernova/webmon#4

Hypernova/webmon#4
api-Hypernova merged 8 commits intomainfrom
hypernova/webmon

api-Hypernova commented Nov 15, 2025 •

edited by macroscopeapp bot

Loading

Uh oh!

Uh oh!

macroscopeapp bot Nov 15, 2025

Uh oh!

macroscopeapp bot Nov 15, 2025

Uh oh!

macroscopeapp bot Nov 15, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant


		: "${PUSHGATEWAY_URL:=http://localhost:9091}"
		: "${HEALTHCHECK_TIMEOUT:=10}"

Conversation

api-Hypernova commented Nov 15, 2025 • edited by macroscopeapp bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Add web health check metrics to Hypernova/webmon by installing curl, copying web_healthcheck.sh to /usr/local/bin/web_healthcheck.sh, and invoking it each loop in newrunner.sh with MONITOR_LOOP_PAUSE control

📍Where to Start

🗂️ Filtered Issues

Uh oh!

Uh oh!

macroscopeapp bot Nov 15, 2025

Choose a reason for hiding this comment

Uh oh!

macroscopeapp bot Nov 15, 2025

Choose a reason for hiding this comment

Uh oh!

macroscopeapp bot Nov 15, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

api-Hypernova commented Nov 15, 2025 •

edited by macroscopeapp bot

Loading

Add web health check metrics to Hypernova/webmon by installing `curl`, copying `web_healthcheck.sh` to `/usr/local/bin/web_healthcheck.sh`, and invoking it each loop in `newrunner.sh` with `MONITOR_LOOP_PAUSE` control