Adds the per-step and per-episode HDF5 fields needed to detect "GSP
information collapse" — a suspected failure mode where the GSP prediction
network collapses to a near-constant output that carries no information
about the collective state. See Stelaris
docs/specs/2026-04-12-dispatcher-diagnostic-batch.md for the hypothesis.
Changes (all gated on opt-in — backward compatible):
env.py:
- calculate_gsp_reward returns (reward, label, squared_errors). The raw
per-robot (diff - prediction)^2 carries the magnitude that the clipped
[-2, 0] reward hides.
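A minimal sketch of what the new 3-tuple return could look like. Only the name `calculate_gsp_reward`, the per-robot `(diff - prediction)^2` formula, and the `[-2, 0]` clip come from the notes above; everything else (argument names, the label semantics, the reward reduction) is a hypothetical stand-in:

```python
import numpy as np

def calculate_gsp_reward(diffs, predictions):
    """Sketch: return (reward, label, squared_errors).

    diffs, predictions: per-robot arrays of equal length. The raw
    per-robot squared error is returned alongside the clipped reward
    so its magnitude is not lost to the [-2, 0] clip.
    """
    squared_errors = (np.asarray(diffs, dtype=float)
                      - np.asarray(predictions, dtype=float)) ** 2
    # Clipped scalar reward; the reduction (mean) is an assumption.
    reward = float(np.clip(-squared_errors.mean(), -2.0, 0.0))
    # Scalar label (semantics hypothetical); Main.py broadcasts it
    # to a per-robot list for the (timesteps x robots) HDF5 schema.
    label = float(np.mean(diffs))
    return reward, label, squared_errors
```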
rl_code/src/hdf5_logger.py:
- New optional kwargs gsp_target, gsp_squared_error on writerow → 2D
(timesteps × robots) datasets.
- New record_gsp_loss(value) method → 1D dataset at GSP learning cadence.
- write_episode now computes two episode-level summary attrs when both
prediction and target buffers are present:
- gsp_output_std (collapse signature: → 0)
- gsp_pred_target_corr (collapse signature: → NaN when std is below
1e-12 tolerance, distinguishing "undefined" from "measured zero")
Uses np.nanstd and pair-wise NaN masking so a single physics glitch
doesn't poison the summary; raises ValueError if gsp_target/gsp_heading
buffers desync within an episode.
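The two episode attrs could be computed along these lines (a sketch under assumptions: the real logic lives in `write_episode`, and the function and argument names here are hypothetical; the `np.nanstd`, pair-wise NaN masking, 1e-12 tolerance, and desync `ValueError` come from the notes above):

```python
import numpy as np

def gsp_episode_summary(predictions, targets, tol=1e-12):
    """Sketch of the episode-level collapse diagnostics.

    predictions, targets: (timesteps x robots) buffers that may
    contain NaN from physics glitches.
    Returns (gsp_output_std, gsp_pred_target_corr).
    """
    preds = np.asarray(predictions, dtype=float).ravel()
    targs = np.asarray(targets, dtype=float).ravel()
    if preds.shape != targs.shape:
        raise ValueError("gsp prediction/target buffers desynced")

    # nanstd: a single glitched sample doesn't poison the summary.
    output_std = float(np.nanstd(preds))

    # Pair-wise masking: drop any pair where either side is NaN.
    mask = ~(np.isnan(preds) | np.isnan(targs))
    p, t = preds[mask], targs[mask]

    # Correlation is reported as NaN when either side is
    # near-constant, distinguishing "undefined" from "measured zero".
    if p.size < 2 or np.std(p) < tol or np.std(t) < tol:
        corr = float("nan")
    else:
        corr = float(np.corrcoef(p, t)[0, 1])
    return output_std, corr
```

A collapsed network shows up here as `gsp_output_std == 0` together with a NaN correlation, which is exactly the pair of signatures the notes describe.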
Main.py:
- 3-tuple unpack of calculate_gsp_reward; broadcast scalar label to
per-robot list for the (timesteps × robots) HDF5 schema; pass new
kwargs to hdf5_writer.writerow.
- After each model.learn() call, capture model.last_gsp_loss (from
GSP-RL PR #23) and pass to hdf5_writer.record_gsp_loss. In
--independent_learning mode, aggregate across per-robot models to a
single scalar per learn tick (mean) so the gsp_loss axis length stays
num_learn_steps regardless of mode.
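The cadence-preserving aggregation can be sketched as follows (`collect_gsp_loss` is a hypothetical helper; only `model.last_gsp_loss` and the per-tick mean come from the notes above):

```python
import numpy as np

def collect_gsp_loss(models, independent_learning):
    """Sketch: produce one scalar gsp_loss per learn tick.

    In --independent_learning mode each robot has its own model, so
    the per-model last_gsp_loss values (from GSP-RL PR #23) are
    averaged, keeping the gsp_loss dataset length equal to
    num_learn_steps in both modes.
    """
    if independent_learning:
        return float(np.mean([m.last_gsp_loss for m in models]))
    # Shared-model mode: a single model, read its loss directly.
    return float(models[0].last_gsp_loss)
```

After each `model.learn()` call, the result would be handed to `hdf5_writer.record_gsp_loss(...)` as described above.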
Tests:
- 6 new TestGSPSquaredErrorReturn cases in test_env/test_gsp_reward.py;
existing tests updated to 3-tuple unpack.
- tests/test_diagnostics/test_hdf5_logger_gsp_diagnostics.py — 9 new
tests covering: per-step datasets, gsp_loss recording, episode attrs,
collapse signature detection, degenerate task, NaN poisoning,
desynced-buffer raise, backward compat, optional record_gsp_loss.
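One of the NaN-poisoning cases can be sketched as a self-contained check (hypothetical test body; the real logger API and dataset names are not shown, only the `np.nanstd` behavior the summary relies on):

```python
import numpy as np

def test_single_nan_does_not_poison_summary():
    """Sketch: one glitched sample must not turn the episode-level
    std into NaN. nanstd skips the glitch; plain std would not."""
    preds = np.array([[0.1, 0.2],
                      [np.nan, 0.4],   # single physics glitch
                      [0.5, 0.6]])
    assert not np.isnan(np.nanstd(preds))  # summary survives the glitch
    assert np.isnan(np.std(preds))         # naive std is poisoned
```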
Companion: NESTLab/GSP-RL#23 (Actor.last_gsp_loss).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Port-faithfulness review
Verified against the PR #11 approved review fixes. The single commit (f94c4ed) touches exactly the 5 expected files (Main.py, env.py, hdf5_logger.py, test_gsp_reward.py, test_hdf5_logger_gsp_diagnostics.py) with no drift.
- hdf5_logger.py — all six invariants preserved.
- Main.py — ported cleanly onto master's API.
- env.py — 3-tuple return with zero-fill in the GSP-disabled branch; raw unclipped squared errors preserved.
- Tests — existing tests updated.
New issues introduced by the port: none. Verdict: ready to merge.
Summary
Adds the per-step and per-episode HDF5 fields needed to detect "GSP information collapse" — a suspected failure mode where the GSP prediction network collapses to a near-constant output. See Stelaris `docs/specs/2026-04-12-dispatcher-diagnostic-batch.md` for the hypothesis.
This is a clean port of the work originally done on `feat/learn-every-n-steps` (PR #11) onto current master. It uses master's local `rl_code/src/hdf5_logger.py` and `hdf5_writer` variable name, and incorporates the cardinality fix from PR #11's review (aggregate per-tick mean in `--independent_learning` mode).
Changes
- `rl_code/src/env.py`
- `rl_code/src/hdf5_logger.py`
- `rl_code/Main.py`
- Tests
Backward compatibility
All new kwargs/methods are optional. Existing callers continue to work. Existing `test_hdf5_logger.py` (7 tests) unchanged and still passing.
Test plan
Companion
`NESTLab/GSP-RL#23` (already merged to main) — provides `Actor.last_gsp_loss` that this PR reads.
🤖 Generated with Claude Code