Adds the per-step and per-episode HDF5 fields needed to detect "GSP
information collapse" — a suspected failure mode where the GSP prediction
network collapses to a near-constant output that carries no information
about the collective state. See Stelaris
docs/specs/2026-04-12-dispatcher-diagnostic-batch.md for the hypothesis.
Changes (all gated on opt-in — backward compatible):
env.py:
- calculate_gsp_reward returns (reward, label, squared_errors). The raw
per-robot (diff - prediction)^2 carries the magnitude that the clipped
[-2, 0] reward hides.
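A minimal sketch of what the new 3-tuple return could look like. Only the name `calculate_gsp_reward`, the per-robot `(diff - prediction)^2` formula, and the `[-2, 0]` clip come from the notes above; everything else (argument names, the label semantics, the reward reduction) is a hypothetical stand-in:

```python
import numpy as np

def calculate_gsp_reward(diffs, predictions):
    """Sketch: return (reward, label, squared_errors).

    diffs, predictions: per-robot arrays of equal length. The raw
    per-robot squared error is returned alongside the clipped reward
    so its magnitude is not lost to the [-2, 0] clip.
    """
    squared_errors = (np.asarray(diffs, dtype=float)
                      - np.asarray(predictions, dtype=float)) ** 2
    # Clipped scalar reward; the reduction (mean) is an assumption.
    reward = float(np.clip(-squared_errors.mean(), -2.0, 0.0))
    # Scalar label (semantics hypothetical); Main.py broadcasts it
    # to a per-robot list for the (timesteps x robots) HDF5 schema.
    label = float(np.mean(diffs))
    return reward, label, squared_errors
```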
rl_code/src/hdf5_logger.py:
- New optional kwargs gsp_target, gsp_squared_error on writerow → 2D
(timesteps × robots) datasets.
- New record_gsp_loss(value) method → 1D dataset at GSP learning cadence.
- write_episode now computes two episode-level summary attrs when both
prediction and target buffers are present:
- gsp_output_std (collapse signature: → 0)
- gsp_pred_target_corr (collapse signature: → NaN when std is below
1e-12 tolerance, distinguishing "undefined" from "measured zero")
Uses np.nanstd and pair-wise NaN masking so a single physics glitch
doesn't poison the summary; raises ValueError if gsp_target/gsp_heading
buffers desync within an episode.
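The two episode attrs could be computed along these lines (a sketch under assumptions: the real logic lives in `write_episode`, and the function and argument names here are hypothetical; the `np.nanstd`, pair-wise NaN masking, 1e-12 tolerance, and desync `ValueError` come from the notes above):

```python
import numpy as np

def gsp_episode_summary(predictions, targets, tol=1e-12):
    """Sketch of the episode-level collapse diagnostics.

    predictions, targets: (timesteps x robots) buffers that may
    contain NaN from physics glitches.
    Returns (gsp_output_std, gsp_pred_target_corr).
    """
    preds = np.asarray(predictions, dtype=float).ravel()
    targs = np.asarray(targets, dtype=float).ravel()
    if preds.shape != targs.shape:
        raise ValueError("gsp prediction/target buffers desynced")

    # nanstd: a single glitched sample doesn't poison the summary.
    output_std = float(np.nanstd(preds))

    # Pair-wise masking: drop any pair where either side is NaN.
    mask = ~(np.isnan(preds) | np.isnan(targs))
    p, t = preds[mask], targs[mask]

    # Correlation is reported as NaN when either side is
    # near-constant, distinguishing "undefined" from "measured zero".
    if p.size < 2 or np.std(p) < tol or np.std(t) < tol:
        corr = float("nan")
    else:
        corr = float(np.corrcoef(p, t)[0, 1])
    return output_std, corr
```

A collapsed network shows up here as `gsp_output_std == 0` together with a NaN correlation, which is exactly the pair of signatures the notes describe.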
Main.py:
- 3-tuple unpack of calculate_gsp_reward; broadcast scalar label to
per-robot list for the (timesteps × robots) HDF5 schema; pass new
kwargs to hdf5_writer.writerow.
- After each model.learn() call, capture model.last_gsp_loss (from
GSP-RL PR #23) and pass to hdf5_writer.record_gsp_loss. In
--independent_learning mode, aggregate across per-robot models to a
single scalar per learn tick (mean) so the gsp_loss axis length stays
num_learn_steps regardless of mode.
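The cadence-preserving aggregation can be sketched as follows (`collect_gsp_loss` is a hypothetical helper; only `model.last_gsp_loss` and the per-tick mean come from the notes above):

```python
import numpy as np

def collect_gsp_loss(models, independent_learning):
    """Sketch: produce one scalar gsp_loss per learn tick.

    In --independent_learning mode each robot has its own model, so
    the per-model last_gsp_loss values (from GSP-RL PR #23) are
    averaged, keeping the gsp_loss dataset length equal to
    num_learn_steps in both modes.
    """
    if independent_learning:
        return float(np.mean([m.last_gsp_loss for m in models]))
    # Shared-model mode: a single model, read its loss directly.
    return float(models[0].last_gsp_loss)
```

After each `model.learn()` call, the result would be handed to `hdf5_writer.record_gsp_loss(...)` as described above.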
Tests:
- 6 new TestGSPSquaredErrorReturn cases in test_env/test_gsp_reward.py;
existing tests updated to 3-tuple unpack.
- tests/test_diagnostics/test_hdf5_logger_gsp_diagnostics.py — 9 new
tests covering: per-step datasets, gsp_loss recording, episode attrs,
collapse signature detection, degenerate task, NaN poisoning,
desynced-buffer raise, backward compat, optional record_gsp_loss.
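One of the NaN-poisoning cases can be sketched as a self-contained check (hypothetical test body; the real logger API and dataset names are not shown, only the `np.nanstd` behavior the summary relies on):

```python
import numpy as np

def test_single_nan_does_not_poison_summary():
    """Sketch: one glitched sample must not turn the episode-level
    std into NaN. nanstd skips the glitch; plain std would not."""
    preds = np.array([[0.1, 0.2],
                      [np.nan, 0.4],   # single physics glitch
                      [0.5, 0.6]])
    assert not np.isnan(np.nanstd(preds))  # summary survives the glitch
    assert np.isnan(np.std(preds))         # naive std is poisoned
```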
Companion: NESTLab/GSP-RL#23 (Actor.last_gsp_loss).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Port-faithfulness review
Verified against the PR #11 approved review fixes. The single commit (f94c4ed) touches exactly the 5 expected files (Main.py, env.py, hdf5_logger.py, test_gsp_reward.py, test_hdf5_logger_gsp_diagnostics.py) with no drift.
- hdf5_logger.py — all six invariants preserved.
- Main.py — ported cleanly onto master's API.
- env.py — 3-tuple return with zero-fill in the GSP-disabled branch; raw unclipped squared errors preserved.
- Tests — existing tests updated.
New issues introduced by the port: none. Verdict: ready to merge.
Summary
Adds the per-step and per-episode HDF5 fields needed to detect "GSP information collapse" — a suspected failure mode where the GSP prediction network collapses to a near-constant output. See Stelaris `docs/specs/2026-04-12-dispatcher-diagnostic-batch.md` for the hypothesis.
This is a clean port of the work originally done on `feat/learn-every-n-steps` (PR #11) onto current master. It uses master's local `rl_code/src/hdf5_logger.py` and `hdf5_writer` variable name, and incorporates the cardinality fix from PR #11's review (aggregate per-tick mean in `--independent_learning` mode).
Changes
- `rl_code/src/env.py`
- `rl_code/src/hdf5_logger.py`
- `rl_code/Main.py`
- Tests
Backward compatibility
All new kwargs/methods are optional. Existing callers continue to work. Existing `test_hdf5_logger.py` (7 tests) unchanged and still passing.
Test plan
Companion
`NESTLab/GSP-RL#23` (already merged to main) — provides `Actor.last_gsp_loss` that this PR reads.
🤖 Generated with Claude Code