Add batched action selection for stateless networks #19
Merged
Conversation
New methods:
- `DQN_DDQN_choose_action_batch`: single forward pass for N observations
- `DDPG_choose_action_batch`: batched continuous actions
- `TD3_choose_action_batch`: batched with noise
- `Actor.choose_actions_batch`: dispatch with epsilon/noise handling

Only supports stateless networks (DQN, DDQN, DDPG, TD3). RDDPG explicitly raises `NotImplementedError` due to LSTM state concerns.

9 tests verify that batched output matches sequential output for all algorithms.

Profiling showed `choose_action` is 19.8% of Python time (4 sequential calls). Batching reduces this to ~1 call, with an expected ~60% reduction in that category.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Add a `choose_actions_batch()` method that processes multiple observations in a single forward pass, reducing action selection from N sequential network calls to 1 batched call.

Supported: DQN, DDQN, DDPG, TD3 (stateless; no memory between calls)
Not supported: RDDPG (raises `NotImplementedError` explicitly, due to LSTM state concerns)
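As a rough illustration of what the batched path does for the discrete case, here is a minimal sketch: the function name mirrors the PR's `choose_actions_batch`, but the `q_network` callable, argument names, and epsilon handling are assumptions, not the actual implementation.

```python
import numpy as np

def choose_actions_batch(q_network, observations, epsilon=0.0, rng=None):
    """Epsilon-greedy action selection for N observations in ONE forward pass.

    Sketch only: `q_network` is any callable mapping an (N, obs_dim) array
    to an (N, n_actions) array of Q-values.
    """
    rng = rng or np.random.default_rng()
    obs = np.stack(observations)        # (N, obs_dim)
    q_values = q_network(obs)           # (N, n_actions): one batched call
    actions = q_values.argmax(axis=1)   # greedy action per observation
    # epsilon-greedy: overwrite a random subset with random actions
    explore = rng.random(len(actions)) < epsilon
    actions[explore] = rng.integers(0, q_values.shape[1], size=int(explore.sum()))
    return actions
```

With `epsilon=0.0` this is exactly N independent greedy selections, which is what makes the batched/sequential equivalence testable.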
Motivation
Profiling showed that `choose_action` accounts for 19.8% of Python time in RL-CT (called 4× per step, once per robot). Batching reduces this to ~1 call.

Why not batch R-GSP-N/A-GSP-N: the LSTM hidden state bug (discards h_t/c_t) means these are accidentally stateless today, but we should not optimize around a bug. Once the LSTM is fixed to maintain state, batching would change behavior. See the LSTM hidden state bug in TODO.
Tests
9 new tests verify that batched output matches sequential output for all 4 algorithms.
205/205 tests pass.
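The equivalence property those tests check can be sketched as follows for a DDPG-style deterministic policy (the toy `policy` and shapes here are illustrative; the real tests cover all 4 algorithms with the project's own networks):

```python
import numpy as np

def ddpg_choose_action_batch(policy, observations):
    """One forward pass for all observations (sketch of the batched path)."""
    return policy(np.stack(observations))       # (N, act_dim)

rng = np.random.default_rng(42)
W = rng.normal(size=(5, 2))
policy = lambda x: np.tanh(x @ W)               # toy deterministic actor
observations = [rng.normal(size=5) for _ in range(4)]

# With noise/exploration disabled, batched must equal sequential, in order.
sequential = np.stack([policy(o[None])[0] for o in observations])
batched = ddpg_choose_action_batch(policy, observations)
assert np.allclose(batched, sequential)
```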
Downstream implications
- Call `choose_actions_batch([obs_0, obs_1, obs_2, obs_3])` instead of 4× `choose_agent_action(obs_i)`
- `choose_action` can still be called individually; batching is a single-process optimization that won't apply in the fully distributed case, but helps in the base/threaded fidelity layers

🤖 Generated with Claude Code
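The call-site change described under Downstream implications might look roughly like this; the `Actor` stub, its weight matrix, and the observation shapes are hypothetical stand-ins for the real classes:

```python
import numpy as np

class Actor:
    """Stub standing in for the real Actor; only the dispatch shape is shown."""
    def __init__(self, W):
        self.W = W

    def choose_agent_action(self, obs):
        # old path: one network call per robot
        return int((obs @ self.W).argmax())

    def choose_actions_batch(self, observations):
        # new path: one forward pass for all robots
        return list((np.stack(observations) @ self.W).argmax(axis=1))

rng = np.random.default_rng(1)
actor = Actor(rng.normal(size=(6, 4)))
robot_observations = [rng.normal(size=6) for _ in range(4)]  # 4 robots per step

sequential = [actor.choose_agent_action(o) for o in robot_observations]  # 4 calls
batched = actor.choose_actions_batch(robot_observations)                 # 1 call
assert batched == sequential
```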