Kbediako/prime-intellect-rl-tower-defense

Tower Defence

Prime Hub Source Code

Overview

  • Environment ID: kbediako/tower_defence
  • Python package: prime_td_env
  • Short description: Multi-turn macro-round tower defense environment for hosted RL.
  • Tags: games, tower-defense, rl, verifiers, prime-rl, multi-turn
  • Published package shape: env.py, pyproject.toml, README.md, src/

Datasets

  • Primary dataset(s): Procedural tower-defense seeds generated by the environment.
  • Snapshot mode: Configurable round snapshots (dataset.snapshots) for curriculum and stability studies.
  • Split control: Training/eval sample volume is controlled by environment args and run config.

Task

  • Type: Multi-turn game interaction (macro-round planning).
  • Interaction contract: One assistant plan maps to one in-game round progression.
  • Action surface: Candidate-index planning with {"type":"plan","actions":[{"type":"choose","index":N}, ...]}.
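The plan format above can be produced and checked with a few lines of Python. This is a minimal sketch, not code from the repository: `build_plan` and `validate_plan` are hypothetical helper names, and zero-based indices bounded by the number of offered candidates are an assumption the README does not confirm.

```python
import json


def build_plan(indices):
    """Serialize candidate indices into the plan format (hypothetical helper)."""
    return json.dumps(
        {"type": "plan", "actions": [{"type": "choose", "index": i} for i in indices]}
    )


def validate_plan(text, num_candidates):
    """Return the chosen indices, or raise ValueError if the plan is malformed.

    Assumes valid indices are 0 .. num_candidates - 1 (an assumption).
    """
    try:
        plan = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(plan, dict) or plan.get("type") != "plan":
        raise ValueError('top-level "type" must be "plan"')
    actions = plan.get("actions")
    if not isinstance(actions, list):
        raise ValueError('"actions" must be a list')
    indices = []
    for action in actions:
        if not isinstance(action, dict) or action.get("type") != "choose":
            raise ValueError('each action must have "type": "choose"')
        index = action.get("index")
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(index, bool) or not isinstance(index, int):
            raise ValueError('"index" must be an integer')
        if not 0 <= index < num_candidates:
            raise ValueError(f"index {index} out of range")
        indices.append(index)
    return indices
```

A check like this is one plausible way to approximate what a format reward measures locally, before a plan is sent to the environment.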

Quickstart

Run local smoke checks:

PYTHONPATH=src python3 scripts/smoke.py

Run baseline local evaluation:

PYTHONPATH=src python3 scripts/eval_baseline.py --episodes 10 --max-rounds 20 --output out/metrics.json

Run hosted training:

prime rl run configs/lab/prime-td.toml

Environment Arguments

| Group | Key args | Purpose |
| --- | --- | --- |
| wrapper | wrapper="macro_round" | Enables multi-turn round-by-round planning mode. |
| difficulty | max_rounds | Controls episode horizon/curriculum cap. |
| observation | max_action_candidates, max_build_slots, max_towers, max_threats | Bounds payload size and candidate space. |
| candidate_balance | min_build_frac, max_upgrade_candidates, by_phase.* | Tunes build/upgrade exposure by phase. |
| dataset | policy, rollout_steps, snapshots, safe_explore_* | Controls training observation generation. |
| rules | auto_advance_round, prep_actions_*, mask_sell | Governs turn semantics and allowed behavior. |
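As a rough illustration of how these groups might nest, the sketch below builds an argument dictionary and flattens it to dotted keys in the style of `dataset.snapshots`. The nesting, every value, and the `flatten` helper are assumptions made for illustration. Only the key names come from this README, and wildcard keys such as `by_phase.*` are left out because their members are not listed here.

```python
# Hypothetical environment arguments; all values are placeholders.
env_args = {
    "wrapper": "macro_round",
    "difficulty": {"max_rounds": 20},
    "observation": {
        "max_action_candidates": 16,
        "max_build_slots": 8,
        "max_towers": 12,
        "max_threats": 6,
    },
    "rules": {"auto_advance_round": True, "mask_sell": True},
}


def flatten(args, prefix=""):
    """Flatten nested dicts into dotted keys, e.g. 'difficulty.max_rounds'."""
    flat = {}
    for key, value in args.items():
        dotted = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{dotted}."))
        else:
            flat[dotted] = value
    return flat


for name, value in sorted(flatten(env_args).items()):
    print(f"{name} = {value!r}")
```

Dotted keys make it easier to diff the arguments of two runs line by line.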

Metrics

| Metric | Meaning |
| --- | --- |
| reward/mean | Aggregate training reward over samples in a step. |
| metrics/num_turns | Episode length in environment turns. |
| format_reward | Validity of structured plan/action output. |
| macro_round_delta (derived) | Round advance per turn; expected delta_round == 1 after turn 1. |
| Action mix (derived) | Build vs upgrade distribution, tracked by phase. |

Run Hygiene

  • Record run IDs and config filenames immediately after launch.
  • Pull rollouts for every sampled step and parse all user/assistant turns.
  • Validate macro-round invariant delta_round == 1 for user observations after turn 1.
  • If sample uploads fail with HTTP 500 errors, lower the token or observation caps, or shorten the episode horizon.
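The macro-round invariant can be checked mechanically once round numbers have been extracted from a rollout. The sketch below assumes you already have one round number per user observation, in turn order. How to parse them out of a rollout depends on the observation format, which this README does not specify, and `check_macro_round_invariant` is a hypothetical name.

```python
def check_macro_round_invariant(rounds):
    """Return (turn, delta) pairs where the round did not advance by exactly 1.

    rounds[i] is the round number seen in the user observation for turn i + 1.
    The first observation has no predecessor, so checks start at turn 2.
    """
    violations = []
    for i in range(1, len(rounds)):
        delta = rounds[i] - rounds[i - 1]
        if delta != 1:
            violations.append((i + 1, delta))
    return violations
```

An empty result means every observation after turn 1 advanced by exactly one round. Any entry names the turn and the delta that was observed instead.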

Detailed longitudinal results and run-by-run analysis: docs/RESULTS.md.
