Charpup · Apr 18, 2026
diff --git a/README.md b/README.md
@@ -1,4 +1,4 @@
-# Task Workflow v3.1 — DAG-Based Task Scheduler
+# Task Workflow v3.2 — DAG-Based Task Scheduler
 
 Intelligent task scheduling skill for AI coding agents. Uses dependency analysis,
 complexity scoring, and topological sorting to produce optimal execution batches.
@@ -13,6 +13,8 @@ Tasks with dependencies → DAG analysis → Batch schedule (parallel where poss
 - **Complexity Scoring**: 1-10 scale, lower complexity tasks execute first
 - **Batch Grouping**: Independent tasks grouped for parallel execution
 - **Dynamic Insertion**: Add tasks mid-execution without restart
+- **Cross-session Persistence**: Daily backlog files with CST 00:00 auto-migration
+- **Cycle Detection**: Refuses to schedule circular dependencies
 
 ## Installation
 
@@ -29,32 +31,63 @@ git clone https://github.com/Charpup/openclaw-task-workflow.git ~/.claude/skills
 Task-workflow integrates with [TriaDev](https://github.com/Charpup/triadev):
 
 ```
-planning-with-files (plan) → task-workflow (schedule) → tdd-sdd (implement)
+planning-with-files (plan) → task-workflow (schedule) → tdd-sdd-development (implement)
 ```
 
-Coordinates via `triadev-handoff.json` — reads extracted tasks, writes batch schedule.
+Coordinates via `triadev-handoff.json` — reads extracted tasks, writes batch schedule. Also works standalone: reads `task_plan.md` directly when no handoff.json is present.
+
+## What's New in v3.2
+
+| Addition | Purpose |
+|----------|---------|
+| `examples/humanizer-skill-schedule/` | **GOLD** real-run reference — 21-task humanizer-skill project with input task_plan, full DAG output, and handoff snippet. Shows fan-out (T7 → 5 tasks), fan-in (T13 ← 5 tasks), critical path (12 tasks, complexity sum 33), max parallelism (5). |
+| `evals/evals.json` (4 → 8 cases) | New cases: within-batch complexity ordering, cross-session migration (CST 00:00 behavior), standalone mode (no triadev-handoff), dynamic insertion mid-execution. ~80% deterministic assertions (sequence_order, json_path_*, file_exists). |
 
 ## Project Structure
 
 ```
 openclaw-task-workflow/
-├── SKILL.md                  # Scheduling instructions
-├── scripts/
-│   ├── task_scheduler.py     # DAG sort + batch grouping
-│   ├── task_persistence.py   # State persistence
-│   └── task_index_manager.py # Cross-session index
-├── references/
-│   ├── file-format.md        # Daily file format spec
-│   └── v3-migration.md       # Migration guide
+├── SKILL.md                                  # Scheduling workflow
+├── README.md                                 # This file
+├── cli.py                                    # Entry point
+├── pytest.ini                                # Test config
 ├── contracts/
-│   └── stack-handshake.json  # TriaDev integration contract
+│   └── stack-handshake.json                  # TriaDev integration contract
+├── references/
+│   ├── file-format.md                        # Daily file format spec (v3)
+│   └── v3-migration.md                       # Migration guide from earlier versions
+├── scripts/
+│   ├── task_scheduler.py                     # DAG sort + batch grouping
+│   ├── task_persistence.py                   # File persistence + daily migration
+│   ├── task_index_manager.py                 # Cross-session index
+│   └── stack_contract.py                     # Contract validator
+├── config/
+│   └── cron.yaml                             # Auto-migration config (CST 00:00)
+├── examples/
+│   └── humanizer-skill-schedule/             # GOLD — real 21-task project
 ├── evals/
-│   └── evals.json            # Test cases
-└── tests/                    # Unit + integration tests
+│   └── evals.json                            # 8 cases
+└── tests/                                    # Unit + integration
 ```
 
+## Working Example
+
+See [`examples/humanizer-skill-schedule/`](examples/humanizer-skill-schedule/) for a real completed run — the humanizer-skill build (PR blader/humanizer#94 merged). Demonstrates:
+
+- Mixed complexities (1 to 7) with real reasoning
+- Fan-out: one high-complexity task (T7 merge 32 patterns) unlocks 5 downstream references
+- Fan-in: the main `SKILL.md` draft (T13) requires all 5 reference files
+- Critical path of 12 tasks (complexity sum 33), max parallelism 5
+- Both raw `output-schedule.json` format and the triadev-handoff slice
+
 ## Changelog
 
+### v3.2.0 (2026-04-18)
+Round-2 standardization. Additive; no breaking changes.
+
+- **New**: `examples/humanizer-skill-schedule/` — GOLD completed-run reference harvested from real humanizer-skill project. 4 files: README + input task_plan.md + output-schedule.json + handoff snippet.
+- **Hardened**: `evals/evals.json` — 4 → 8 cases. New: complexity ordering within batches, cross-session migration behavior, standalone mode (no handoff.json), dynamic insertion. Shifted ~80% of assertions to deterministic types (`sequence_order`, `json_path_equals`, `file_exists`).
+
 ### v3.1.0 (2026-04-09)
 - **New**: Prompt-centric SKILL.md with clear boundary rules
 - **New**: Integration with triadev-handoff.json contract

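The README changes above describe topological sorting, within-batch complexity ordering, and cycle detection. The following is a minimal sketch of how those three rules could fit together, assuming a level-by-level variant of Kahn's algorithm. The function name `schedule_batches` and the task dictionary shape are hypothetical and are not taken from `scripts/task_scheduler.py`; the sample tasks come from the `dag-basic-01` eval prompt in this PR.

```python
from collections import defaultdict


def schedule_batches(tasks):
    """Group tasks into dependency-ordered batches (Kahn's algorithm by levels).

    `tasks` maps a task id to {"complexity": int, "dependencies": [ids]}.
    This shape is an assumption made for illustration.
    Raises ValueError when a circular dependency prevents scheduling.
    """
    indegree = {tid: 0 for tid in tasks}
    dependents = defaultdict(list)
    for tid, spec in tasks.items():
        for dep in spec["dependencies"]:
            if dep not in tasks:
                raise ValueError(f"{tid} depends on unknown task {dep}")
            indegree[tid] += 1
            dependents[dep].append(tid)

    ready = [tid for tid, count in indegree.items() if count == 0]
    batches = []
    scheduled = 0
    while ready:
        # Within a batch, lower complexity comes first; the id breaks ties.
        ready.sort(key=lambda tid: (tasks[tid]["complexity"], tid))
        batches.append(ready)
        scheduled += len(ready)
        next_ready = []
        for tid in ready:
            for child in dependents[tid]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    next_ready.append(child)
        ready = next_ready

    if scheduled != len(tasks):
        # Anything left is either on a cycle or downstream of one.
        stuck = sorted(tid for tid, count in indegree.items() if count > 0)
        raise ValueError(f"circular dependency blocks: {', '.join(stuck)}")
    return batches


# Tasks from the dag-basic-01 eval prompt.
tasks = {
    "research-api": {"complexity": 2, "dependencies": []},
    "design-schema": {"complexity": 4, "dependencies": ["research-api"]},
    "implement-auth": {"complexity": 6, "dependencies": ["design-schema"]},
    "write-tests": {"complexity": 5, "dependencies": ["implement-auth"]},
    "deploy": {"complexity": 3, "dependencies": ["write-tests", "implement-auth"]},
}
for number, batch in enumerate(schedule_batches(tasks), start=1):
    print(f"Batch {number}: {batch}")
```

Because this chain is strictly linear, each batch holds a single task. In this sketch, cycle detection is a by-product of the sort: if some tasks never reach an in-degree of zero, no schedule is returned.
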
diff --git a/evals/evals.json b/evals/evals.json
@@ -1,37 +1,65 @@
 {
   "skill_name": "task-workflow",
+  "eval_schema_version": "2.0",
+  "assertion_types_supported": {
+    "llm_judge": "Subjective check, prone to leniency. Use sparingly.",
+    "contains_any": "Deterministic. Response contains at least one value.",
+    "contains_all": "Deterministic. Response contains every value.",
+    "does_not_contain": "Deterministic. Response contains no value from values[].",
+    "file_exists": "Deterministic. File at path exists.",
+    "no_file_exists": "Deterministic. File at path does NOT exist.",
+    "json_path_equals": "Deterministic. JSON at file has path equal to expected.",
+    "json_path_in": "Deterministic. JSON at file has path whose value is one of values[].",
+    "json_path_length": "Deterministic. JSON array at path has expected length.",
+    "json_schema_valid": "Deterministic. JSON at file validates against schema_path.",
+    "sequence_order": "Deterministic. Response contains all values in the given order (for batch ordering checks)."
+  },
   "evals": [
     {
       "id": "dag-basic-01",
       "prompt": "I have 5 tasks for my project: research-api (no deps, complexity 2), design-schema (depends on research-api, complexity 4), implement-auth (depends on design-schema, complexity 6), write-tests (depends on implement-auth, complexity 5), deploy (depends on write-tests and implement-auth, complexity 3). Schedule these with dependency resolution.",
       "expected_output": "Builds a DAG, performs topological sort, groups into batches respecting dependencies and ordered by complexity within each batch.",
       "assertions": [
         {
-          "text": "Correct batch ordering respecting dependencies",
-          "type": "llm_judge",
-          "criteria": "Tasks are scheduled so that no task runs before its dependencies complete. research-api must be in an earlier batch than design-schema, etc."
+          "text": "Batches announced in correct dependency order",
+          "type": "sequence_order",
+          "values": ["research-api", "design-schema", "implement-auth", "write-tests", "deploy"]
         },
         {
           "text": "Mentions scheduling concepts",
           "type": "contains_any",
           "values": ["DAG", "dependency", "batch", "topological", "schedule"]
+        },
+        {
+          "text": "research-api is in first batch",
+          "type": "llm_judge",
+          "criteria": "The schedule output shows research-api in batch 1 (or the first batch group), since it has no dependencies."
         }
       ]
     },
     {
       "id": "handoff-integration-01",
-      "prompt": "Here's my triadev-handoff.json with tasks_extracted already populated:\n{\"version\":\"1.0.0\",\"project\":\"auth-api\",\"route\":\"extended\",\"current_phase\":\"scheduling\",\"planning\":{\"status\":\"complete\",\"tasks_extracted\":[{\"id\":\"create-schema\",\"name\":\"Create DB schema\",\"complexity\":3,\"dependencies\":[]},{\"id\":\"impl-model\",\"name\":\"Implement user model\",\"complexity\":5,\"dependencies\":[\"create-schema\"]},{\"id\":\"write-tests\",\"name\":\"Write integration tests\",\"complexity\":6,\"dependencies\":[\"impl-model\"]}]}}\nSchedule these tasks and update the handoff file.",
+      "prompt": "Here's my triadev-handoff.json with tasks_extracted already populated:\n{\"version\":\"1.0.0\",\"project\":\"auth-api\",\"route\":\"extended\",\"current_phase\":\"scheduling\",\"planning\":{\"status\":\"complete\",\"files\":[\"task_plan.md\"],\"tasks_extracted\":[{\"id\":\"create-schema\",\"name\":\"Create DB schema\",\"complexity\":3,\"dependencies\":[]},{\"id\":\"impl-model\",\"name\":\"Implement user model\",\"complexity\":5,\"dependencies\":[\"create-schema\"]},{\"id\":\"write-tests\",\"name\":\"Write integration tests\",\"complexity\":6,\"dependencies\":[\"impl-model\"]}]},\"scheduling\":{\"status\":\"pending\",\"batches\":[]},\"value_gate\":{\"status\":\"pending\",\"verdict\":null,\"review_path\":null},\"implementation\":{\"status\":\"pending\",\"completed\":[],\"current\":null,\"spec_path\":null,\"tdd_state_path\":null}}\nSchedule these tasks and update the handoff file.",
       "expected_output": "Reads tasks from handoff.json planning.tasks_extracted. Produces batches. Writes scheduling.batches and scheduling.status to handoff file.",
       "assertions": [
         {
-          "text": "Reads from handoff file",
-          "type": "llm_judge",
-          "criteria": "The response reads tasks from the triadev-handoff.json planning.tasks_extracted field, not from some other source."
+          "text": "Updates handoff scheduling status to complete",
+          "type": "json_path_equals",
+          "file": "triadev-handoff.json",
+          "path": "$.scheduling.status",
+          "equals": "complete"
+        },
+        {
+          "text": "Writes exactly 3 batches",
+          "type": "json_path_length",
+          "file": "triadev-handoff.json",
+          "path": "$.scheduling.batches",
+          "length": 3
         },
         {
-          "text": "Writes schedule back to handoff",
+          "text": "First batch contains create-schema",
           "type": "llm_judge",
-          "criteria": "The response updates triadev-handoff.json with scheduling.batches and scheduling.status fields."
+          "criteria": "scheduling.batches[0] contains 'create-schema' (the only dependency-free task)."
         }
       ]
     },
@@ -41,9 +69,19 @@
       "expected_output": "Detects circular dependency (a→c→b→a) and reports the cycle. Does not produce a schedule.",
       "assertions": [
         {
-          "text": "Detects and reports circular dependency",
-          "type": "llm_judge",
-          "criteria": "The response identifies the circular dependency between the three tasks and refuses to produce a schedule, explaining the cycle."
+          "text": "Reports the cycle explicitly",
+          "type": "contains_any",
+          "values": ["circular", "cycle", "cyclic"]
+        },
+        {
+          "text": "Does not produce batches",
+          "type": "does_not_contain",
+          "values": ["Batch 1:", "Batch 2:", "scheduled into batches"]
+        },
+        {
+          "text": "Names the cycling tasks",
+          "type": "contains_all",
+          "values": ["task-a", "task-b", "task-c"]
         }
       ]
     },
@@ -53,9 +91,102 @@
       "expected_output": "This is not a task scheduling request. Should not trigger task-workflow.",
       "assertions": [
         {
-          "text": "Does not trigger DAG scheduling",
+          "text": "Does not produce batches",
+          "type": "does_not_contain",
+          "values": ["Batch 1:", "DAG", "complexity score", "dependency resolution"]
+        },
+        {
+          "text": "No handoff file created",
+          "type": "no_file_exists",
+          "path": "triadev-handoff.json"
+        }
+      ]
+    },
+    {
+      "id": "batch-ordering-complexity-01",
+      "prompt": "Schedule these tasks, all with zero dependencies: task-low (complexity 2), task-mid (complexity 5), task-high (complexity 8), task-trivial (complexity 1). They can all run in parallel, but order within the batch should reflect complexity.",
+      "expected_output": "All four tasks in Batch 1, ordered from lowest to highest complexity: task-trivial → task-low → task-mid → task-high.",
+      "assertions": [
+        {
+          "text": "All tasks in same batch (parallel)",
+          "type": "llm_judge",
+          "criteria": "All four tasks are in Batch 1 (or the single first batch). Because none of them has dependencies, they all belong in the same batch."
+        },
+        {
+          "text": "Within-batch order is ascending by complexity",
+          "type": "sequence_order",
+          "values": ["task-trivial", "task-low", "task-mid", "task-high"]
+        }
+      ]
+    },
+    {
+      "id": "cross-session-persistence-01",
+      "prompt": "Yesterday I created a task workflow file with 3 tasks. Two were running at CST 00:00 when auto-migration kicked in:\n\nYesterday's file (~/.openclaw/workspace/task_backlog/task-workflow-progress-2026-04-17.md):\n```\n| ID | Task | Complexity | Dependencies | Status | Batch |\n|----|------|-----------|--------------|--------|-------|\n| a | Task A | 3 | - | ✅ Completed | 1 |\n| b | Task B | 5 | - | 🔄 Running | 1 |\n| c | Task C | 4 | b | ⏳ Pending | 2 |\n```\n\nWhat should today's file (2026-04-18) contain?",
+      "expected_output": "Today's file has task b (running → pending, marked migrated) and task c (pending, marked migrated). Task a (completed) is NOT carried over.",
+      "assertions": [
+        {
+          "text": "Completed task 'a' not migrated",
           "type": "llm_judge",
-          "criteria": "The response handles this as a simple list request without invoking DAG scheduling, batch ordering, or task-workflow concepts."
+          "criteria": "Task 'a' does NOT appear in today's migrated section (completed tasks stay in history)."
+        },
+        {
+          "text": "Running task 'b' migrated with reset status",
+          "type": "contains_all",
+          "values": ["b", "migrated"]
+        },
+        {
+          "text": "Pending task 'c' carried over",
+          "type": "contains_any",
+          "values": ["Task C", "task c"]
+        },
+        {
+          "text": "Status reset logic mentioned",
+          "type": "contains_any",
+          "values": ["reset", "pending", "restart"]
+        }
+      ]
+    },
+    {
+      "id": "standalone-mode-01",
+      "prompt": "I have a task_plan.md but no triadev-handoff.json (I'm using task-workflow standalone). Schedule these:\n\n## Phase 2: Implementation\n- [ ] Setup database (id: setup-db, complexity: 2)\n- [ ] Write API routes (id: write-routes, deps: setup-db, complexity: 5)\n- [ ] Add validation (id: add-validation, deps: write-routes, complexity: 3)\n- [ ] Integration tests (id: int-tests, deps: add-validation, complexity: 6)",
+      "expected_output": "Works in standalone mode. Outputs schedule as formatted text. Does NOT create triadev-handoff.json (not using triadev).",
+      "assertions": [
+        {
+          "text": "Does not create handoff file in standalone mode",
+          "type": "no_file_exists",
+          "path": "triadev-handoff.json"
+        },
+        {
+          "text": "Outputs schedule as text",
+          "type": "contains_all",
+          "values": ["setup-db", "write-routes", "add-validation", "int-tests"]
+        },
+        {
+          "text": "Correct dependency order",
+          "type": "sequence_order",
+          "values": ["setup-db", "write-routes", "add-validation", "int-tests"]
+        }
+      ]
+    },
+    {
+      "id": "dynamic-insertion-01",
+      "prompt": "I currently have this schedule running:\nBatch 1: [a, b] (completed)\nBatch 2: [c] (running)\nBatch 3: [d] (pending, depends on c)\n\nInsert a new task 'hotfix' with complexity 2 and no dependencies. Re-schedule the remaining work.",
+      "expected_output": "Re-runs DAG analysis on remaining tasks only (c, d, hotfix). Since hotfix has no deps and c is already running, hotfix joins the currently-running batch or inserts a new parallel batch.",
+      "assertions": [
+        {
+          "text": "Does not re-schedule completed tasks",
+          "type": "does_not_contain",
+          "values": ["re-scheduling a", "re-scheduling b", "batch 1 updated"]
+        },
+        {
+          "text": "Inserts hotfix into remaining schedule",
+          "type": "contains_any",
+          "values": ["hotfix"]
+        },
+        {
+          "text": "Announces schedule change",
+          "type": "contains_any",
+          "values": ["schedule change", "updated schedule", "inserting", "re-scheduling"]
         }
       ]
     }

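The `assertion_types_supported` descriptions in `evals/evals.json` do not include an evaluator, so the exact matching semantics are not shown in this PR. As a rough sketch, the four response-text types could be checked as below, assuming plain case-sensitive substring matching. The function names and the `check_assertion` dispatcher are hypothetical.

```python
def contains_any(response, values):
    """True if the response contains at least one of the values."""
    return any(value in response for value in values)


def contains_all(response, values):
    """True if the response contains every value."""
    return all(value in response for value in values)


def does_not_contain(response, values):
    """True if the response contains none of the values."""
    return not any(value in response for value in values)


def sequence_order(response, values):
    """True if every value appears, each one after the previous match."""
    position = 0
    for value in values:
        index = response.find(value, position)
        if index == -1:
            return False
        position = index + len(value)
    return True


# Hypothetical dispatcher keyed on the "type" field of an assertion object.
CHECKS = {
    "contains_any": contains_any,
    "contains_all": contains_all,
    "does_not_contain": does_not_contain,
    "sequence_order": sequence_order,
}


def check_assertion(assertion, response):
    """Evaluate one text-based assertion from evals.json against a response."""
    return CHECKS[assertion["type"]](response, assertion["values"])


response = "Batch 1: setup-db\nBatch 2: write-routes\nBatch 3: add-validation\nBatch 4: int-tests"
assertion = {
    "type": "sequence_order",
    "values": ["setup-db", "write-routes", "add-validation", "int-tests"],
}
print(check_assertion(assertion, response))
```

Under these substring semantics, a one-character value such as `"b"` in `cross-session-persistence-01` matches nearly any response, so that assertion may be weaker than intended unless the real evaluator matches on word boundaries.
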
diff --git a/examples/humanizer-skill-schedule/README.md b/examples/humanizer-skill-schedule/README.md
@@ -0,0 +1,51 @@
+# Example: humanizer-skill schedule (GOLD — real completed run)
+
+## What this is
+
+A real completed task-workflow run from `projects/humanizer-skill/` (March 2026).
+A 21-task skill build that went planning → scheduling → implementation → PR merged.
+
+Inputs are the **verbatim** `task_plan.md` from the project. Outputs are what
+task-workflow's DAG analysis produces from that plan.
+
+## Why GOLD
+
+- ✅ Real completed project (PR blader/humanizer#94 landed)
+- ✅ 20 of 21 tasks completed — realistic in-flight state
+- ✅ Mix of task complexities (1–7), mix of dependency shapes (fan-out, fan-in, parallel)
+- ✅ Shows both `tasks_extracted` (from handoff.json) and `batches` (scheduled output)
+- ✅ Demonstrates within-batch complexity ordering (lowest complexity first)
+
+## Files
+
+| File | Role |
+|------|------|
+| `input-task_plan.md` | Input to task-workflow (extracted from project) |
+| `output-schedule.json` | task-workflow output (batches + metadata) |
+| `output-handoff-snippet.json` | How this schedule lives inside triadev-handoff.json |
+
+## Characteristic lessons
+
+1. **Complexity ordering within a batch**: Batch 1 contains 4 independent tasks
+   (T1, T2, T6, T16). They're ordered T1(1) → T2(1) → T16(1) → T6(4), ascending
+   by complexity, per the scheduler's within-batch ordering rule.
+
+2. **Fan-out from T7**: T7 (merge 32 patterns, complexity 7) is the single-point
+   dependency for 5 downstream tasks (T8–T12). Those 5 run in parallel after T7.
+
+3. **Fan-in to T13**: T13 (SKILL.md) depends on all 5 reference files (T8–T12).
+   The scheduler correctly waits for the entire batch to complete before running T13.
+
+4. **Linear tail (T17 → T21)**: The deployment phase is inherently sequential, so no
+   parallelization is possible.
+
+5. **Standalone-style output**: This example uses task-workflow without triadev.
+   Output is formatted text + JSON, not written to triadev-handoff.json. The final
+   file shows how the same schedule maps into handoff.json if triadev orchestrates.
+
+## What would make this NOT an ideal reference
+
+- Task complexity scores were assigned by the author, not derived from code
+- No cross-session daily-file migration shown (project completed in one week,
+  mostly in 2-3 sessions — no CST 00:00 migrations occurred)
+- For a daily-file migration example, see `references/file-format.md` (spec only)
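
Since this example shows no migration, here is a minimal sketch of the carry-over rule described in the `cross-session-persistence-01` eval: completed tasks stay in history, while running and pending tasks move to the new day as pending and are marked migrated. The `Task` dataclass and the `migrate` function are hypothetical and do not reflect the actual API of `scripts/task_persistence.py`; the sample rows mirror the table in the eval prompt.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Task:
    """One row of a daily backlog file (field names are assumptions)."""
    id: str
    name: str
    complexity: int
    dependencies: tuple
    status: str  # "completed", "running", or "pending"
    migrated: bool = False


def migrate(yesterday):
    """Return the tasks that carry over into today's file.

    Completed tasks are left behind; everything else is reset to
    "pending" and flagged as migrated.
    """
    carried = []
    for task in yesterday:
        if task.status == "completed":
            continue
        carried.append(replace(task, status="pending", migrated=True))
    return carried


# Rows from the 2026-04-17 file in the eval prompt.
yesterday = [
    Task("a", "Task A", 3, (), "completed"),
    Task("b", "Task B", 5, (), "running"),
    Task("c", "Task C", 4, ("b",), "pending"),
]
for task in migrate(yesterday):
    print(task.id, task.status, "migrated" if task.migrated else "")
```

The sketch leaves dependency lists untouched. How the real migration treats a dependency on a task that completed the previous day is not stated in this PR.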