diff --git a/README.md b/README.md
index a42347b..196b16f 100644
--- a/README.md
+++ b/README.md
@@ -25,6 +25,7 @@ A sample family of reusable [GitHub Agentic Workflows](https://github.github.com
 - [👥 Daily Repo Status](docs/daily-repo-status.md) - Assess repository activity and create status reports
 - [👥 Daily Team Status](docs/daily-team-status.md) - Create upbeat daily team activity summaries with productivity insights
 - [📋 Daily Plan](docs/daily-plan.md) - Update planning issues for team coordination
+- [🔍 Discussion Task Miner](docs/discussion-task-miner.md) - Extract actionable improvement tasks from GitHub Discussions and create tracked issues
 
 ### Dependency Management Workflows
 
diff --git a/docs/discussion-task-miner.md b/docs/discussion-task-miner.md
new file mode 100644
index 0000000..20ca851
--- /dev/null
+++ b/docs/discussion-task-miner.md
@@ -0,0 +1,67 @@
+# 🔍 Discussion Task Miner
+
+> For an overview of all available workflows, see the [main README](../README.md).
+
+**Automatically extract actionable tasks from GitHub Discussions and create trackable issues**
+
+The [Discussion Task Miner workflow](../workflows/discussion-task-miner.md?plain=1) runs daily to scan recent GitHub Discussions for actionable improvement opportunities. It identifies concrete, well-scoped tasks and converts them into GitHub issues (up to 5 per run), bridging the gap between discussion insights and tracked work items.
+
+## Installation
+
+```bash
+# Install the 'gh aw' extension
+gh extension install github/gh-aw
+
+# Add the workflow to your repository
+gh aw add-wizard githubnext/agentics/discussion-task-miner
+```
+
+This walks you through adding the workflow to your repository.
+
+## How It Works
+
+```mermaid
+graph LR
+    A[Scan Recent Discussions] --> B[Extract Action Items]
+    B --> C[Filter & Prioritize]
+    C --> D{High Value?}
+    D -->|Yes| E[Create GitHub Issue]
+    D -->|No| F[Skip]
+    E --> G[Update Memory]
+    F --> G
+```
+
+The workflow reads discussions from the last 7 days, analyzes their content for recommendations, action items, and improvement suggestions, then converts the top findings into focused, actionable GitHub issues. It uses cache-memory to avoid re-processing the same discussions across runs.
+
+## Prerequisites
+
+GitHub Discussions must be enabled for your repository. The workflow works best in repositories that generate discussion content from other agentic workflows (such as analysis reports, quality audits, or review summaries), though it can also mine any human-authored discussions containing improvement suggestions.
+
+## Examples
+
+Based on usage in the gh-aw repository: **57% merge rate** (60 merged PRs out of 105 proposed through a discussion → issue → PR causal chain). The workflow demonstrates how insights buried in discussions can be surfaced as trackable work; a verified example chain is [Discussion #13934](https://github.com/github/gh-aw/discussions/13934) → [Issue #14084](https://github.com/github/gh-aw/issues/14084) → [PR #14129](https://github.com/github/gh-aw/pull/14129).
+
+## Usage
+
+### Configuration
+
+The workflow is configured to:
+- Run daily
+- Create at most 5 issues per run
+- Auto-expire issues after 1 day if not addressed
+- Use cache-memory to track processed discussions and avoid duplicates
+
+To customize which types of tasks to extract, edit the "Focus Areas" and "Task Extraction Criteria" sections in the workflow file. After editing, run `gh aw compile` to update the workflow and commit all changes to the default branch.
+
+### Pairing with Other Workflows
+
+This workflow pairs especially well with other analysis workflows that post findings as discussions:
+- [Daily Accessibility Review](daily-accessibility-review.md)
+- [Daily Adhoc QA](daily-qa.md)
+- [Daily Malicious Code Scan](daily-malicious-code-scan.md)
+- [Daily Performance Improver](daily-perf-improver.md)
+
+## Learn More
+
+- [GitHub Agentic Workflows Documentation](https://github.github.io/gh-aw/)
+- [Blog: Agentic Workflow Campaigns & Multi-Phase Workflows](https://github.github.io/gh-aw/blog/2026-01-13-meet-the-workflows-campaigns/)
diff --git a/workflows/discussion-task-miner.md b/workflows/discussion-task-miner.md
new file mode 100644
index 0000000..7aef212
--- /dev/null
+++ b/workflows/discussion-task-miner.md
@@ -0,0 +1,235 @@
+---
+name: Discussion Task Miner
+description: Scans recent GitHub Discussions to extract actionable improvement tasks and create trackable GitHub issues
+on:
+  schedule: daily
+  workflow_dispatch:
+
+permissions:
+  contents: read
+  discussions: read
+  issues: read
+  pull-requests: read
+
+tracker-id: discussion-task-miner
+timeout-minutes: 20
+engine: copilot
+strict: true
+
+network:
+  allowed:
+    - defaults
+
+safe-outputs:
+  create-issue:
+    title-prefix: "[task-miner] "
+    labels: [automated-analysis]
+    max: 5
+    group: true
+    expires: 1d
+  messages:
+    footer: "> 🔍 *Task mining by [{workflow_name}]({run_url})*"
+    run-started: "🔍 Discussion Task Miner starting! [{workflow_name}]({run_url}) is scanning discussions for actionable tasks..."
+    run-success: "✅ Task mining complete! [{workflow_name}]({run_url}) has identified actionable tasks from recent discussions. 📊"
+    run-failure: "⚠️ Task mining interrupted! [{workflow_name}]({run_url}) {status}. Please review the logs..."
+
+tools:
+  cache-memory: true
+  github:
+    lockdown: true
+    toolsets: [default, discussions]
+  bash:
+    - "jq *"
+    - "cat *"
+    - "date *"
+
+imports:
+  - shared/reporting.md
+---
+
+# Discussion Task Miner
+
+You are a task mining agent that analyzes recent GitHub Discussions to discover actionable improvement opportunities.
+
+## Mission
+
+Scan recent GitHub Discussions to identify and extract specific, actionable tasks that improve the repository. Convert these discoveries into trackable GitHub issues.
+
+## Objectives
+
+1. **Mine Discussions**: Analyze recent discussions (last 7 days)
+2. **Extract Tasks**: Identify concrete, actionable improvements
+3. **Create Issues**: Convert high-value tasks into GitHub issues
+4. **Track Progress**: Maintain memory of processed discussions to avoid duplicates
+
+## Task Extraction Criteria
+
+Focus on extracting tasks that meet **ALL** these criteria:
+
+### Quality Criteria
+- ✅ **Specific**: Task has clear scope and acceptance criteria
+- ✅ **Actionable**: Can be completed by a developer or AI agent
+- ✅ **Valuable**: Improves the repository in a meaningful way
+- ✅ **Scoped**: Can be completed in 1-3 days of work
+- ✅ **Independent**: Doesn't require completing other tasks first
+
+### Focus Areas
+- **Code Quality**: Simplify complex code, reduce duplication, improve structure
+- **Testing**: Add missing tests, improve test coverage, fix flaky tests
+- **Documentation**: Add or improve documentation, examples, guides
+- **Performance**: Optimize slow operations, reduce resource usage
+- **Security**: Fix vulnerabilities, improve security practices
+- **Maintainability**: Improve code organization, naming, patterns
+- **Technical Debt**: Address TODOs, deprecated APIs, workarounds
+- **Tooling**: Improve linters, formatters, build scripts, CI/CD
+
+### Exclude These
+- ❌ Vague suggestions without clear scope ("improve code")
+- ❌ Already tracked in existing issues
+- ❌ Feature requests or new functionality
+- ❌ Bug reports (those go through normal bug triage)
+- ❌ Tasks requiring architectural decisions
+- ❌ Tasks requiring human judgment or business decisions
+
+## Workflow Steps
+
+### Step 1: Load Memory
+
+Check cache-memory for previously processed discussions. The cache memory stores a JSON object with this structure:
+
+```json
+{
+  "last_run": "2026-03-01",
+  "discussions_processed": [
+    {"id": 1234, "title": "...", "processed_at": "2026-03-01T10:00:00Z"}
+  ],
+  "extracted_tasks": [
+    {
+      "source_discussion": 1234,
+      "issue_number": 5678,
+      "title": "...",
+      "created_at": "2026-03-01T10:00:00Z",
+      "status": "created"
+    }
+  ]
+}
+```
+
+This helps avoid re-processing the same discussions and creating duplicate issues.
+
+### Step 2: Query Recent Discussions
+
+Use GitHub tools to fetch recent discussions from the last 7 days. Look for discussions with titles or content that contain actionable insights, such as:
+- Analysis reports and audit findings
+- Code review observations
+- Performance or quality assessments
+- Recommendations sections in any discussion
+- Any discussion mentioning "should", "could", "improve", "fix", "refactor", "add"
+
+Limit to the 20-30 most recent discussions for efficiency.
+
+### Step 3: Analyze Discussion Content
+
+For each discussion, extract the full content including:
+- Title and body
+- All comments
+- Look for sections like:
+  - "Recommendations"
+  - "Action Items"
+  - "Improvements Needed"
+  - "Issues Found"
+  - "Technical Debt"
+  - "Refactoring Opportunities"
+  - "TODOs" or "Next Steps"
+
+**Analysis approach:**
+1. Read the discussion content carefully
+2. Identify mentions of concrete improvement opportunities
+3. Extract specific tasks with clear descriptions
+4. Note the file paths, components, or areas mentioned
+5. Assess impact and feasibility
+
+### Step 4: Filter and Prioritize Tasks
+
+From all identified tasks, select the **top 3-5 highest-value tasks** based on:
+1. **Impact**: How much does this improve the repository?
+2. **Effort**: Is it achievable in 1-3 days?
+3. **Clarity**: Is the task well-defined?
+4. **Uniqueness**: Have we already created an issue for this?
+
+**Deduplication:**
+- Check `discussions_processed` in cache-memory to avoid re-extracting from the same discussion
+- Check `extracted_tasks` in cache-memory to avoid creating duplicate issues
+- Search existing GitHub issues to ensure the task isn't already tracked
+
+### Step 5: Create GitHub Issues
+
+For each selected task, use the `create-issue` safe output with a clear title and body. Format issues to include:
+
+- **Description**: What needs to be done and why
+- **Suggested Changes**: Specific actions to take
+- **Files Affected**: Relevant files or components (if known)
+- **Success Criteria**: How to know when done
+- **Source**: Link to the source discussion
+- **Priority**: High/Medium/Low
+
+**Issue formatting guidelines:**
+- Use clear, descriptive titles (50-80 characters)
+- Include acceptance criteria
+- Link back to source discussion
+- Add appropriate priority (High/Medium/Low)
+
+### Step 6: Update Memory
+
+Save progress to cache-memory using the JSON structure:
+
+```json
+{
+  "last_run": "",
+  "discussions_processed": [
+    {"id": 1234, "title": "...", "processed_at": ""}
+  ],
+  "extracted_tasks": [
+    {
+      "source_discussion": 1234,
+      "issue_number": 5678,
+      "title": "...",
+      "created_at": "",
+      "status": "created"
+    }
+  ]
+}
+```
+
+Merge with the existing cache-memory data to preserve historical tracking of processed discussions and extracted tasks.
+
+## Output Requirements
+
+### Issue Creation
+- Create **at most 5 issues** per run
+- Each issue expires after 1 day if not addressed
+- All issues are tagged with `automated-analysis`
+- Issues include clear acceptance criteria
+
+### Memory Tracking
+- Always update cache-memory after each run to avoid duplicates
+- Maintain extracted tasks in cache-memory for historical tracking
+
+### Quality Standards
+- Only create issues for high-value, actionable tasks
+- Ensure each issue is specific and well-scoped
+- Link back to source discussions for context
+
+## Important Notes
+
+- **Be selective** - only the highest-value tasks make the cut
+- **Avoid duplicates** - check memory and existing issues before creating
+- **Clear scope** - tasks should be completable in 1-3 days
+- **Actionable** - someone should be able to start immediately
+- **Source attribution** - always link to the original discussion
+
+**Important**: If no discussions are found or no actionable tasks are identified, you **MUST** call the `noop` safe-output tool with a brief explanation.
+
+```json
+{"noop": {"message": "No action needed: [brief explanation of what was analyzed and why no tasks were extracted]"}}
+```
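
Review note: the deduplication in Step 4 and the merge in Step 6 are left to the agent, so the following Python is only a minimal sketch of the intended behavior, not part of the workflow. The field names (`last_run`, `discussions_processed`, `extracted_tasks`, `issue_number`) come from the JSON structure in Step 1; the function names `unprocessed` and `merge_memory` and the `now` parameter are hypothetical and introduced here for illustration.

```python
from datetime import datetime, timezone


def unprocessed(discussions, memory):
    """Return the discussions whose id is not yet recorded in memory."""
    seen = {d["id"] for d in memory.get("discussions_processed", [])}
    return [d for d in discussions if d["id"] not in seen]


def merge_memory(memory, processed, tasks, now=None):
    """Merge this run's results into existing memory without dropping history.

    `memory` is the object loaded in Step 1 (possibly empty), `processed` is
    the list of discussions handled in this run, and `tasks` is the list of
    issues created in this run. The input `memory` is not modified.
    """
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y-%m-%dT%H:%M:%SZ")

    merged = {
        "last_run": now.strftime("%Y-%m-%d"),
        # Copy the lists so earlier entries survive and the input stays intact.
        "discussions_processed": list(memory.get("discussions_processed", [])),
        "extracted_tasks": list(memory.get("extracted_tasks", [])),
    }

    # Append only discussions that were not recorded by a previous run.
    seen_ids = {d["id"] for d in merged["discussions_processed"]}
    for d in processed:
        if d["id"] not in seen_ids:
            merged["discussions_processed"].append(
                {"id": d["id"], "title": d["title"], "processed_at": stamp}
            )
            seen_ids.add(d["id"])

    # Append only tasks whose issue number is not already tracked.
    seen_issues = {t["issue_number"] for t in merged["extracted_tasks"]}
    for t in tasks:
        if t["issue_number"] not in seen_issues:
            merged["extracted_tasks"].append(
                {**t, "created_at": stamp, "status": "created"}
            )
            seen_issues.add(t["issue_number"])

    return merged
```

In this sketch, `unprocessed` corresponds to the memory check before extraction, and `merge_memory` appends rather than overwrites, which is one way to satisfy the requirement to preserve historical tracking. Searching existing GitHub issues for duplicates is a separate check that this sketch does not cover.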