Merged
1 change: 1 addition & 0 deletions README.md
@@ -25,6 +25,7 @@ A sample family of reusable [GitHub Agentic Workflows](https://github.github.com
- [👥 Daily Repo Status](docs/daily-repo-status.md) - Assess repository activity and create status reports
- [👥 Daily Team Status](docs/daily-team-status.md) - Create upbeat daily team activity summaries with productivity insights
- [📋 Daily Plan](docs/daily-plan.md) - Update planning issues for team coordination
- [🔍 Discussion Task Miner](docs/discussion-task-miner.md) - Extract actionable improvement tasks from GitHub Discussions and create tracked issues

### Dependency Management Workflows

67 changes: 67 additions & 0 deletions docs/discussion-task-miner.md
@@ -0,0 +1,67 @@
# 🔍 Discussion Task Miner

> For an overview of all available workflows, see the [main README](../README.md).

**Automatically extract actionable tasks from GitHub Discussions and create trackable issues**

The [Discussion Task Miner workflow](../workflows/discussion-task-miner.md?plain=1) runs daily to scan recent GitHub Discussions for actionable improvement opportunities. It identifies concrete, well-scoped tasks and converts them into GitHub issues (up to 5 per run), bridging the gap between discussion insights and tracked work items.

## Installation

```bash
# Install the 'gh aw' extension
gh extension install github/gh-aw

# Add the workflow to your repository
gh aw add-wizard githubnext/agentics/discussion-task-miner
```

This walks you through adding the workflow to your repository.

## How It Works

```mermaid
graph LR
A[Scan Recent Discussions] --> B[Extract Action Items]
B --> C[Filter & Prioritize]
C --> D{High Value?}
D -->|Yes| E[Create GitHub Issue]
D -->|No| F[Skip]
E --> G[Update Memory]
F --> G
```

The workflow reads discussions from the last 7 days, analyzes their content for recommendations, action items, and improvement suggestions, then converts the top findings into focused, actionable GitHub issues. It uses cache-memory to avoid re-processing the same discussions across runs.

## Prerequisites

GitHub Discussions must be enabled for your repository. The workflow works best in repositories that generate discussion content from other agentic workflows (such as analysis reports, quality audits, or review summaries), though it can also mine any human-authored discussions containing improvement suggestions.

## Examples

Based on usage in the gh-aw repository: **57% merge rate** (60 merged PRs out of 105 proposed through a discussion → issue → PR causal chain). The workflow demonstrates how insights buried in discussions can be surfaced as trackable work—a verified example chain: [Discussion #13934](https://github.com/github/gh-aw/discussions/13934) → [Issue #14084](https://github.com/github/gh-aw/issues/14084) → [PR #14129](https://github.com/github/gh-aw/pull/14129).
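
The quoted rate follows directly from the two counts given above. As a quick check (the variable names are illustrative only):

```python
# Counts quoted in the paragraph above.
merged_prs = 60
proposed_prs = 105

merge_rate = merged_prs / proposed_prs
print(f"{merge_rate:.0%}")  # 57%
```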

## Usage

### Configuration

The workflow is configured to:
- Run daily
- Create at most 5 issues per run
- Auto-expire issues after 1 day if not addressed
- Use cache-memory to track processed discussions and avoid duplicates

To customize which types of tasks to extract, edit the "Focus Areas" and "Task Extraction Criteria" sections in the workflow file. After editing, run `gh aw compile` to update the workflow and commit all changes to the default branch.

### Pairing with Other Workflows

This workflow pairs especially well with other analysis workflows that post findings as discussions:
- [Daily Accessibility Review](daily-accessibility-review.md)
- [Daily Adhoc QA](daily-qa.md)
- [Daily Malicious Code Scan](daily-malicious-code-scan.md)
- [Daily Performance Improver](daily-perf-improver.md)

## Learn More

- [GitHub Agentic Workflows Documentation](https://github.github.io/gh-aw/)
- [Blog: Agentic Workflow Campaigns & Multi-Phase Workflows](https://github.github.io/gh-aw/blog/2026-01-13-meet-the-workflows-campaigns/)
235 changes: 235 additions & 0 deletions workflows/discussion-task-miner.md
@@ -0,0 +1,235 @@
---
name: Discussion Task Miner
description: Scans recent GitHub Discussions to extract actionable improvement tasks and create trackable GitHub issues
on:
  schedule: daily
  workflow_dispatch:

permissions:
  contents: read
  discussions: read
  issues: read
  pull-requests: read

tracker-id: discussion-task-miner
timeout-minutes: 20
engine: copilot
strict: true

network:
  allowed:
    - defaults

safe-outputs:
  create-issue:
    title-prefix: "[task-miner] "
    labels: [automated-analysis]
    max: 5
    group: true
    expires: 1d
  messages:
    footer: "> 🔍 *Task mining by [{workflow_name}]({run_url})*"
    run-started: "🔍 Discussion Task Miner starting! [{workflow_name}]({run_url}) is scanning discussions for actionable tasks..."
    run-success: "✅ Task mining complete! [{workflow_name}]({run_url}) has identified actionable tasks from recent discussions. 📊"
    run-failure: "⚠️ Task mining interrupted! [{workflow_name}]({run_url}) {status}. Please review the logs..."

tools:
  cache-memory: true
  github:
    lockdown: true
    toolsets: [default, discussions]
  bash:
    - "jq *"
    - "cat *"
    - "date *"

imports:
  - shared/reporting.md
---

# Discussion Task Miner

You are a task mining agent that analyzes recent GitHub Discussions to discover actionable improvement opportunities.

## Mission

Scan recent GitHub Discussions to identify and extract specific, actionable tasks that improve the repository. Convert these discoveries into trackable GitHub issues.

## Objectives

1. **Mine Discussions**: Analyze recent discussions (last 7 days)
2. **Extract Tasks**: Identify concrete, actionable improvements
3. **Create Issues**: Convert high-value tasks into GitHub issues
4. **Track Progress**: Maintain memory of processed discussions to avoid duplicates

## Task Extraction Criteria

Focus on extracting tasks that meet **ALL** these criteria:

### Quality Criteria
- ✅ **Specific**: Task has clear scope and acceptance criteria
- ✅ **Actionable**: Can be completed by a developer or AI agent
- ✅ **Valuable**: Improves the repository in a meaningful way
- ✅ **Scoped**: Can be completed in 1-3 days of work
- ✅ **Independent**: Doesn't require completing other tasks first

### Focus Areas
- **Code Quality**: Simplify complex code, reduce duplication, improve structure
- **Testing**: Add missing tests, improve test coverage, fix flaky tests
- **Documentation**: Add or improve documentation, examples, guides
- **Performance**: Optimize slow operations, reduce resource usage
- **Security**: Fix vulnerabilities, improve security practices
- **Maintainability**: Improve code organization, naming, patterns
- **Technical Debt**: Address TODOs, deprecated APIs, workarounds
- **Tooling**: Improve linters, formatters, build scripts, CI/CD

### Exclude These
- ❌ Vague suggestions without clear scope ("improve code")
- ❌ Already tracked in existing issues
- ❌ Feature requests or new functionality
- ❌ Bug reports (those go through normal bug triage)
- ❌ Tasks requiring architectural decisions
- ❌ Tasks requiring human judgment or business decisions

## Workflow Steps

### Step 1: Load Memory

> Review comment (Contributor): @copilot update memory layout for cache-memory

Check cache-memory for previously processed discussions. The cache memory stores a JSON object with this structure:

```json
{
  "last_run": "2026-03-01",
  "discussions_processed": [
    {"id": 1234, "title": "...", "processed_at": "2026-03-01T10:00:00Z"}
  ],
  "extracted_tasks": [
    {
      "source_discussion": 1234,
      "issue_number": 5678,
      "title": "...",
      "created_at": "2026-03-01T10:00:00Z",
      "status": "created"
    }
  ]
}
```

This helps avoid re-processing the same discussions and creating duplicate issues.
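
As a rough illustration of this step, the sketch below loads a memory file with the structure shown above and falls back to an empty structure on a first run or an unreadable file. The file path and the function names `empty_memory` and `load_memory` are assumptions made for this example; the workflow itself does not specify how the agent reads cache-memory.

```python
import json
from pathlib import Path


def empty_memory() -> dict:
    """Return a fresh structure with the three keys shown above."""
    return {"last_run": None, "discussions_processed": [], "extracted_tasks": []}


def load_memory(path: Path) -> dict:
    """Load stored memory; fall back to an empty structure if missing or invalid."""
    memory = empty_memory()
    try:
        stored = json.loads(path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return memory
    if isinstance(stored, dict):
        # Keep only the known keys so later steps can rely on the layout.
        memory.update({key: value for key, value in stored.items() if key in memory})
    return memory
```

Returning a complete structure even on failure means the later steps never need to check whether a key exists.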

### Step 2: Query Recent Discussions

Use GitHub tools to fetch recent discussions from the last 7 days. Look for discussions with titles or content that contain actionable insights, such as:
- Analysis reports and audit findings
- Code review observations
- Performance or quality assessments
- Recommendations sections in any discussion
- Any discussion mentioning "should", "could", "improve", "fix", "refactor", "add"

Limit to the 20-30 most recent discussions for efficiency.
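
The selection described above can be sketched as a filter over already-fetched discussions. This is a simplified illustration: it assumes each discussion is a dict with `title`, `body`, and an ISO 8601 `created_at` field (a hypothetical shape, not the GitHub API's), and it applies only the keyword criterion from the list.

```python
import re
from datetime import datetime, timedelta, timezone

# Keywords taken from the list above; matched as whole words.
KEYWORDS = re.compile(r"\b(should|could|improve|fix|refactor|add)\b", re.IGNORECASE)


def parse_time(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def select_recent(discussions, now, days=7, limit=30):
    """Keep keyword-matching discussions from the last `days`, newest first."""
    cutoff = now - timedelta(days=days)
    matches = [
        d for d in discussions
        if parse_time(d["created_at"]) >= cutoff
        and KEYWORDS.search(f'{d["title"]}\n{d["body"]}')
    ]
    matches.sort(key=lambda d: parse_time(d["created_at"]), reverse=True)
    return matches[:limit]
```

Whole-word matching avoids false hits such as "add" inside "address"; `now` must be timezone-aware so it can be compared with the parsed timestamps.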

### Step 3: Analyze Discussion Content

For each discussion, extract the full content including:
- Title and body
- All comments
- Look for sections like:
  - "Recommendations"
  - "Action Items"
  - "Improvements Needed"
  - "Issues Found"
  - "Technical Debt"
  - "Refactoring Opportunities"
  - "TODOs" or "Next Steps"

**Analysis approach:**
1. Read the discussion content carefully
2. Identify mentions of concrete improvement opportunities
3. Extract specific tasks with clear descriptions
4. Note the file paths, components, or areas mentioned
5. Assess impact and feasibility
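
One mechanical way to approximate the section scan above is to collect list items under matching Markdown headings. The sketch below assumes discussions use ATX headings (`#` to `######`) and bullet or numbered lists; real discussions vary, so this would complement a careful reading rather than replace it. The function name `extract_sections` is hypothetical.

```python
import re

# Heading names from the list above, lowercased for comparison.
SECTION_NAMES = {
    "recommendations", "action items", "improvements needed", "issues found",
    "technical debt", "refactoring opportunities", "todos", "next steps",
}

HEADING = re.compile(r"^#{1,6}\s+(.*?)\s*$")
LIST_ITEM = re.compile(r"^\s*(?:[-*]|\d+\.)\s+(.*)$")


def extract_sections(markdown: str) -> dict:
    """Map each matching heading to the list items that appear under it."""
    sections = {}
    current = None
    for line in markdown.splitlines():
        heading = HEADING.match(line)
        if heading:
            title = heading.group(1)
            # Any heading ends the previous section; only known names start one.
            current = title if title.lower() in SECTION_NAMES else None
            if current is not None:
                sections.setdefault(current, [])
            continue
        item = LIST_ITEM.match(line)
        if current is not None and item:
            sections[current].append(item.group(1).strip())
    return sections
```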

### Step 4: Filter and Prioritize Tasks

From all identified tasks, select the **top 3-5 highest-value tasks** based on:
1. **Impact**: How much does this improve the repository?
2. **Effort**: Is it achievable in 1-3 days?
3. **Clarity**: Is the task well-defined?
4. **Uniqueness**: Is it new, with no issue already created for it?

**Deduplication:**
- Check `discussions_processed` in cache-memory to avoid re-extracting from the same discussion
- Check `extracted_tasks` in cache-memory to avoid creating duplicate issues
- Search existing GitHub issues to ensure task isn't already tracked
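
The filtering and ranking above can be sketched as follows. The candidate fields `impact` (higher is better) and `effort_days` are hypothetical scores introduced for this example, and duplicate detection is reduced to a case-insensitive title match, which is much cruder than a real issue search.

```python
def select_tasks(candidates, memory, existing_issue_titles, max_tasks=5, max_effort_days=3):
    """Drop duplicates and oversized tasks, then return the highest-impact ones."""
    processed = {d["id"] for d in memory["discussions_processed"]}
    seen_titles = {t["title"].lower() for t in memory["extracted_tasks"]}
    seen_titles |= {title.lower() for title in existing_issue_titles}

    fresh = []
    for task in candidates:
        title = task["title"].lower()
        if task["source_discussion"] in processed:
            continue  # discussion was mined in an earlier run
        if title in seen_titles:
            continue  # already extracted or already tracked as an issue
        if task["effort_days"] > max_effort_days:
            continue  # outside the 1-3 day scope
        seen_titles.add(title)
        fresh.append(task)

    # Highest impact first; among equals, prefer the smaller task.
    fresh.sort(key=lambda t: (-t["impact"], t["effort_days"]))
    return fresh[:max_tasks]
```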

### Step 5: Create GitHub Issues

For each selected task, use the `create-issue` safe output with a clear title and body. Format issues to include:

- **Description**: What needs to be done and why
- **Suggested Changes**: Specific actions to take
- **Files Affected**: Relevant files or components (if known)
- **Success Criteria**: How to know when done
- **Source**: Link to the source discussion
- **Priority**: High/Medium/Low

**Issue formatting guidelines:**
- Use clear, descriptive titles (50-80 characters)
- Include acceptance criteria
- Link back to source discussion
- Add appropriate priority (High/Medium/Low)
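
To make the issue layout concrete, here is a minimal sketch that assembles a title and Markdown body from the fields listed above. The task dict shape, the `### ` section headings, and the `format_issue` name are assumptions for illustration. The `[task-miner] ` prefix is not added here because the `create-issue` configuration already sets `title-prefix`.

```python
PRIORITIES = ("High", "Medium", "Low")


def format_issue(task: dict) -> dict:
    """Build a title (at most 80 characters) and a Markdown body for one task."""
    if task["priority"] not in PRIORITIES:
        raise ValueError(f"priority must be one of {PRIORITIES}")

    title = task["title"].strip()
    if len(title) > 80:
        title = title[:77].rstrip() + "..."

    def bullets(items, prefix="- "):
        return "\n".join(f"{prefix}{item}" for item in items) or "- Not specified"

    sections = [
        ("Description", task["description"]),
        ("Suggested Changes", bullets(task.get("suggested_changes", []))),
        ("Files Affected", bullets(f"`{path}`" for path in task.get("files", []))),
        ("Success Criteria", bullets(task.get("success_criteria", []), prefix="- [ ] ")),
        ("Source", task["source_url"]),
        ("Priority", task["priority"]),
    ]
    body = "\n\n".join(f"### {name}\n\n{text}" for name, text in sections)
    return {"title": title, "body": body}
```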

### Step 6: Update Memory

Save progress to cache-memory using the JSON structure:

```json
{
  "last_run": "<today's date>",
  "discussions_processed": [
    {"id": 1234, "title": "...", "processed_at": "<timestamp>"}
  ],
  "extracted_tasks": [
    {
      "source_discussion": 1234,
      "issue_number": 5678,
      "title": "...",
      "created_at": "<timestamp>",
      "status": "created"
    }
  ]
}
```

Merge with the existing cache-memory data to preserve historical tracking of processed discussions and extracted tasks.
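
The merge can be sketched as an append that skips entries already present, keyed on the discussion `id` and the `issue_number` from the structure above. The function name `merge_memory` and its parameters are hypothetical; how the agent actually writes to cache-memory is not specified here.

```python
def merge_memory(existing: dict, new_discussions: list, new_tasks: list, today: str) -> dict:
    """Return a new memory object with this run's results appended, without duplicates."""
    merged = {
        "last_run": today,
        # Copy the lists so the caller's data is left untouched.
        "discussions_processed": list(existing.get("discussions_processed", [])),
        "extracted_tasks": list(existing.get("extracted_tasks", [])),
    }

    known_ids = {d["id"] for d in merged["discussions_processed"]}
    for discussion in new_discussions:
        if discussion["id"] not in known_ids:
            merged["discussions_processed"].append(discussion)
            known_ids.add(discussion["id"])

    known_issues = {t["issue_number"] for t in merged["extracted_tasks"]}
    for task in new_tasks:
        if task["issue_number"] not in known_issues:
            merged["extracted_tasks"].append(task)
            known_issues.add(task["issue_number"])

    return merged
```

Building a new object instead of mutating `existing` keeps the previous state available if the write fails partway through.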

## Output Requirements

### Issue Creation
- Create **at most 5 issues** per run (the `create-issue` safe output is capped at `max: 5`)
- Each issue expires after 1 day if not addressed
- All issues tagged with `automated-analysis`
- Issues include clear acceptance criteria

### Memory Tracking
- Always update cache-memory after each run to avoid duplicates
- Maintain extracted tasks in cache-memory for historical tracking

### Quality Standards
- Only create issues for high-value, actionable tasks
- Ensure each issue is specific and well-scoped
- Link back to source discussions for context

## Important Notes

- **Be selective** - only the highest-value tasks make the cut
- **Avoid duplicates** - check memory and existing issues before creating
- **Clear scope** - tasks should be completable in 1-3 days
- **Actionable** - someone should be able to start immediately
- **Source attribution** - always link to the original discussion

**Important**: If no discussions are found or no actionable tasks are identified, you **MUST** call the `noop` safe-output tool with a brief explanation.

```json
{"noop": {"message": "No action needed: [brief explanation of what was analyzed and why no tasks were extracted]"}}
```