feat(ci): add duplicate issue detection and auto-close bot #22034

Merged
jquinter merged 4 commits into main from feat/duplicate-issues-bot
Feb 28, 2026
@jquinter
Collaborator

Summary

  • Adds a Python script (.github/scripts/close_duplicate_issues.py) that detects duplicate issues using difflib.SequenceMatcher on normalized titles and closes them via the gh CLI
  • Adds a workflow_dispatch workflow (scan_duplicate_issues.yml) for one-time batch scans of all open issues
  • Extends the existing check_duplicate_issues.yml workflow to auto-close newly opened issues that are high-confidence duplicates
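
The title comparison described above can be sketched as follows. This is a minimal illustration, not the script's actual code: `normalize_title` and `title_similarity` are hypothetical names, and the normalization rules (lowercasing, dropping a leading `[tag]:` prefix, collapsing punctuation and whitespace) are assumptions, since the PR does not spell out how titles are normalized.

```python
import re
from difflib import SequenceMatcher


def normalize_title(title: str) -> str:
    """Hypothetical normalization; the real script's rules may differ."""
    title = title.lower().strip()
    # Drop a leading "[Bug]:" or "[Feature]" style tag, if present.
    title = re.sub(r"^\[[^\]]*\]\s*:?\s*", "", title)
    # Replace punctuation with spaces, then collapse runs of whitespace.
    title = re.sub(r"[^a-z0-9\s]", " ", title)
    return re.sub(r"\s+", " ", title).strip()


def title_similarity(a: str, b: str) -> float:
    """Return a similarity ratio in [0, 1] for two issue titles."""
    return SequenceMatcher(None, normalize_title(a), normalize_title(b)).ratio()
```

Note that `SequenceMatcher.ratio()` returns 1.0 when both inputs are empty, which is consistent with the empty junk titles matching at 100% in the dry-run results.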

Two-tier duplicate detection

| Threshold | Action |
|---|---|
| 0.60 | Informational comment + potential-duplicate label (existing wow-actions step) |
| 0.85 | Auto-close with comment, duplicate label, and not_planned state reason |
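
As a rough sketch, the two tiers map a similarity score to an action like this. The function name `action_for` and the returned labels are hypothetical. In the PR the tiers are handled by two separate tools (the wow-actions step and the new script), each computing its own score, so a single function like this is a simplification.

```python
INFO_THRESHOLD = 0.60   # informational comment + potential-duplicate label
CLOSE_THRESHOLD = 0.85  # auto-close with comment, duplicate label, not_planned


def action_for(score: float) -> str:
    """Map a similarity score to the tier described in the table (illustrative only)."""
    if score >= CLOSE_THRESHOLD:
        return "auto-close"
    if score >= INFO_THRESHOLD:
        return "comment-and-label"
    return "none"
```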

Dry-run results against 830 open issues

  • 0.70 threshold: 13 matches, 6 false positives (46%) — short titles like "support timezone setting" vs "support salt rotating" matched incorrectly
  • 0.85 threshold: 7 matches, 0 false positives — all legitimate (empty junk titles at 100% or identical bug reports at 87%)
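
The percentages above follow directly from the match counts. A small sketch of the arithmetic, with `false_positive_rate` as a hypothetical helper name:

```python
def false_positive_rate(matches: int, false_positives: int) -> float:
    """Fraction of reported matches that were not real duplicates."""
    if matches == 0:
        return 0.0
    return false_positives / matches


# Counts taken from the dry-run results listed above.
rate_070 = false_positive_rate(matches=13, false_positives=6)
rate_085 = false_positive_rate(matches=7, false_positives=0)
print(f"0.70 threshold: {rate_070:.0%}")
print(f"0.85 threshold: {rate_085:.0%}")
```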

Test plan

  • Ran dry-run scan locally: python3 .github/scripts/close_duplicate_issues.py --repo BerriAI/litellm --scan --threshold 0.85
  • Trigger one-time scan workflow via gh workflow run scan_duplicate_issues.yml with close: false to verify in CI
  • Re-run with close: true to close the 7 confirmed duplicates
  • Test ongoing detection by creating a test issue with a known duplicate title
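
For readers unfamiliar with how a script might close an issue through the gh CLI, here is a minimal sketch under stated assumptions. `build_close_commands` and `run_commands` are hypothetical names, the comment text is invented for illustration, and the actual script may structure its gh calls differently. The gh subcommands and flags used (`gh issue comment --body`, `gh issue edit --add-label`, `gh issue close --reason`, `--repo`) are standard gh CLI options.

```python
import shlex
import subprocess


def build_close_commands(repo: str, number: int, duplicate_of: int) -> list:
    """Build the gh invocations for closing one duplicate (illustrative only)."""
    comment = f"Closing as a duplicate of #{duplicate_of}."  # assumed wording
    issue = str(number)
    return [
        ["gh", "issue", "comment", issue, "--repo", repo, "--body", comment],
        ["gh", "issue", "edit", issue, "--repo", repo, "--add-label", "duplicate"],
        ["gh", "issue", "close", issue, "--repo", repo, "--reason", "not planned"],
    ]


def run_commands(commands: list, dry_run: bool = True) -> list:
    """Print each command; execute it only when dry_run is False."""
    rendered = []
    for argv in commands:
        rendered.append(shlex.join(argv))
        print(("DRY RUN: " if dry_run else "") + rendered[-1])
        if not dry_run:
            subprocess.run(argv, check=True)
    return rendered
```

Defaulting `dry_run` to `True` mirrors the dry-run-first workflow in the test plan: nothing is modified until the caller opts in.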

🤖 Generated with Claude Code

Add a Python script that detects duplicate issues using title similarity
(difflib.SequenceMatcher) and closes them via the gh CLI. Two-tier system:
- 0.6 threshold: informational comment via existing wow-actions step
- 0.85 threshold: auto-close with comment, label, and not_planned reason

Includes a workflow_dispatch workflow for one-time batch scans and
integrates auto-close into the existing check_duplicate_issues workflow
for newly opened issues.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@vercel

vercel bot commented Feb 24, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
|---|---|---|---|
| litellm | Ready | Preview, Comment | Feb 28, 2026 3:19am |

@greptile-apps
Contributor

greptile-apps bot commented Feb 24, 2026

Greptile Summary

Adds a two-tier duplicate issue detection system: the existing wow-actions/potential-duplicates step labels issues at 0.60 similarity, while a new Python script auto-closes issues at 0.85 similarity via the gh CLI. A new workflow_dispatch workflow enables one-time batch scans of all open issues.

  • The core duplicate detection logic using difflib.SequenceMatcher at 0.85 threshold is well-tested (0 false positives against 830 issues) and includes dry-run safety by default
  • check_duplicate_issues.yml is missing an explicit actions/setup-python step — the script uses Python 3.10+ syntax (str | None) and relies on the system Python, unlike the scan workflow which pins Python 3.11
  • fetch_open_issues contains dead code (lines 44-47) that builds API parameters which are immediately overwritten
  • For every newly opened issue, the workflow fetches all open issues (800+) to check for duplicates — this works but is worth noting for scalability
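
To illustrate the Python-version concern: a `str | None` annotation is evaluated when the function is defined, so it fails on interpreters older than 3.10. One common alternative, sketched below, is `typing.Optional`, which works on older versions as well. Another is placing `from __future__ import annotations` at the top of the file so annotations are not evaluated at definition time. The function `first_match` is hypothetical and is not taken from the script.

```python
from typing import List, Optional


def first_match(title: str, candidates: List[str]) -> Optional[str]:
    """Return the first candidate equal to title, or None (illustrative only)."""
    for candidate in candidates:
        if candidate == title:
            return candidate
    return None
```

Pinning the interpreter with actions/setup-python, as the scan workflow already does, avoids the question entirely; the annotation change is only a fallback.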

Confidence Score: 4/5

  • This PR is safe to merge with minor improvements recommended — CI-only changes with no impact on application code.
  • The PR adds CI tooling only (no changes to application code). The core detection logic is sound with a well-tested 0.85 threshold. The main concern is the missing actions/setup-python step in check_duplicate_issues.yml, which could cause failures if the runner's system Python drops below 3.10. The dead code and injection hardening suggestions are minor quality improvements.
  • check_duplicate_issues.yml needs attention for the missing Python setup step. .github/scripts/close_duplicate_issues.py has dead code that should be cleaned up.

Important Files Changed

| Filename | Overview |
|---|---|
| .github/scripts/close_duplicate_issues.py | New script for duplicate issue detection using difflib.SequenceMatcher. Contains dead code in fetch_open_issues (lines 44-47 overwritten by 48-49). Core logic is sound: threshold-based matching with dry-run safety. Minor issues only. |
| .github/workflows/check_duplicate_issues.yml | Extends existing workflow with auto-close steps gated on opened events. Missing actions/setup-python step; relies on system Python for PEP 604 syntax (`str
| .github/workflows/scan_duplicate_issues.yml | New workflow_dispatch workflow for batch scanning. Properly sets up Python 3.11. Minor concern: ${{ inputs.threshold }} directly interpolated in shell (low risk since write-access gated). |
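
On the interpolation concern, a usual hardening on the workflow side is to pass the input through an environment variable instead of interpolating it into the shell command. On the script side, the threshold can also be validated before use. The sketch below shows only that script-side check; `threshold_type` is a hypothetical name, and whether the actual script validates its `--threshold` argument this way is not stated in the PR.

```python
import argparse


def threshold_type(value: str) -> float:
    """argparse type that accepts only a number in [0, 1] (illustrative only)."""
    try:
        number = float(value)
    except ValueError:
        raise argparse.ArgumentTypeError(f"not a number: {value!r}")
    # This comparison also rejects NaN, since NaN fails every ordering test.
    if not 0.0 <= number <= 1.0:
        raise argparse.ArgumentTypeError(f"must be between 0 and 1: {value!r}")
    return number


parser = argparse.ArgumentParser()
parser.add_argument("--threshold", type=threshold_type, default=0.85)
```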

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Issue Opened] --> B[wow-actions/potential-duplicates\nthreshold: 0.60]
    B --> C{Similarity >= 0.60?}
    C -->|Yes| D[Add 'potential-duplicate' label\n+ informational comment]
    C -->|No| E[No action]
    D --> F[Checkout close script]
    E --> F
    F --> G[Run close_duplicate_issues.py\nthreshold: 0.85]
    G --> H[Fetch ALL open issues via gh api]
    H --> I[Normalize titles & compare\nusing SequenceMatcher]
    I --> J{Similarity >= 0.85?}
    J -->|Yes| K[Add comment + 'duplicate' label\nClose with 'not_planned' reason]
    J -->|No| L[No further action]

    M[Manual Trigger\nworkflow_dispatch] --> N[Fetch ALL open issues]
    N --> O[Compare every issue\nagainst older issues]
    O --> P{Similarity >= threshold?}
    P -->|Yes & close=true| Q[Close as duplicate]
    P -->|Yes & close=false| R[Dry-run log only]
    P -->|No| S[Skip]
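
The batch-scan branch of the flowchart (compare every issue against older issues) can be sketched as below. This is an assumption-laden illustration: `find_duplicates` is a hypothetical name, issues are assumed to be dicts with `number` and `title` keys, a lower issue number is assumed to mean an older issue, and normalization is reduced to lowercasing.

```python
from difflib import SequenceMatcher


def find_duplicates(issues: list, threshold: float = 0.85) -> list:
    """Return (issue, older_issue, score) tuples for likely duplicates."""
    ordered = sorted(issues, key=lambda issue: issue["number"])
    results = []
    for index, issue in enumerate(ordered):
        best = None
        # Only compare against older issues so the oldest report stays open.
        for older in ordered[:index]:
            score = SequenceMatcher(
                None, issue["title"].lower(), older["title"].lower()
            ).ratio()
            if score >= threshold and (best is None or score > best[1]):
                best = (older["number"], score)
        if best is not None:
            results.append((issue["number"], best[0], best[1]))
    return results
```

The pairwise loop is quadratic: for the 830 open issues mentioned in the PR, a full scan compares roughly 344,000 title pairs, which is the scalability point raised in the summary.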

Last reviewed commit: db3d61f

@greptile-apps greptile-apps bot left a comment

3 files reviewed, 3 comments

@shin-bot-litellm
Collaborator

Review

1. Does this PR fix the issue it describes?
Yes. Adds a duplicate issue detection system with two tiers:

  • 0.60 threshold → informational comment + label
  • 0.85 threshold → auto-close (verified 0 false positives on 830 issues)

Uses difflib.SequenceMatcher on normalized titles. Well-documented dry-run results.

2. Has this issue already been solved elsewhere?
Partially — check_duplicate_issues.yml exists using wow-actions but only adds labels at 0.60. This extends it with auto-close at 0.85.

3. Are there other PRs addressing the same problem?
No duplicates found.

4. Are there other issues this potentially closes?
Will help reduce issue backlog noise — 7 confirmed duplicates identified in dry run.

✅ LGTM. The conservative 0.85 threshold produced no false positives in the dry run against 830 issues. Good CI addition.

jquinter and others added 3 commits February 28, 2026 00:17
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
@jquinter jquinter merged commit df99be9 into main Feb 28, 2026
33 of 35 checks passed