
Add reviewer recommendation engine#11

Open
pwerry wants to merge 5 commits into main from feature/suggest-reviewers

Conversation


@pwerry pwerry commented Apr 5, 2026

Summary

Closes #5. Adds a suggest-reviewers command that recommends the best PR reviewers for a given set of changed files.

The algorithm combines three signals:

  • Code expertise (60%) — who knows this code best, scored from git log history
  • Social proximity (30%) — how closely they collaborate with the PR author (from the social graph)
  • Review load balance (10%) — prefer reviewers who aren't already overloaded
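The weighted combination above can be sketched as a small scoring function. This is an illustrative sketch, not the code in `reviewers.py`: the weight names mirror the constants the review below mentions (`WEIGHT_EXPERTISE`, etc.), and it assumes all three signals are already normalized to `[0, 1]`.

```python
# Hypothetical sketch of the 60/30/10 reviewer score described above.
# Assumes expertise, proximity, and load are each normalized to [0, 1].
WEIGHT_EXPERTISE = 0.6
WEIGHT_PROXIMITY = 0.3
WEIGHT_LOAD_BALANCE = 0.1

def reviewer_score(expertise: float, proximity: float, load: float) -> float:
    """Combine the three signals; a lower current load means more capacity."""
    return (
        WEIGHT_EXPERTISE * expertise
        + WEIGHT_PROXIMITY * proximity
        + WEIGHT_LOAD_BALANCE * (1.0 - load)
    )
```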

Usage

# Suggest reviewers for specific files
social-graph suggest-reviewers src/social_graph/scoring.py

# Multiple paths
social-graph suggest-reviewers src/social_graph/graph/ src/social_graph/models.py

# Explicit author (auto-detected from git config by default)
social-graph suggest-reviewers --author pwerry src/social_graph/cli.py

Example output

┌───┬──────────┬───────┬───────────┬───────────┬──────────────────────────────┐
│ # │ Reviewer │ Score │ Expertise │ Proximity │ Why                          │
├───┼──────────┼───────┼───────────┼───────────┼──────────────────────────────┤
│ 1 │ alice    │ 0.832 │      0.95 │      0.80 │ domain expert, collaborator  │
│ 2 │ bob      │ 0.614 │      0.72 │      0.45 │ knows the code               │
│ 3 │ charlie  │ 0.401 │      0.30 │      0.60 │ close collaborator           │
└───┴──────────┴───────┴───────────┴───────────┴──────────────────────────────┘

Changes

  • New: src/social_graph/graph/reviewers.py — recommendation algorithm
  • Modified: src/social_graph/cli.py — new suggest-reviewers CLI command
  • New: tests/test_reviewers.py — 8 tests covering ranking, author exclusion, expertise/proximity weighting, edge cases

Test plan

  • 57 tests passing (8 new + 49 existing)
  • ruff check clean
  • Manual test: social-graph suggest-reviewers --repo-path ~/Developer/unblocked src/

New `suggest-reviewers` command that ranks the best reviewers for changed
files using a weighted combination of:
- Code expertise (60%) — who knows this code best, from git history
- Social proximity (30%) — who already collaborates with the PR author
- Review load balance (10%) — prefer less-loaded reviewers

Usage:
  social-graph suggest-reviewers src/social_graph/scoring.py
  social-graph suggest-reviewers --author pwerry src/graph/ src/models.py

Auto-detects the PR author from git config. Requires a prior `build`.

@unblocked unblocked Bot left a comment


This PR has been reviewed by Unblocked Code Review

Unblocked has been set up to automatically review your team's pull requests to identify genuine bugs and issues.

2 issues found.


Comment thread src/social_graph/graph/reviewers.py

pwerry commented Apr 5, 2026

@unblocked I wonder if you would mind sharing your opinions about the entire codebase?


unblocked Bot commented Apr 5, 2026

Sure thing, Peter! Here are my impressions after exploring the codebase:

Overall: A well-structured, focused project 👍

social-graph is a clean, well-scoped tool — it builds a social graph from GitHub PR history to identify domain experts and (now) recommend reviewers. The codebase is small enough to stay comprehensible but organized enough to grow.

What I like

  • Clear module boundaries. The separation into fetcher/ (GitHub API + cache), graph/ (expertise, social, reviewers), scoring.py (primitives), models.py (data), cli.py (presentation), and config.py (settings/secrets) is logical and easy to navigate.

  • Good data modeling. models.py is tight — frozen dataclasses, sensible defaults, a well-curated bot list, and clean User helpers like is_bot_or_service and display_name. The models are reused consistently throughout.

  • Scoring primitives are thoughtful. The log-dampened interaction weighting in scoring.py (diminishing returns for prolific reviewers, per-PR normalization) is a sensible approach and avoids naive count-based scoring.

  • Config/secrets handling is pragmatic. The layered secret resolution (env var → platform CLI → keyring) with cross-platform support is a nice touch for a CLI tool. Using gh auth token as a fallback is user-friendly.

  • Solid test coverage. 9 test files covering cache, decay, expertise, scoring, and the social graph, plus the new reviewer tests. The test helpers (like _make_edge, _mock_experts) keep things readable.
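The "log-dampened interaction weighting" praised above can be illustrated with a tiny sketch. This is an assumption about the shape of the idea (diminishing returns for repeated interactions), not the actual formula in `scoring.py`, which may differ:

```python
import math

# Illustrative log-dampening: a reviewer with 100 interactions is weighted
# only ~6-7x one with a single interaction, not 100x, avoiding naive
# count-based scoring.
def dampened_weight(interaction_count: int) -> float:
    # log1p(1) ~= 0.69, log1p(10) ~= 2.40, log1p(100) ~= 4.62
    return math.log1p(interaction_count)
```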

Things I'd consider improving

  • cli.py is getting large. At 400+ lines (and growing with each new command like suggest-reviewers), it's becoming a monolith. Consider splitting commands into separate modules under a cli/ package, with each command in its own file.

  • Inline imports in CLI commands. The suggest-reviewers command has several import statements inside the function body (subprocess, PRCache, suggest_reviewers, rich.Table). I understand lazy imports for startup speed in a CLI, but it would be cleaner to have a consistent pattern — maybe a single lazy-import utility or at least top-level imports for stdlib modules like subprocess.

  • expertise.py vs code_expertise.py in graph/. Having both files in the same directory is a bit confusing from the naming alone. It's not immediately obvious how they differ. Clearer naming or consolidation would help new contributors.

  • Magic numbers in reviewer weights. The 0.5 multiplier for reverse-direction social edges (line 79 of reviewers.py: edge.weight * 0.5) and the threshold constants for reason strings (0.5, 0.2, 0.3) are scattered inline. These could join WEIGHT_EXPERTISE/WEIGHT_PROXIMITY/WEIGHT_LOAD_BALANCE as named constants for easier tuning.

  • No type annotations on return dicts. suggest_reviewers returns list[dict] — a TypedDict or a small dataclass (like you already have for ExpertRanking and SocialEdge) would make the API self-documenting and catch key-name typos.

  • Limited error handling in git subprocess calls. The author auto-detection in the CLI silently swallows failures, which is fine, but experts_for_path (which presumably shells out to git log) could benefit from clearer error messaging if the repo path is invalid or git isn't available.
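The TypedDict suggestion above could look something like the following. The field names here are guesses based on the example output table (Reviewer/Score/Expertise/Proximity/Why); the real keys returned by `suggest_reviewers` may differ:

```python
from typing import TypedDict

# Hypothetical typed shape for suggest_reviewers' return entries; a
# TypedDict lets type checkers catch key-name typos that a plain dict hides.
class ReviewerSuggestion(TypedDict):
    login: str
    score: float
    expertise: float
    proximity: float
    reasons: list[str]
```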

Summary

It's a well-crafted project with good foundations — clean models, sensible algorithms, and pragmatic tooling choices. The main areas for investment would be keeping the CLI modular as commands grow, and tightening up the type contracts on the newer dictionary-based APIs. Nice work! 🎉


- Extract login from email addresses returned by experts_for_path
  (strips domain and noreply prefix like '12345+user@...')
- Guard against ZeroDivisionError when all expertise scores are 0.0

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
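The login extraction this commit message describes could be sketched as follows. This is a plausible reading of "strips domain and noreply prefix like '12345+user@...'", not the actual implementation:

```python
# Hypothetical sketch: derive a login from a git-log email by dropping the
# domain, then stripping a GitHub-noreply numeric prefix like "12345+user".
def login_from_email(email: str) -> str:
    local = email.split("@", 1)[0]
    if "+" in local and local.split("+", 1)[0].isdigit():
        local = local.split("+", 1)[1]
    return local
```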
@kasbuunk kasbuunk requested a review from baell as a code owner April 8, 2026 14:37
Comment thread src/social_graph/graph/reviewers.py
Build an email_to_login mapping from social edges so expertise keys
(email-prefix from git log) resolve to GitHub logins used by social
edges. Also catch subprocess.TimeoutExpired in CLI author detection.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comment thread src/social_graph/graph/reviewers.py
Raw edge weights (already <= 1.0, halved for reverse edges) were
systematically underweighting the proximity signal. Now normalized
like expertise and load so the 60/30/10 split works as documented.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
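The normalization this commit describes (putting proximity on the same scale as expertise and load so the 60/30/10 split holds) might be sketched as a max-normalization; this is an assumption about the approach, not the committed code:

```python
# Hypothetical max-normalization: divide each candidate's raw signal by the
# peak across candidates so every signal lands in [0, 1] before weighting.
def normalize(raw: dict[str, float]) -> dict[str, float]:
    peak = max(raw.values(), default=0.0)
    if peak == 0.0:  # guard against ZeroDivisionError when all scores are 0.0
        return {login: 0.0 for login in raw}
    return {login: value / peak for login, value in raw.items()}
```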
Comment thread src/social_graph/graph/reviewers.py
The identity-only mapping (login→login) didn't resolve email prefixes
from git log to GitHub logins. Now accepts a known_users dict (built
from PR data in the CLI) that maps display names and email prefixes
to GitHub logins. Also resolves the auto-detected author email prefix
through the same mapping.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comment on lines +86 to +88
avg_expertise: dict[str, float] = {}
for login, total in expertise_scores.items():
    avg_expertise[login] = total / expertise_counts[login]


The average is computed only over the paths where a user appears (expertise_counts[login]), not over all changed_paths. This means a person who is a heavy committer on 1 of 10 changed files (but absent from the other 9) gets the same average expertise as if that single file were the only changed file — their score is never diluted by the files they don't know.

Example with changed_paths = ["a.py", "b.py", "c.py"]:

  • Alice: score 10 on a.py, absent from b.py and c.py → avg = 10/1 = 10
  • Bob: score 4 on each of all three files → avg = 12/3 = 4

Alice is ranked far above Bob despite Bob knowing all the changed code. The denominator should be len(changed_paths) to properly reflect breadth of expertise across the change set.

    for login, total in expertise_scores.items():
        avg_expertise[login] = total / len(changed_paths)

