
Add reviewer recommendation engine#11

Open
pwerry wants to merge 5 commits into main from feature/suggest-reviewers

Conversation


@pwerry pwerry commented Apr 5, 2026

Summary

Closes #5. Adds a suggest-reviewers command that recommends the best PR reviewers for a given set of changed files.

The algorithm combines three signals:

  • Code expertise (60%) — who knows this code best, scored from git log history
  • Social proximity (30%) — how closely they collaborate with the PR author (from the social graph)
  • Review load balance (10%) — prefer reviewers who aren't already overloaded
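The weighted combination above can be sketched as a small scoring function. This is an illustrative sketch, not the code in `reviewers.py`: the weight names mirror the constants the review below mentions (`WEIGHT_EXPERTISE`, etc.), and it assumes all three signals are already normalized to `[0, 1]`.

```python
# Hypothetical sketch of the 60/30/10 reviewer score described above.
# Assumes expertise, proximity, and load are each normalized to [0, 1].
WEIGHT_EXPERTISE = 0.6
WEIGHT_PROXIMITY = 0.3
WEIGHT_LOAD_BALANCE = 0.1

def reviewer_score(expertise: float, proximity: float, load: float) -> float:
    """Combine the three signals; a lower current load means more capacity."""
    return (
        WEIGHT_EXPERTISE * expertise
        + WEIGHT_PROXIMITY * proximity
        + WEIGHT_LOAD_BALANCE * (1.0 - load)
    )
```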

Usage

# Suggest reviewers for specific files
social-graph suggest-reviewers src/social_graph/scoring.py

# Multiple paths
social-graph suggest-reviewers src/social_graph/graph/ src/social_graph/models.py

# Explicit author (auto-detected from git config by default)
social-graph suggest-reviewers --author pwerry src/social_graph/cli.py

Example output

┌───┬──────────┬───────┬───────────┬───────────┬──────────────────────────────┐
│ # │ Reviewer │ Score │ Expertise │ Proximity │ Why                          │
├───┼──────────┼───────┼───────────┼───────────┼──────────────────────────────┤
│ 1 │ alice    │ 0.832 │      0.95 │      0.80 │ domain expert, collaborator  │
│ 2 │ bob      │ 0.614 │      0.72 │      0.45 │ knows the code               │
│ 3 │ charlie  │ 0.401 │      0.30 │      0.60 │ close collaborator           │
└───┴──────────┴───────┴───────────┴───────────┴──────────────────────────────┘

Changes

  • New: src/social_graph/graph/reviewers.py — recommendation algorithm
  • Modified: src/social_graph/cli.py — new suggest-reviewers CLI command
  • New: tests/test_reviewers.py — 8 tests covering ranking, author exclusion, expertise/proximity weighting, edge cases

Test plan

  • 57 tests passing (8 new + 49 existing)
  • ruff check clean
  • Manual test: social-graph suggest-reviewers --repo-path ~/Developer/unblocked src/

New `suggest-reviewers` command that ranks the best reviewers for changed
files using a weighted combination of:
- Code expertise (60%) — who knows this code best, from git history
- Social proximity (30%) — who already collaborates with the PR author
- Review load balance (10%) — prefer less-loaded reviewers

Usage:
  social-graph suggest-reviewers src/social_graph/scoring.py
  social-graph suggest-reviewers --author pwerry src/graph/ src/models.py

Auto-detects the PR author from git config. Requires a prior `build`.

@unblocked unblocked Bot left a comment


This PR has been reviewed by Unblocked Code Review

Unblocked has been set up to automatically review your team's pull requests to identify genuine bugs and issues.

2 issues found.


Comment thread src/social_graph/graph/reviewers.py

pwerry commented Apr 5, 2026

@unblocked I wonder if you would mind sharing your opinions about the entire codebase?


unblocked Bot commented Apr 5, 2026

Sure thing, Peter! Here are my impressions after exploring the codebase:

Overall: A well-structured, focused project 👍

social-graph is a clean, well-scoped tool — it builds a social graph from GitHub PR history to identify domain experts and (now) recommend reviewers. The codebase is small enough to stay comprehensible but organized enough to grow.

What I like

  • Clear module boundaries. The separation into fetcher/ (GitHub API + cache), graph/ (expertise, social, reviewers), scoring.py (primitives), models.py (data), cli.py (presentation), and config.py (settings/secrets) is logical and easy to navigate.

  • Good data modeling. models.py is tight — frozen dataclasses, sensible defaults, a well-curated bot list, and clean User helpers like is_bot_or_service and display_name. The models are reused consistently throughout.

  • Scoring primitives are thoughtful. The log-dampened interaction weighting in scoring.py (diminishing returns for prolific reviewers, per-PR normalization) is a sensible approach and avoids naive count-based scoring.

  • Config/secrets handling is pragmatic. The layered secret resolution (env var → platform CLI → keyring) with cross-platform support is a nice touch for a CLI tool. Using gh auth token as a fallback is user-friendly.

  • Solid test coverage. 9 test files covering cache, decay, expertise, scoring, and the social graph, plus the new reviewer tests. The test helpers (like _make_edge, _mock_experts) keep things readable.
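The "log-dampened interaction weighting" praised above can be illustrated with a tiny sketch. This is an assumption about the shape of the idea (diminishing returns for repeated interactions), not the actual formula in `scoring.py`, which may differ:

```python
import math

# Illustrative log-dampening: a reviewer with 100 interactions is weighted
# only ~6-7x one with a single interaction, not 100x, avoiding naive
# count-based scoring.
def dampened_weight(interaction_count: int) -> float:
    # log1p(1) ~= 0.69, log1p(10) ~= 2.40, log1p(100) ~= 4.62
    return math.log1p(interaction_count)
```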

Things I'd consider improving

  • cli.py is getting large. At 400+ lines (and growing with each new command like suggest-reviewers), it's becoming a monolith. Consider splitting commands into separate modules under a cli/ package, with each command in its own file.

  • Inline imports in CLI commands. The suggest-reviewers command has several import statements inside the function body (subprocess, PRCache, suggest_reviewers, rich.Table). I understand lazy imports for startup speed in a CLI, but it would be cleaner to have a consistent pattern — maybe a single lazy-import utility or at least top-level imports for stdlib modules like subprocess.

  • expertise.py vs code_expertise.py in graph/. Having both files in the same directory is a bit confusing from the naming alone. It's not immediately obvious how they differ. Clearer naming or consolidation would help new contributors.

  • Magic numbers in reviewer weights. The 0.5 multiplier for reverse-direction social edges (line 79 of reviewers.py: edge.weight * 0.5) and the threshold constants for reason strings (0.5, 0.2, 0.3) are scattered inline. These could join WEIGHT_EXPERTISE/WEIGHT_PROXIMITY/WEIGHT_LOAD_BALANCE as named constants for easier tuning.

  • No type annotations on return dicts. suggest_reviewers returns list[dict] — a TypedDict or a small dataclass (like you already have for ExpertRanking and SocialEdge) would make the API self-documenting and catch key-name typos.

  • Limited error handling in git subprocess calls. The author auto-detection in the CLI silently swallows failures, which is fine, but experts_for_path (which presumably shells out to git log) could benefit from clearer error messaging if the repo path is invalid or git isn't available.
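The TypedDict suggestion above could look something like the following. The field names here are guesses based on the example output table (Reviewer/Score/Expertise/Proximity/Why); the real keys returned by `suggest_reviewers` may differ:

```python
from typing import TypedDict

# Hypothetical typed shape for suggest_reviewers' return entries; a
# TypedDict lets type checkers catch key-name typos that a plain dict hides.
class ReviewerSuggestion(TypedDict):
    login: str
    score: float
    expertise: float
    proximity: float
    reasons: list[str]
```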

Summary

It's a well-crafted project with good foundations — clean models, sensible algorithms, and pragmatic tooling choices. The main areas for investment would be keeping the CLI modular as commands grow, and tightening up the type contracts on the newer dictionary-based APIs. Nice work! 🎉


- Extract login from email addresses returned by experts_for_path
  (strips domain and noreply prefix like '12345+user@...')
- Guard against ZeroDivisionError when all expertise scores are 0.0

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
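The login extraction this commit message describes could be sketched as follows. This is a plausible reading of "strips domain and noreply prefix like '12345+user@...'", not the actual implementation:

```python
# Hypothetical sketch: derive a login from a git-log email by dropping the
# domain, then stripping a GitHub-noreply numeric prefix like "12345+user".
def login_from_email(email: str) -> str:
    local = email.split("@", 1)[0]
    if "+" in local and local.split("+", 1)[0].isdigit():
        local = local.split("+", 1)[1]
    return local
```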
@kasbuunk kasbuunk requested a review from baell as a code owner April 8, 2026 14:37
Comment thread src/social_graph/graph/reviewers.py
Build an email_to_login mapping from social edges so expertise keys
(email-prefix from git log) resolve to GitHub logins used by social
edges. Also catch subprocess.TimeoutExpired in CLI author detection.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comment thread src/social_graph/graph/reviewers.py
Raw edge weights (already <= 1.0, halved for reverse edges) were
systematically underweighting the proximity signal. Now normalized
like expertise and load so the 60/30/10 split works as documented.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
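The normalization this commit describes (putting proximity on the same scale as expertise and load so the 60/30/10 split holds) might be sketched as a max-normalization; this is an assumption about the approach, not the committed code:

```python
# Hypothetical max-normalization: divide each candidate's raw signal by the
# peak across candidates so every signal lands in [0, 1] before weighting.
def normalize(raw: dict[str, float]) -> dict[str, float]:
    peak = max(raw.values(), default=0.0)
    if peak == 0.0:  # guard against ZeroDivisionError when all scores are 0.0
        return {login: 0.0 for login in raw}
    return {login: value / peak for login, value in raw.items()}
```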
Comment thread src/social_graph/graph/reviewers.py
The identity-only mapping (login→login) didn't resolve email prefixes
from git log to GitHub logins. Now accepts a known_users dict (built
from PR data in the CLI) that maps display names and email prefixes
to GitHub logins. Also resolves the auto-detected author email prefix
through the same mapping.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comment on lines +86 to +88
avg_expertise: dict[str, float] = {}
for login, total in expertise_scores.items():
    avg_expertise[login] = total / expertise_counts[login]


The average is computed only over the paths where a user appears (expertise_counts[login]), not over all changed_paths. This means a person who is a heavy committer on 1 of 10 changed files (but absent from the other 9) gets the same average expertise as if that single file were the only changed file — their score is never diluted by the files they don't know.

Example with changed_paths = ["a.py", "b.py", "c.py"]:

  • Alice: score 10 on a.py, absent from b.py and c.py → avg = 10/1 = 10
  • Bob: score 4 on each of all three files → avg = 12/3 = 4

Alice is ranked far above Bob despite Bob knowing all the changed code. The denominator should be len(changed_paths) to properly reflect breadth of expertise across the change set.

    for login, total in expertise_scores.items():
        avg_expertise[login] = total / len(changed_paths)

