feat: support Gemini compare usage capture by mohanagy · Pull Request #19 · mohanagy/graphify-ts

mohanagy · 2026-04-27T18:40:58Z

Summary

- Port the compare usage baseline needed for Gemini work and preserve safe answer-artifact fallback behavior.
- Capture Gemini provider-reported usage from structured JSON, including multipart answer assembly and strict fallback behavior.
- Document the correct Gemini compare invocation and clarify when compare reports real usage vs. labeled estimates.

Test Plan

- `npm run typecheck`
- `npm run test:run`
- `npm run build`

Summary by CodeRabbit

New Features
- Enhanced retrieval ranking with relation-aware expansion for improved context matching accuracy
- Improved token usage tracking in compare mode with real provider-reported usage from Gemini and Claude, falling back to local estimates when unavailable

Documentation
- Added Gemini compare command examples and clarified token reporting semantics
- Updated guidance on how provider usage is captured and reported in results

Tests
- Added comprehensive test coverage for token handling and retrieval ranking behavior
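
The "provider-reported usage, falling back to local estimates" behavior can be pictured with a small sketch. Everything here is hypothetical: the label values, the function names, and the reduction-ratio definition (baseline tokens divided by graph tokens) are assumptions, and the actual members of `ComparePromptTokenSource` are not shown on this page. The estimator is a crude characters-per-token stand-in, not the `cl100k_base` encoding the PR mentions, which would need a tokenizer library.

```typescript
// Hypothetical label type; not the PR's ComparePromptTokenSource.
type TokenSourceLabel = "provider-reported" | "local-estimate";

interface PromptTokenCount {
  tokens: number;
  source: TokenSourceLabel;
}

// Stand-in estimator: roughly 4 characters per token. NOT cl100k_base.
function roughEstimate(text: string): number {
  return Math.ceil(text.length / 4);
}

// Prefer the provider's number; otherwise return an estimate that is labeled as such.
function resolvePromptTokens(
  providerInputTokens: number | null,
  promptText: string,
): PromptTokenCount {
  if (
    providerInputTokens !== null &&
    Number.isFinite(providerInputTokens) &&
    providerInputTokens >= 0
  ) {
    return { tokens: providerInputTokens, source: "provider-reported" };
  }
  return { tokens: roughEstimate(promptText), source: "local-estimate" };
}

// Assumed definition: how many times smaller the graph prompt is than the baseline.
function reductionRatio(baselineTokens: number, graphTokens: number): number | null {
  return graphTokens > 0 ? baselineTokens / graphTokens : null;
}
```

Carrying the `source` label next to the count lets a report say whether a number is real usage or an estimate.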

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

coderabbitai · 2026-04-27T18:41:11Z

Caution

Review failed

Pull request was closed or merged during review

Walkthrough

The PR enhances the retrieval ranking system with relation-aware expansion and evidence-based scoring, implements structured parsing of provider-reported token usage (Gemini and Claude) in the compare workflow, and extends retrieval benchmarks with new gold-standard questions and stricter evaluation metrics. Documentation is updated to reflect these changes and clarify token reporting semantics.
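
To make "relation-aware expansion" concrete, here is a minimal sketch of multi-hop score propagation in which each edge's relation type scales the score passed to a neighbor. It is not the code in `src/runtime/retrieve.ts`: the relation names, weights, and the `expandScores` function are hypothetical, and treating every edge as traversable in both directions is an assumption.

```typescript
interface Edge {
  source: string;
  target: string;
  relation: string;
}

// Hypothetical relation weights; unknown relations get a small default.
const RELATION_WEIGHTS = new Map<string, number>([
  ["calls", 0.8],
  ["imports", 0.5],
]);
const DEFAULT_WEIGHT = 0.25;

// Propagate seed scores outward for up to maxHops hops.
// A node keeps the best score reached along any path.
function expandScores(
  seeds: Map<string, number>,
  edges: Edge[],
  maxHops: number,
): Map<string, number> {
  const best = new Map(seeds);
  let frontier = new Map(seeds);

  for (let hop = 0; hop < maxHops; hop++) {
    const next = new Map<string, number>();
    for (const edge of edges) {
      const weight = RELATION_WEIGHTS.get(edge.relation) ?? DEFAULT_WEIGHT;
      // Consider both endpoints as incident neighbors of each other.
      const directions: Array<[string, string]> = [
        [edge.source, edge.target],
        [edge.target, edge.source],
      ];
      for (const [from, to] of directions) {
        const fromScore = frontier.get(from);
        if (fromScore === undefined) continue;
        const candidate = fromScore * weight;
        if (candidate > (best.get(to) ?? 0)) {
          best.set(to, candidate);
          next.set(to, candidate);
        }
      }
    }
    if (next.size === 0) break; // nothing improved, stop early
    frontier = next;
  }
  return best;
}
```

Because weights are below 1, scores decay with each hop, so distant nodes only outrank nearby ones when they are reached through strong relations.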

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Documentation & Changelog: `CHANGELOG.md`, `README.md`, `docs/proof-workflows.md`, `examples/why-graphify.md` | Added changelog entries for retrieval ranking improvements and evaluation guardrails. Updated README and proof documentation to describe Gemini/Claude compare runner patterns, token usage capture behavior (provider-reported vs. local estimates), and `report.json` inclusion of `usageMetadata` or fallback `cl100k_base` estimates. |
| Benchmark Quality: `src/infrastructure/benchmark/quality.ts`, `tests/unit/benchmark-quality.test.ts` | Extended `GOLD_QUESTIONS` constant with 3 new retrieval benchmark entries targeting `retrieveContext` and `scoreNode` labels. Added unit tests validating MRR scoring and recall metrics with tight result limits. |
| Compare Execution & Parsing: `src/infrastructure/compare.ts`, `tests/unit/compare.test.ts` | Implemented structured stdout JSON parsing to extract answer text and provider-reported token usage (Claude `usage` and Gemini `usageMetadata`). Added `ComparePromptTokenSource` type and extended `ComparePromptUsage`/`ComparePromptReport` interfaces with usage metadata, token reduction ratios, and source labeling. Extensive unit tests validate Claude/Gemini parsing, fallback behavior, and summary reporting. |
| Retrieval Ranking Algorithm: `src/runtime/retrieve.ts`, `tests/unit/retrieve.test.ts` | Refactored seed scoring and expansion from boost model to explicit evidence breakdown with exact-label matching, TF-IDF token overlap, source-path similarity, and community-label similarity. Implemented relation-aware multi-hop expansion that propagates weighted scores across incident neighbors and upgrades relevance bands. Updated node selection comparators and added comprehensive graph-building and traversal tests. |
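
For readers unfamiliar with the metrics named in the Benchmark Quality row, the following sketch shows the standard definitions of mean reciprocal rank (MRR) and recall at a result limit `k`. The function names and the sample ranking are illustrative and are not the helpers in `src/infrastructure/benchmark/quality.ts`.

```typescript
// Reciprocal rank: 1 / position of the first relevant result, or 0 if none appears.
function reciprocalRank(ranked: string[], relevant: Set<string>): number {
  const index = ranked.findIndex((label) => relevant.has(label));
  return index === -1 ? 0 : 1 / (index + 1);
}

// MRR: the mean of reciprocal ranks over all benchmark questions.
function meanReciprocalRank(
  runs: Array<{ ranked: string[]; relevant: Set<string> }>,
): number {
  if (runs.length === 0) return 0;
  const sum = runs.reduce(
    (acc, run) => acc + reciprocalRank(run.ranked, run.relevant),
    0,
  );
  return sum / runs.length;
}

// Recall@k: fraction of relevant labels found within the top k results.
function recallAtK(ranked: string[], relevant: Set<string>, k: number): number {
  if (relevant.size === 0) return 0;
  const top = new Set(ranked.slice(0, k));
  let hits = 0;
  for (const label of relevant) {
    if (top.has(label)) hits++;
  }
  return hits / relevant.size;
}

// Illustrative ranking; the labels echo those named in the table above.
const ranking = ["buildGraph", "scoreNode", "retrieveContext"];
const wanted = new Set(["scoreNode", "retrieveContext"]);
console.log(reciprocalRank(ranking, wanted)); // 0.5
console.log(recallAtK(ranking, wanted, 2)); // 0.5
```

A tight result limit (small `k`) makes recall sensitive to ranking order, which is presumably why the tests pair it with MRR.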

Sequence Diagram(s)

sequenceDiagram
    actor Runner as Compare Runner
    participant Parser as Compare Parser
    participant Report as Report Writer
    participant Summary as Summary Formatter

    Runner->>Parser: emit stdout (JSON or text)
    alt Structured JSON (Claude/Gemini)
        Parser->>Parser: extract answer text & usage metadata
        Note over Parser: Claude: usage field<br/>Gemini: usageMetadata
        Parser->>Report: write answer text to artifact
        Parser->>Report: record usage (input/total tokens,<br/>source label)
    else Plain text or malformed JSON
        Parser->>Parser: treat as plain answer text
        Parser->>Report: write text to artifact
        Parser->>Report: mark usage as null<br/>(fallback to cl100k_base)
    end
    Report->>Summary: sync prompt tokens & reduction ratios<br/>from captured usage
    Summary->>Summary: format output with token deltas<br/>and source labels
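
The "Claude: usage field / Gemini: usageMetadata" branch in the diagram could be normalized along the lines below. This is a sketch under stated assumptions, not the PR's parser: `normalizeUsage` and `NormalizedUsage` are hypothetical names, the Claude field names follow the Anthropic-style `input_tokens`/`output_tokens` usage object, the Gemini names follow `promptTokenCount`/`totalTokenCount`, and computing Claude's total as input plus output (ignoring any cache counters) is an assumption.

```typescript
interface NormalizedUsage {
  provider: "claude" | "gemini";
  inputTokens: number;
  totalTokens: number;
}

function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function asFiniteNumber(value: unknown): number | null {
  return typeof value === "number" && Number.isFinite(value) ? value : null;
}

// Returns null when no recognizable usage block is present,
// which corresponds to the "mark usage as null" branch of the diagram.
function normalizeUsage(payload: unknown): NormalizedUsage | null {
  if (!isPlainObject(payload)) return null;

  const gemini = payload["usageMetadata"];
  if (isPlainObject(gemini)) {
    const input = asFiniteNumber(gemini["promptTokenCount"]);
    const total = asFiniteNumber(gemini["totalTokenCount"]);
    if (input !== null && total !== null) {
      return { provider: "gemini", inputTokens: input, totalTokens: total };
    }
  }

  const claude = payload["usage"];
  if (isPlainObject(claude)) {
    const input = asFiniteNumber(claude["input_tokens"]);
    const output = asFiniteNumber(claude["output_tokens"]);
    if (input !== null && output !== null) {
      // Assumption: total = input + output.
      return { provider: "claude", inputTokens: input, totalTokens: input + output };
    }
  }

  return null;
}
```

Rejecting non-numeric counts instead of coercing them keeps a malformed payload from being reported as real provider usage.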

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

- feat: add reproducible demo proof kit (#10): Both PRs modify `src/infrastructure/benchmark/quality.ts` to extend `GOLD_QUESTIONS` and refine retrieval quality evaluation.
- Feature/workspace parity low cohesion baseline (#7): Both PRs touch the benchmark subsystem; PR #7 also refactors benchmark evaluation and token logic, which relates to this PR's new benchmark entries and compare usage handling.

Poem

🐰 Hops of joy through ranking's light,
Relations guide each node just right,
Tokens parsed from Gemini's stream,
Evidence tiers fulfill the dream,
Benchmarks tightened, proofs align—
The proof of work, now crystal-line! 🌟

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'feat: support Gemini compare usage capture' directly and specifically summarizes the main objective of the pull request, which is adding Gemini support to the compare runtime's usage capture functionality. |
| Description check | ✅ Passed | The pull request description covers the key changes (usage capture, Gemini support, documentation) and includes a test plan with all three required checks (typecheck, test:run, build) marked as completed, though it does not fully follow the template structure. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


mohanagy and others added 18 commits April 27, 2026 13:13

test: lock retrieval seed ranking behavior

b4bfffe

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

feat: make retrieval expansion relation-aware

bdcc751

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

test: assert real second-hop retrieval nodes

286db39

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

test: add retrieval quality guardrails

2672fb1

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

feat: improve retrieval ranking quality

0ba24d0

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Fix directed retrieval expansion

8b7492b

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

chore: port compare usage baseline

74f6aed

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

fix: avoid JSON answer artifact fallback

b0312e1

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

test: cover Gemini compare usage capture

a3f703b

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

test: strengthen Gemini compare regressions

78f1c01

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

feat: capture Gemini compare usage

112bfe7

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

fix: concatenate Gemini compare answer parts

60cac76

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

test: lock Gemini compare fallback behavior

75630ee

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

test: tighten Gemini fallback assertions

90f0c7a

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

test: cover Gemini usage-only artifacts

f5afba8

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

feat: support Gemini compare usage capture

686e571

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

docs: fix Gemini compare invocation

29599cd

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

docs: fix Gemini changelog example

b281904

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

mohanagy merged commit aaea0ee into main Apr 27, 2026
11 of 12 checks passed

mohanagy deleted the feature/gemini-compare-usage branch April 27, 2026 20:07
