
Evaluation: Show cost#746

Open
AkhileshNegi wants to merge 3 commits into main from enhancement/evaluation-cost

Conversation


@AkhileshNegi AkhileshNegi commented Apr 10, 2026

Summary

Target issue is #742

Checklist

Before submitting a pull request, please ensure that you complete these tasks.

  • Ran fastapi run --reload app/main.py or docker compose up in the repository root and tested.
  • If you've fixed a bug or added code, it is covered by test cases.

Notes

Add here any other information the reviewer may need.

Summary by CodeRabbit

  • New Features

    • Added cost tracking to evaluation runs, breaking down USD and token costs between response generation and embedding operations.
    • Cost information now displays in evaluation run details with support for multiple chat and embedding models.
  • Database

    • Updated evaluation run schema to store cost data.


coderabbitai bot commented Apr 10, 2026

📝 Walkthrough

Walkthrough

This PR introduces cost tracking functionality to evaluation runs by adding a new nullable cost JSONB field to the evaluation run schema, implementing token-based USD pricing utilities for chat and embedding models, and integrating cost calculation into the evaluation processing pipeline for both response and embedding batch stages.

Changes

  • Database Schema: backend/app/alembic/versions/050_add_cost_to_evaluation_run.py
    New migration adds a nullable cost JSONB column to the evaluation_run table, with a column comment describing token and USD cost tracking; downgrade removes the column.
  • Data Model: backend/app/models/evaluation.py
    Added a nullable cost field (dict) to both EvaluationRun and EvaluationRunPublic to expose cost breakdown data.
  • Pricing Utilities: backend/app/crud/evaluations/pricing.py
    New module implementing cost calculation with MODEL_PRICING rates, calculate_token_cost() for USD computation, build_response_cost_entry() and build_embedding_cost_entry() for batch aggregation, and build_cost_dict() for combining cost entries.
  • CRUD Integration: backend/app/crud/evaluations/__init__.py, backend/app/crud/evaluations/core.py
    Updated __init__.py to export four pricing functions; extended update_evaluation_run() with an optional cost parameter to persist cost data.
  • Evaluation Processing: backend/app/crud/evaluations/processing.py
    Integrated cost tracking into two stages: (1) response cost computed and persisted immediately after response parsing in process_completed_evaluation(), and (2) embedding cost computed and merged into the existing cost in process_completed_embedding_batch(); introduced a _safe_attach_cost() helper to handle cost calculations safely.
  • Test Coverage: backend/app/tests/crud/evaluations/test_pricing.py
    New comprehensive test module validating calculate_token_cost(), build_response_cost_entry(), build_embedding_cost_entry(), and build_cost_dict() across known/unknown models, edge cases, and token aggregation scenarios.
  • Test Updates: backend/app/tests/crud/evaluations/test_processing.py
    Enhanced existing tests to populate realistic token usage data and verify end-to-end cost tracking: response cost persisted after evaluation, embedding cost merged without overwriting, and total_cost_usd correctly summed across both stages.
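The pricing utilities summarized above can be sketched roughly as follows. This is a hypothetical illustration: the rate values, the exact signatures, and the shape of the cost dict are assumptions, not the PR's actual code.

```python
# Illustrative sketch of the pricing module's shape; rates and field names
# are assumptions, not the repository's actual values.

# USD per 1M tokens, keyed by model ID (example rates only).
MODEL_PRICING: dict[str, dict[str, float]] = {
    "gpt-4o": {"input": 2.50, "output": 10.00},
    "text-embedding-3-small": {"input": 0.02, "output": 0.0},
}


def calculate_token_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost for a token count; unknown models cost 0.0."""
    pricing = MODEL_PRICING.get(model)
    if not pricing:
        return 0.0
    return (
        input_tokens / 1_000_000 * pricing["input"]
        + output_tokens / 1_000_000 * pricing["output"]
    )


def build_cost_dict(response_entry: dict, embedding_entry: dict) -> dict:
    """Combine per-stage entries into the JSONB payload stored on the run."""
    return {
        "response": response_entry,
        "embedding": embedding_entry,
        "total_cost_usd": response_entry.get("cost_usd", 0.0)
        + embedding_entry.get("cost_usd", 0.0),
    }
```

A stage entry here would carry the aggregated token counts plus a cost_usd value computed by calculate_token_cost().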

Sequence Diagram(s)

sequenceDiagram
    participant EvalProc as Evaluation Processor
    participant Cost as Pricing Engine
    participant DB as Database
    
    rect rgba(100, 150, 200, 0.5)
    Note over EvalProc,DB: Response Cost Tracking
    EvalProc->>EvalProc: Parse response results
    EvalProc->>Cost: _safe_attach_cost(response_results)
    Cost->>Cost: build_response_cost_entry() + aggregate tokens
    Cost->>Cost: calculate_token_cost() → USD
    Cost-->>EvalProc: Updated eval_run.cost
    EvalProc->>DB: update_evaluation_run(..., cost)
    end
    
    rect rgba(180, 120, 150, 0.5)
    Note over EvalProc,DB: Embedding Cost Tracking (Later)
    EvalProc->>EvalProc: Parse embedding batch results
    EvalProc->>Cost: _safe_attach_cost(embedding_results)
    Cost->>Cost: build_embedding_cost_entry() + aggregate tokens
    Cost->>Cost: calculate_token_cost() → USD
    Cost->>Cost: build_cost_dict(response_entry, embedding_entry)
    Cost-->>EvalProc: Merged eval_run.cost
    EvalProc->>DB: update_evaluation_run(..., cost)
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • ProjectTech4DevAI/kaapi-backend#558: Both PRs extend update_evaluation_run() in the same core module and add new nullable fields to the EvaluationRun model following a similar pattern.

Suggested labels

enhancement, ready-for-review

Suggested reviewers

  • nishika26
  • Prajna1999

Poem

🐰 Costs now tracked from end to end,
With tokens counted, pennies penned,
Response, embedding, costs align,
In JSONB rows so fine!
Our evals know just what they spend! 💰

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped: CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title 'Evaluation: Show cost' clearly and concisely summarizes the main objective of the pull request: adding cost display/tracking functionality to evaluations across the codebase.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 94.12%, which is sufficient; the required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



@AkhileshNegi AkhileshNegi marked this pull request as ready for review April 10, 2026 04:11
@AkhileshNegi AkhileshNegi self-assigned this Apr 10, 2026

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
backend/app/crud/evaluations/processing.py (1)

516-528: Persist cost explicitly in the final update call.

Line 516 updates eval_run.cost directly, but Line 526 uses update_evaluation_run(...) without cost. Passing it explicitly keeps persistence behavior consistent and future-proof.

♻️ Suggested tweak
-        eval_run = update_evaluation_run(
-            session=session, eval_run=eval_run, status="completed", score=eval_run.score
-        )
+        eval_run = update_evaluation_run(
+            session=session,
+            eval_run=eval_run,
+            status="completed",
+            score=eval_run.score,
+            cost=eval_run.cost,
+        )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/crud/evaluations/processing.py` around lines 516 - 528, The
eval_run.cost is set via build_cost_dict (response_entry, embedding_cost_entry)
but the subsequent persistence uses update_evaluation_run(session=session,
eval_run=eval_run, status="completed", score=eval_run.score) which omits cost;
modify the final call to include cost (e.g., cost=eval_run.cost) so the computed
cost is persisted, keeping the try/except around build_cost_dict as-is and
ensuring update_evaluation_run receives the cost even when previously assigned
to eval_run.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 623d9238-23b0-443e-906a-bd51d8e56578

📥 Commits

Reviewing files that changed from the base of the PR and between dbcdb20 and b6750f0.

📒 Files selected for processing (7)
  • backend/app/alembic/versions/050_add_cost_to_evaluation_run.py
  • backend/app/crud/evaluations/__init__.py
  • backend/app/crud/evaluations/core.py
  • backend/app/crud/evaluations/embeddings.py
  • backend/app/crud/evaluations/pricing.py
  • backend/app/crud/evaluations/processing.py
  • backend/app/models/evaluation.py

Comment on lines +20 to +33
def upgrade():
    op.add_column(
        "evaluation_run",
        sa.Column(
            "cost",
            postgresql.JSONB(astext_type=sa.Text()),
            nullable=True,
            comment="Cost tracking (response/embedding tokens and USD)",
        ),
    )


def downgrade():
    op.drop_column("evaluation_run", "cost")

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify migration functions lacking return annotations
rg -nP '^def\s+(upgrade|downgrade)\s*\(\s*\)\s*:' backend/app/alembic/versions/050_add_cost_to_evaluation_run.py

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 112


Add explicit return type hints to migration functions.

Lines 20 and 32 define upgrade() and downgrade() without return annotations, which violates the repository's Python typing guideline.

✅ Minimal fix
-def upgrade():
+def upgrade() -> None:
     op.add_column(
         "evaluation_run",
         sa.Column(
             "cost",
             postgresql.JSONB(astext_type=sa.Text()),
             nullable=True,
             comment="Cost tracking (response/embedding tokens and USD)",
         ),
     )

-def downgrade():
+def downgrade() -> None:
     op.drop_column("evaluation_run", "cost")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/alembic/versions/050_add_cost_to_evaluation_run.py` around lines
20 - 33, The migration functions upgrade and downgrade are missing return type
annotations; update the function signatures for upgrade() and downgrade() to
include explicit return type hints (-> None) to satisfy the repository's typing
guidelines, keeping the existing bodies unchanged and only modifying the
function declarations.

Comment on lines +92 to +98
    pricing = MODEL_PRICING.get(model)
    if not pricing:
        logger.warning(
            f"[calculate_response_cost] Unknown model '{model}', returning cost 0.0"
        )
        return 0.0


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Detect model literals in repo that are not directly present in MODEL_PRICING keys.
python - <<'PY'
import ast, pathlib, re, json

pricing_path = pathlib.Path("backend/app/crud/evaluations/pricing.py")
tree = ast.parse(pricing_path.read_text())
pricing_keys = set()

for node in tree.body:
    if isinstance(node, ast.AnnAssign) and getattr(node.target, "id", None) == "MODEL_PRICING":
        pricing_obj = ast.literal_eval(node.value)
        pricing_keys = set(pricing_obj.keys())
        break

model_regex = re.compile(r'["\']model["\']\s*:\s*["\']([^"\']+)["\']')
found = {}

for p in pathlib.Path(".").rglob("*.py"):
    try:
        text = p.read_text()
    except Exception:
        continue
    for m in model_regex.finditer(text):
        found.setdefault(m.group(1), set()).add(str(p))

unknown = {m: sorted(paths) for m, paths in found.items() if m not in pricing_keys}
print("MODEL_PRICING_KEYS=", sorted(pricing_keys))
print("UNKNOWN_MODEL_LITERALS=", json.dumps(unknown, indent=2))
PY

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 2493


🏁 Script executed:

head -150 backend/app/crud/evaluations/pricing.py | tail -80

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 2439


🏁 Script executed:

# Check where gpt-4, gpt-4-turbo, o1, gemini-2.5-pro are used in non-test code
rg -t py "gpt-4|gpt-4-turbo|o1|gemini-2\.5-pro" --glob '!**/tests/**' backend/

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 4282


🏁 Script executed:

# Check if calculate_response_cost or calculate_embedding_cost are called with actual model values
rg "calculate_response_cost|calculate_embedding_cost" --glob '!**/tests/**' -A 2 -B 2 backend/

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 3012


🏁 Script executed:

# Check who calls build_response_cost_entry and build_embedding_cost_entry in production code
rg "build_response_cost_entry|build_embedding_cost_entry" --glob '!**/tests/**' -B 3 -A 3 backend/

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 4317


🏁 Script executed:

# Check what values are passed to build_response_cost_entry and EMBEDDING_MODEL in processing.py
rg "model\s*=" backend/app/crud/evaluations/processing.py | head -20
rg "EMBEDDING_MODEL" backend/app/crud/evaluations/processing.py -B 5 -A 2

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 900


🏁 Script executed:

# Check what EMBEDDING_MODEL is set to
rg "EMBEDDING_MODEL\s*=" backend/app/crud/evaluations/embeddings.py -A 1

# Check what resolve_model_from_config returns
rg "def resolve_model_from_config" backend/app/crud/evaluations/core.py -A 20

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 745


🏁 Script executed:

# Check if gpt-4, gpt-4-turbo, o1 are actually in the SUPPORTED models list
rg "SUPPORTED.*MODELS|DEFAULT.*MODEL" backend/app/models/llm/constants.py -B 2 -A 8 | head -50

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 579


🏁 Script executed:

# Continue reading the SUPPORTED_MODELS to see all OpenAI models
rg "SUPPORTED_MODELS" backend/app/models/llm/constants.py -A 30 | grep -E "gpt-4|o1"

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 277


Add missing pricing data for production-supported models to avoid false-zero cost calculations.

Lines 92 and 115 perform exact-key lookup against MODEL_PRICING, but several models in SUPPORTED_MODELS are missing pricing data:

  • gpt-4, gpt-4-turbo, o1-preview, o1-mini, gpt-4.1-mini, gpt-4.1-nano
  • gemini-2.5-pro, gemini-2.5-pro-preview-tts

When an evaluation run uses any of these models, calculate_response_cost() returns 0.0, causing cost tracking to underreport.

🐛 Suggested fix

Either add missing model pricing to MODEL_PRICING, or implement model normalization to map variants to available pricing data:

+def _normalize_pricing_model(model: str) -> str:
+    if model in MODEL_PRICING:
+        return model
+    # Match version-suffixed IDs to known base models, preferring longest base first
+    for base in sorted(MODEL_PRICING.keys(), key=len, reverse=True):
+        if model.startswith(f"{base}-"):
+            return base
+    return model
+
 def calculate_response_cost(model: str, input_tokens: int, output_tokens: int) -> float:
@@
-    pricing = MODEL_PRICING.get(model)
+    normalized_model = _normalize_pricing_model(model)
+    pricing = MODEL_PRICING.get(normalized_model)
     if not pricing:
         logger.warning(
-            f"[calculate_response_cost] Unknown model '{model}', returning cost 0.0"
+            f"[calculate_response_cost] Unknown model '{model}', returning cost 0.0"
         )
         return 0.0
@@
 def calculate_embedding_cost(model: str, prompt_tokens: int) -> float:
@@
-    pricing = MODEL_PRICING.get(model)
+    normalized_model = _normalize_pricing_model(model)
+    pricing = MODEL_PRICING.get(normalized_model)
     if not pricing:
         logger.warning(
             f"[calculate_embedding_cost] Unknown model '{model}', returning cost 0.0"
         )
         return 0.0

Also applies to: 115-121

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/crud/evaluations/pricing.py` around lines 92 - 98,
calculate_response_cost is returning 0.0 for several production-supported models
because MODEL_PRICING lacks entries for variants (e.g., gpt-4, gpt-4-turbo,
o1-preview, o1-mini, gpt-4.1-mini, gpt-4.1-nano, gemini-2.5-pro,
gemini-2.5-pro-preview-tts) referenced in SUPPORTED_MODELS; update the
implementation by either (A) adding the missing pricing entries into
MODEL_PRICING for those exact model keys, or (B) implement normalization/mapping
inside calculate_response_cost (or a helper) to map variant names to existing
pricing keys (e.g., normalize "gpt-4-turbo" -> "gpt-4-turbo" pricing key or map
preview/mini/nano variants to their base model key) so the lookup
MODEL_PRICING.get(model) returns a valid price instead of falling back to 0.0;
ensure the same fix is applied to the other lookup block around lines 115-121
that uses MODEL_PRICING.


@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (1)
backend/app/crud/evaluations/pricing.py (1)

23-72: ⚠️ Potential issue | 🟠 Major

MODEL_PRICING still misses supported evaluation models.

calculate_token_cost() does an exact lookup on Lines 92-97. Any run whose configured model is supported elsewhere in the app but absent here will silently get 0.0, so the new EvaluationRun.cost field underreports valid usage. Please either add the missing keys or normalize model IDs before the lookup.

Run this read-only check to compare model literals used by the app with the pricing table; after the fix, the missing-pricing list should only contain models you intentionally do not cost:

#!/bin/bash
python - <<'PY'
import ast
import pathlib
import re

pricing_path = pathlib.Path("backend/app/crud/evaluations/pricing.py")
embeddings_path = pathlib.Path("backend/app/crud/evaluations/embeddings.py")
constants_path = pathlib.Path("backend/app/models/llm/constants.py")

embedding_text = embeddings_path.read_text()
match = re.search(r'EMBEDDING_MODEL\s*=\s*"([^"]+)"', embedding_text)
embedding_model = match.group(1) if match else None

tree = ast.parse(pricing_path.read_text())
pricing_keys = set()

for node in tree.body:
    if isinstance(node, ast.AnnAssign) and getattr(node.target, "id", None) == "MODEL_PRICING":
        for key_node in node.value.keys:
            if isinstance(key_node, ast.Constant) and isinstance(key_node.value, str):
                pricing_keys.add(key_node.value)
            elif (
                isinstance(key_node, ast.Name)
                and key_node.id == "EMBEDDING_MODEL"
                and embedding_model
            ):
                pricing_keys.add(embedding_model)
        break

constants_text = constants_path.read_text()
supported_models = sorted(
    set(re.findall(r'"((?:gpt|o1|gemini|text-embedding)[^"]*)"', constants_text))
)

missing = [model for model in supported_models if model not in pricing_keys]

print("MODEL_PRICING keys:")
for model in sorted(pricing_keys):
    print(f"  - {model}")

print("\nSupported-looking model literals missing pricing:")
for model in missing:
    print(f"  - {model}")
PY

Also applies to: 92-97

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/crud/evaluations/pricing.py` around lines 23 - 72, MODEL_PRICING
is missing entries used elsewhere so calculate_token_cost (the exact lookup in
calculate_token_cost) can return 0.0; either add the missing model keys to
MODEL_PRICING or normalize model IDs before lookup. Fix options: (1) add the
supported model literals (from models/constants.py) as keys in MODEL_PRICING
with appropriate input/output costs (update the dict entries like "gpt-4o",
"gpt-5", etc.), or (2) change calculate_token_cost to map aliases/variants to
the canonical pricing key (e.g., normalize model names or strip suffixes) before
doing the MODEL_PRICING lookup so EvaluationRun.cost is computed correctly.
Ensure the chosen approach covers EMBEDDING_MODEL and any "mini"/"nano"/"-pro"
variants referenced by the app.
🧹 Nitpick comments (1)
backend/app/tests/crud/evaluations/test_pricing.py (1)

18-28: Avoid deriving expected costs from MODEL_PRICING.

These assertions use the same pricing table as the implementation, so an incorrect rate change can update both sides and still leave the tests green. Prefer hard-coded expected USD values, or a small local test matrix that pins the intended rates independently of the production constant.

Also applies to: 32-37, 55-62, 93-99, 162-164
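A pinned-rate test in that spirit might look like this. The stand-in function and its rates are illustrative; the real test would import calculate_token_cost from the pricing module (and could use pytest.approx instead of math.isclose):

```python
import math

# Stand-in for the real calculate_token_cost, with illustrative rates; the
# actual test would import it from backend/app/crud/evaluations/pricing.py.
def calculate_token_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    rates = {"gpt-4o": {"input": 2.50, "output": 10.00}}  # USD per 1M tokens
    pricing = rates.get(model)
    if not pricing:
        return 0.0
    return (
        input_tokens / 1e6 * pricing["input"]
        + output_tokens / 1e6 * pricing["output"]
    )


def test_cost_pinned_to_hand_computed_value():
    # 100k input at $2.50/1M plus 50k output at $10.00/1M is
    # 0.25 + 0.50 = 0.75 USD, hard-coded here rather than derived
    # from MODEL_PRICING at test time.
    assert math.isclose(calculate_token_cost("gpt-4o", 100_000, 50_000), 0.75)
```

Because the expected value is a literal, silently changing a production rate breaks the test instead of being mirrored into the assertion.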

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/tests/crud/evaluations/test_pricing.py` around lines 18 - 28, The
tests currently compute expected values using MODEL_PRICING, which mirrors
production constants and can mask pricing regressions; update each assertion
that derives expected cost from MODEL_PRICING to use hard-coded expected USD
values instead (compute the expected by hand for the given
input_tokens/output_tokens and paste the numeric expected value into the test),
specifically replace uses of MODEL_PRICING in the calculate_token_cost test
cases with fixed numeric literals so the tests verify against pinned rates
independent of MODEL_PRICING or production changes; keep calls to
calculate_token_cost(model="gpt-4o", ...) and only change the expected
variable/assertion to a hard-coded float (use pytest.approx when appropriate).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3fba4ac7-711a-4d3b-916d-56d3c08ae24e

📥 Commits

Reviewing files that changed from the base of the PR and between b6750f0 and 63eb942.

📒 Files selected for processing (6)
  • backend/app/crud/evaluations/__init__.py
  • backend/app/crud/evaluations/core.py
  • backend/app/crud/evaluations/pricing.py
  • backend/app/crud/evaluations/processing.py
  • backend/app/tests/crud/evaluations/test_pricing.py
  • backend/app/tests/crud/evaluations/test_processing.py
✅ Files skipped from review due to trivial changes (1)
  • backend/app/crud/evaluations/__init__.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/app/crud/evaluations/core.py

