Merged
23 changes: 10 additions & 13 deletions CLAUDE.md
@@ -119,30 +119,27 @@ assert find_test_files("foo.ts", all_files, None) == ["foo.test.ts"]
 
 **CRITICAL: NEVER start without explicit user request. PR must be clean — don't ignore failures.**
 
-1. `git status` to see ALL changes
-2. `scripts/git/list_changed_files.sh` — store for staging reference
-3. Verify not on main: `git branch --show-current`
-4. `git fetch origin main && git merge origin/main`
-5. `git commit -m "descriptive message"` — user has already run `git add` before saying "lgtm"
+1. `git fetch origin main && git merge origin/main`
+2. `git commit -m "descriptive message"` — user has already run `git add` before saying "lgtm"
 - Pre-commit hook runs automatically (see `scripts/git/pre_commit_hook.sh`): pip-freeze, generate-types, black, ruff, print/logging checks, then pylint + pyright + pytest concurrently
 - Install: `ln -sf ../../scripts/git/pre_commit_hook.sh .git/hooks/pre-commit`
 - **If hooks fail**: fix, re-stage, commit again. Don't stage other sessions' files.
 - **`--no-verify`** only for trivial non-code changes
 - Unused mock params: `# pyright: reportUnusedVariable=false` at top
 - NO co-author lines or `[skip ci]`
-6. Check for existing PR: `gh pr list --head $(git branch --show-current) --state open` — if exists, **STOP and ask**
-7. `git push`
-8. `gh pr create --title "PR title" --body "" --assignee @me` — create PR immediately, no body
-9. Check recent posts: `scripts/git/recent_social_posts.sh gitauto` and `scripts/git/recent_social_posts.sh wes`
-10. `gh pr edit <number> --body "..."` — add summary and social posts after checking recent posts
+3. Check for existing PR: `gh pr list --head $(git branch --show-current) --state open` — if exists, **STOP and ask**
+4. `git push`
+5. `gh pr create --title "PR title" --body "" --assignee @me` — create PR immediately, no body
+6. Check recent posts: `scripts/git/recent_social_posts.sh gitauto` and `scripts/git/recent_social_posts.sh wes`
+7. `gh pr edit <number> --body "..."` — add summary and social posts after checking recent posts
 - Technical, descriptive title. **No `## Test plan`**.
 - **Two posts** (last section, customer-facing only): GitAuto (changelog) + Wes (personal voice, don't emphasize "GitAuto")
 - Format: `## Social Media Post (GitAuto)` and `## Social Media Post (Wes)` headers (parsed by `extract-social-posts.js`)
 - **GitAuto post**: Changelog format — one-liner headline + change bullets. No storytelling.
 - **Wes post**: Honest stories. Vary openers — check recent posts first.
 - Guidelines: No em dashes (—). Under 280 chars. No marketing keywords. No negative framing. No internal names. No small numbers — use relative language.
-11. If Sentry issue: `python3 scripts/sentry/get_issue.py AGENT-XXX` then `python3 scripts/sentry/resolve_issue.py AGENT-XXX ...`
-12. **Blog post** in `../website/app/blog/posts/`:
+8. If Sentry issue: `python3 scripts/sentry/get_issue.py AGENT-XXX` then `python3 scripts/sentry/resolve_issue.py AGENT-XXX ...`
+9. **Blog post** in `../website/app/blog/posts/`:
 - `YYYY-MM-DD-kebab-case-title.mdx`. Universal dev lesson, not GitAuto internals (exception: deep technical content).
 - **Skip if lesson is thin** — argue back if no real insight.
 - `metadata.title`: **34-44 chars** (layout appends `- GitAuto Blog` for 50-60 total). Verify no duplicate slug.
@@ -172,7 +169,7 @@ assert find_test_files("foo.ts", all_files, None) == ["foo.test.ts"]
 - Unsplash API: `source .env && curl "https://api.unsplash.com/search/photos?query=QUERY&orientation=landscape&client_id=$UNSPLASH_ACCESS_KEY"`, download with `?w=1200&h=630&fit=crop&crop=entropy`
 - Convert to PNG: `sips -s format png downloaded.jpg --out ../website/public/og/blog/{slug}.png`
 - Dev.to crops to 1000x420 — keep important content centered.
-13. **Docs page** in `../website/app/docs/`: Create new or update existing. Browse for best-fit category. New pages: 3 files (`page.tsx`, `layout.tsx`, `jsonld.ts`).
+10. **Docs page** in `../website/app/docs/`: Create new or update existing. Browse for best-fit category. New pages: 3 files (`page.tsx`, `layout.tsx`, `jsonld.ts`).
 
 ## CRITICAL: Fixing Foxquilt PRs
 
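The PR-body rules in the CLAUDE.md workflow above (two `## Social Media Post (...)` sections forming the last part of the body, each under 280 characters and free of em dashes) are mechanical enough to check before running `gh pr edit`. The Python below is a hypothetical sketch, not a script in the repo: the real parser is `extract-social-posts.js`, which this diff does not show, so `extract_social_posts` and `check_post` only approximate its header matching and cover just two of the listed guidelines.

```python
import re

# Hypothetical approximation of the headers parsed by extract-social-posts.js.
HEADER_RE = re.compile(r"^## Social Media Post \((?P<author>[^)]+)\)[ \t]*$", re.MULTILINE)
MAX_POST_CHARS = 280
EM_DASH = "\u2014"


def extract_social_posts(body: str) -> dict[str, str]:
    """Map each header's author (e.g. "GitAuto", "Wes") to the text under it."""
    matches = list(HEADER_RE.finditer(body))
    posts: dict[str, str] = {}
    for i, match in enumerate(matches):
        # Posts are the last section, so each runs to the next header or end of body.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(body)
        posts[match.group("author")] = body[match.end() : end].strip()
    return posts


def check_post(text: str) -> list[str]:
    """Return the guideline violations that need no human judgment."""
    problems: list[str] = []
    if EM_DASH in text:
        problems.append("contains an em dash")
    if len(text) >= MAX_POST_CHARS:
        problems.append(f"{len(text)} chars (must be under {MAX_POST_CHARS})")
    return problems


if __name__ == "__main__":
    body = (
        "Makes Opus 4.7 the default paid model.\n\n"
        "## Social Media Post (GitAuto)\n"
        "New default model for paid runs\n\n"
        "## Social Media Post (Wes)\n"
        "Switched our default model today \u2014 fingers crossed.\n"
    )
    for author, post in extract_social_posts(body).items():
        print(author, check_post(post))
```

The remaining guidelines (marketing keywords, negative framing, internal names, relative numbers) still need a human read.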
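The blog-post rules in the CLAUDE.md workflow (a `YYYY-MM-DD-kebab-case-title.mdx` filename, a 34-44 character `metadata.title`, no duplicate slug) can be checked mechanically too. This is a hypothetical sketch, not a script in the repo: it assumes the slug is the kebab-case part of the filename after the date (the website's routing is not shown in this diff) and that `existing_slugs` has already been collected from `../website/app/blog/posts/`.

```python
import re
from datetime import date

# Hypothetical checker for the blog-post rules in CLAUDE.md.
FILENAME_RE = re.compile(r"^(\d{4}-\d{2}-\d{2})-([a-z0-9]+(?:-[a-z0-9]+)*)\.mdx$")
TITLE_MIN, TITLE_MAX = 34, 44  # metadata.title length bounds from CLAUDE.md


def check_blog_post(filename: str, title: str, existing_slugs: set[str]) -> list[str]:
    """Return rule violations for one post; an empty list means it passes."""
    problems: list[str] = []
    match = FILENAME_RE.match(filename)
    if match is None:
        problems.append("filename must be YYYY-MM-DD-kebab-case-title.mdx")
    else:
        date_part, slug = match.groups()
        try:
            date.fromisoformat(date_part)  # rejects impossible dates such as 2025-13-40
        except ValueError:
            problems.append(f"invalid date {date_part}")
        # Assumption: the slug is the kebab-case part after the date.
        if slug in existing_slugs:
            problems.append(f"duplicate slug {slug}")
    if not TITLE_MIN <= len(title) <= TITLE_MAX:
        problems.append(f"title is {len(title)} chars, expected {TITLE_MIN}-{TITLE_MAX}")
    return problems
```

For example, `check_blog_post("2025-04-16-pin-your-model-versions.mdx", "Pin Model Versions Before You Upgrade", set())` returns an empty list, since the title is 37 characters.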
2 changes: 2 additions & 0 deletions constants/claude.py
@@ -7,6 +7,7 @@
 
 # https://platform.claude.com/docs/en/docs/about-claude/models/all-models#model-comparison-table
 CONTEXT_WINDOW: dict[ClaudeModelId, int] = {
+    ClaudeModelId.OPUS_4_7: 1_000_000,
     ClaudeModelId.OPUS_4_6: 1_000_000,
     ClaudeModelId.SONNET_4_6: 1_000_000,
     ClaudeModelId.OPUS_4_5: 200_000,
@@ -15,6 +16,7 @@
 }
 
 MAX_OUTPUT_TOKENS: dict[ClaudeModelId, int] = {
+    ClaudeModelId.OPUS_4_7: 128_000,
     ClaudeModelId.OPUS_4_6: 128_000,
     ClaudeModelId.SONNET_4_6: 64_000,
     ClaudeModelId.OPUS_4_5: 64_000,
14 changes: 11 additions & 3 deletions constants/models.py
@@ -5,6 +5,7 @@
 class ClaudeModelId(StrEnum):
     """Claude models — user-selectable and fallback-only."""
 
+    OPUS_4_7 = "claude-opus-4-7"
     OPUS_4_6 = "claude-opus-4-6"
     SONNET_4_6 = "claude-sonnet-4-6"
     OPUS_4_5 = "claude-opus-4-5"
@@ -41,9 +42,9 @@ class ModelInfo(TypedDict):
 
 MODEL_REGISTRY: dict[ModelId, ModelInfo] = {
     # Claude (user-selectable)
-    ClaudeModelId.OPUS_4_6: ModelInfo(
+    ClaudeModelId.OPUS_4_7: ModelInfo(
         provider=ModelProvider.CLAUDE,
-        display_name="Claude Opus 4.6",
+        display_name="Claude Opus 4.7",
         credit_cost_usd=8,
         user_selectable=True,
         free_tier=False,
@@ -56,6 +57,13 @@ class ModelInfo(TypedDict):
         free_tier=True,
     ),
     # Claude (fallback-only, same cost as their newer versions)
+    ClaudeModelId.OPUS_4_6: ModelInfo(
+        provider=ModelProvider.CLAUDE,
+        display_name="Claude Opus 4.6",
+        credit_cost_usd=8,
+        user_selectable=False,
+        free_tier=False,
+    ),
     ClaudeModelId.OPUS_4_5: ModelInfo(
         provider=ModelProvider.CLAUDE,
         display_name="Claude Opus 4.5",
@@ -100,6 +108,6 @@ class ModelInfo(TypedDict):
     m for m, r in MODEL_REGISTRY.items() if r["user_selectable"] and r["free_tier"]
 ]
 DEFAULT_FREE_MODEL = GoogleModelId.GEMMA_4_31B
-DEFAULT_PAID_MODEL = ClaudeModelId.OPUS_4_6
+DEFAULT_PAID_MODEL = ClaudeModelId.OPUS_4_7
 MAX_CREDIT_COST_USD = max(entry["credit_cost_usd"] for entry in MODEL_REGISTRY.values())
 CREDIT_GRANT_AMOUNT_USD = MAX_CREDIT_COST_USD * 3
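Because `MAX_CREDIT_COST_USD` and `CREDIT_GRANT_AMOUNT_USD` are derived from `MODEL_REGISTRY`, swapping which Opus entry is user-selectable affects them only through the registry. The sketch below repeats that derivation on a trimmed registry holding just the two Opus entries whose fields are fully visible in this diff (plain strings replace the enum keys and `provider` is omitted). It also adds a hypothetical invariant that the default paid model must be user-selectable, which would catch leaving `DEFAULT_PAID_MODEL` on Opus 4.6 after demoting it.

```python
from typing import TypedDict


class ModelInfo(TypedDict):
    # Trimmed stand-in for the project's ModelInfo (provider field omitted).
    display_name: str
    credit_cost_usd: int
    user_selectable: bool
    free_tier: bool


# Only the two Opus entries whose fields are fully visible in this diff.
MODEL_REGISTRY: dict[str, ModelInfo] = {
    "claude-opus-4-7": ModelInfo(
        display_name="Claude Opus 4.7",
        credit_cost_usd=8,
        user_selectable=True,
        free_tier=False,
    ),
    "claude-opus-4-6": ModelInfo(
        display_name="Claude Opus 4.6",
        credit_cost_usd=8,
        user_selectable=False,
        free_tier=False,
    ),
}
DEFAULT_PAID_MODEL = "claude-opus-4-7"

# Derived in the same style as the bottom of constants/models.py.
USER_SELECTABLE_MODELS = [m for m, r in MODEL_REGISTRY.items() if r["user_selectable"]]
MAX_CREDIT_COST_USD = max(entry["credit_cost_usd"] for entry in MODEL_REGISTRY.values())
CREDIT_GRANT_AMOUNT_USD = MAX_CREDIT_COST_USD * 3

# Hypothetical invariant: the default must be a model users can actually select.
assert MODEL_REGISTRY[DEFAULT_PAID_MODEL]["user_selectable"], DEFAULT_PAID_MODEL
```

In this trimmed registry the grant comes out to 24; the real value depends on the most expensive entry in the full registry.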
16 changes: 16 additions & 0 deletions constants/test_models.py
@@ -64,3 +64,19 @@ def test_anthropic_models_exist():
         m for m, r in MODEL_REGISTRY.items() if r["provider"] == ModelProvider.CLAUDE
     ]
     assert len(anthropic_models) >= 1
+
+
+def test_opus_47_is_user_selectable():
+    info = MODEL_REGISTRY[ClaudeModelId.OPUS_4_7]
+    assert info["user_selectable"] is True
+    assert info["credit_cost_usd"] == 8
+
+
+def test_opus_46_is_fallback_only():
+    info = MODEL_REGISTRY[ClaudeModelId.OPUS_4_6]
+    assert info["user_selectable"] is False
+    assert info["credit_cost_usd"] == 8
+
+
+def test_default_paid_model_is_opus_47():
+    assert DEFAULT_PAID_MODEL == ClaudeModelId.OPUS_4_7
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
 [project]
 name = "GitAuto"
-version = "1.6.6"
+version = "1.6.10"
 requires-python = ">=3.14"
 dependencies = [
     "annotated-doc==0.0.4",
2 changes: 0 additions & 2 deletions scripts/git/list_changed_files.sh

This file was deleted.

9 changes: 0 additions & 9 deletions services/claude/chat_with_claude.py
@@ -14,7 +14,6 @@
 from services.claude.remove_outdated_file_edit_attempts import (
     remove_outdated_file_edit_attempts,
 )
-from services.claude.strip_strict_from_tools import strip_strict_from_tools
 from services.claude.trim_messages import trim_messages_to_token_limit
 from services.llm_result import LlmResult, ToolCall
 from services.supabase.llm_requests.insert_llm_request import insert_llm_request
@@ -43,14 +42,6 @@ def chat_with_claude(
         messages=messages, client=claude, model=model_id, max_input=max_input
     )
 
-    # Strip "strict" from tools for models that don't support it
-    if model_id not in (
-        ClaudeModelId.SONNET_4_6,
-        ClaudeModelId.SONNET_4_5,
-        ClaudeModelId.OPUS_4_6,
-    ):
-        tools = strip_strict_from_tools(tools)
-
     # https://docs.anthropic.com/en/api/messages
     start_time = time.time()
     try:
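The removed branch stripped `strict` from tool definitions for any model outside a three-model allowlist. `OPUS_4_7` is not in that allowlist, so keeping the branch would have run `strip_strict_from_tools` on every Opus 4.7 request. Deleting it sends `"strict": True` through for every model, which `test_strict_tools_passed_through_unchanged` checks with `HAIKU_4_5`. The body of `strip_strict_from_tools` is not shown in this diff; the version below is a hypothetical reconstruction used only to contrast the old and new behavior, with model ids written as plain strings.

```python
from typing import Any

# The allowlist from the deleted branch, written as plain model-id strings.
OLD_STRICT_ALLOWLIST = {"claude-sonnet-4-6", "claude-sonnet-4-5", "claude-opus-4-6"}


def strip_strict_from_tools(tools: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Hypothetical reconstruction: copy each tool without its "strict" key."""
    return [{k: v for k, v in tool.items() if k != "strict"} for tool in tools]


def tools_sent_before(model_id: str, tools: list[dict[str, Any]]) -> list[dict[str, Any]]:
    # Pre-PR behavior: strip unless the model was in the allowlist.
    if model_id not in OLD_STRICT_ALLOWLIST:
        return strip_strict_from_tools(tools)
    return tools


def tools_sent_after(_model_id: str, tools: list[dict[str, Any]]) -> list[dict[str, Any]]:
    # Post-PR behavior: tools reach the API unchanged, whatever the model.
    return tools
```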
4 changes: 2 additions & 2 deletions services/claude/evaluate_condition.py
@@ -46,8 +46,8 @@ def evaluate_condition(
         return EvaluationResult(False, "empty input")
 
     response = claude.beta.messages.create(
-        model=ClaudeModelId.OPUS_4_6,
-        max_tokens=MAX_OUTPUT_TOKENS[ClaudeModelId.OPUS_4_6],
+        model=ClaudeModelId.OPUS_4_7,
+        max_tokens=MAX_OUTPUT_TOKENS[ClaudeModelId.OPUS_4_7],
         temperature=0,
         system=system_prompt,
         messages=[{"role": "user", "content": content}],
4 changes: 2 additions & 2 deletions services/claude/is_code_untestable.py
@@ -106,8 +106,8 @@ def is_code_untestable(
 Is this code dead (unreachable/redundant) or genuinely untestable (reachable at runtime but impossible to test)?"""
 
     response = claude.beta.messages.create(
-        model=ClaudeModelId.OPUS_4_6,
-        max_tokens=MAX_OUTPUT_TOKENS[ClaudeModelId.OPUS_4_6],
+        model=ClaudeModelId.OPUS_4_7,
+        max_tokens=MAX_OUTPUT_TOKENS[ClaudeModelId.OPUS_4_7],
         temperature=0,
         system=SYSTEM_PROMPT,
         messages=[{"role": "user", "content": content}],
37 changes: 37 additions & 0 deletions services/claude/test_chat_with_claude.py
@@ -147,3 +147,40 @@ def test_chat_with_claude_calls_optimization_functions(
     )
 
     mock_remove_outdated_file_edit_attempts.assert_called_once()
+
+
+@patch("services.claude.chat_with_claude.insert_llm_request")
+@patch("services.claude.chat_with_claude.claude")
+def test_strict_tools_passed_through_unchanged(mock_claude, mock_insert_llm_request):
+    """Strict tools must not be stripped — all current models support strict."""
+    mock_response = Mock()
+    mock_response.content = [Mock(type="text", text="ok")]
+    mock_response.usage = Mock(output_tokens=5)
+    mock_insert_llm_request.return_value = {"total_cost_usd": 0.01}
+    mock_claude.messages.create.return_value = mock_response
+    mock_claude.messages.count_tokens.return_value = Mock(input_tokens=10)
+
+    tools = cast(
+        list[ToolUnionParam],
+        [
+            {
+                "name": "test_tool",
+                "description": "Test",
+                "strict": True,
+                "input_schema": {"type": "object", "properties": {}},
+            }
+        ],
+    )
+
+    chat_with_claude(
+        messages=cast(list[MessageParam], [{"role": "user", "content": "test"}]),
+        system_content="system",
+        tools=tools,
+        model_id=ClaudeModelId.HAIKU_4_5,
+        usage_id=999,
+        created_by="4:test-user",
+    )
+
+    call_args = mock_claude.messages.create.call_args
+    passed_tools = call_args.kwargs["tools"]
+    assert passed_tools[0]["strict"] is True
12 changes: 12 additions & 0 deletions services/claude/test_evaluate_condition.py
@@ -66,6 +66,18 @@ def test_returns_evaluation_failed_on_invalid_json(self, mock_claude):
 
         assert result == EvaluationResult(False, "evaluation failed")
 
+    def test_uses_opus_47_model(self, mock_claude):
+        mock_response = MagicMock()
+        mock_response.content = [
+            MagicMock(text='{"result": true, "reason": "testable"}')
+        ]
+        mock_claude.beta.messages.create.return_value = mock_response
+
+        evaluate_condition(content="code", system_prompt="Check this.")
+
+        call_args = mock_claude.beta.messages.create.call_args
+        assert call_args.kwargs["model"] == "claude-opus-4-7"
+
     def test_uses_structured_output_schema(self, mock_claude):
         mock_response = MagicMock()
         mock_response.content = [
59 changes: 41 additions & 18 deletions services/claude/test_evaluate_quality_checks.py
@@ -1,12 +1,12 @@
 # pyright: reportArgumentType=false
 from unittest.mock import MagicMock, patch
 
+from constants.claude import MAX_OUTPUT_TOKENS
 from constants.models import ClaudeModelId
 from services.claude.evaluate_quality_checks import evaluate_quality_checks
 
 
-@patch("services.claude.evaluate_quality_checks.claude")
-def test_uses_opus_model(mock_claude):
+def _mock_claude_call(mock_claude, model: ClaudeModelId):
     mock_content = MagicMock()
     mock_content.text = '{"business_logic": {}}'
     mock_claude.messages.create.return_value = MagicMock(content=[mock_content])
@@ -15,26 +15,49 @@ def test_uses_opus_model(mock_claude):
         source_content="const x = 1;",
         source_path="src/foo.ts",
         test_files=[("test/foo.spec.ts", "it('works', () => {})")],
-        model=ClaudeModelId.OPUS_4_6,
+        model=model,
     )
 
-    call_kwargs = mock_claude.messages.create.call_args.kwargs
-    assert call_kwargs["model"] == "claude-opus-4-6"
+    return mock_claude.messages.create.call_args.kwargs
 
 
 @patch("services.claude.evaluate_quality_checks.claude")
-def test_uses_max_tokens_matching_model(mock_claude):
-    mock_content = MagicMock()
-    mock_content.text = '{"business_logic": {}}'
-    mock_claude.messages.create.return_value = MagicMock(content=[mock_content])
+def test_opus_47_passes_model_and_max_tokens(mock_claude):
+    kwargs = _mock_claude_call(mock_claude, ClaudeModelId.OPUS_4_7)
+    assert kwargs["model"] == "claude-opus-4-7"
+    assert kwargs["max_tokens"] == MAX_OUTPUT_TOKENS[ClaudeModelId.OPUS_4_7]
 
-    evaluate_quality_checks(
-        source_content="const x = 1;",
-        source_path="src/foo.ts",
-        test_files=[("test/foo.spec.ts", "it('works', () => {})")],
-        model=ClaudeModelId.OPUS_4_6,
-    )
 
-    call_kwargs = mock_claude.messages.create.call_args.kwargs
-    # Opus 4.6 has 128_000 max tokens
-    assert call_kwargs["max_tokens"] == 128_000
+@patch("services.claude.evaluate_quality_checks.claude")
+def test_opus_46_passes_model_and_max_tokens(mock_claude):
+    kwargs = _mock_claude_call(mock_claude, ClaudeModelId.OPUS_4_6)
+    assert kwargs["model"] == "claude-opus-4-6"
+    assert kwargs["max_tokens"] == MAX_OUTPUT_TOKENS[ClaudeModelId.OPUS_4_6]
+
+
+@patch("services.claude.evaluate_quality_checks.claude")
+def test_sonnet_46_passes_model_and_max_tokens(mock_claude):
+    kwargs = _mock_claude_call(mock_claude, ClaudeModelId.SONNET_4_6)
+    assert kwargs["model"] == "claude-sonnet-4-6"
+    assert kwargs["max_tokens"] == MAX_OUTPUT_TOKENS[ClaudeModelId.SONNET_4_6]
+
+
+@patch("services.claude.evaluate_quality_checks.claude")
+def test_opus_45_passes_model_and_max_tokens(mock_claude):
+    kwargs = _mock_claude_call(mock_claude, ClaudeModelId.OPUS_4_5)
+    assert kwargs["model"] == "claude-opus-4-5"
+    assert kwargs["max_tokens"] == MAX_OUTPUT_TOKENS[ClaudeModelId.OPUS_4_5]
+
+
+@patch("services.claude.evaluate_quality_checks.claude")
+def test_sonnet_45_passes_model_and_max_tokens(mock_claude):
+    kwargs = _mock_claude_call(mock_claude, ClaudeModelId.SONNET_4_5)
+    assert kwargs["model"] == "claude-sonnet-4-5"
+    assert kwargs["max_tokens"] == MAX_OUTPUT_TOKENS[ClaudeModelId.SONNET_4_5]
+
+
+@patch("services.claude.evaluate_quality_checks.claude")
+def test_haiku_45_passes_model_and_max_tokens(mock_claude):
+    kwargs = _mock_claude_call(mock_claude, ClaudeModelId.HAIKU_4_5)
+    assert kwargs["model"] == "claude-haiku-4-5"
+    assert kwargs["max_tokens"] == MAX_OUTPUT_TOKENS[ClaudeModelId.HAIKU_4_5]
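The new tests in `test_evaluate_quality_checks.py` differ only in the model id, which makes them a natural fit for a table-driven test. The sketch below is self-contained and stdlib-only: `evaluate_quality_checks_stub` is a hypothetical stand-in for the real function, and only the four `MAX_OUTPUT_TOKENS` values visible in `constants/claude.py` are included. Under pytest, `@pytest.mark.parametrize` over the same table would report each model as a separate test case.

```python
from unittest.mock import MagicMock

# Values copied from the MAX_OUTPUT_TOKENS entries visible in constants/claude.py.
MAX_OUTPUT_TOKENS = {
    "claude-opus-4-7": 128_000,
    "claude-opus-4-6": 128_000,
    "claude-sonnet-4-6": 64_000,
    "claude-opus-4-5": 64_000,
}


def evaluate_quality_checks_stub(client: MagicMock, model: str) -> None:
    """Hypothetical stand-in for evaluate_quality_checks: one create() call per run."""
    client.messages.create(model=model, max_tokens=MAX_OUTPUT_TOKENS[model], messages=[])


def test_each_model_passes_model_and_max_tokens() -> None:
    # One table row per model replaces one copy-pasted test function per model.
    for model, expected_max_tokens in MAX_OUTPUT_TOKENS.items():
        client = MagicMock()
        evaluate_quality_checks_stub(client, model)
        kwargs = client.messages.create.call_args.kwargs
        assert kwargs["model"] == model, model
        assert kwargs["max_tokens"] == expected_max_tokens, model
```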
13 changes: 13 additions & 0 deletions services/claude/test_is_code_untestable.py
@@ -28,6 +28,19 @@ def _set_mock_response(mock_claude, result: bool, category: str, reason: str):
     ]
 
 
+def test_uses_opus_47_model(mock_claude):
+    _set_mock_response(mock_claude, False, "testable", "testable")
+
+    is_code_untestable(
+        file_path="src/app.tsx",
+        file_content="const x = 1;",
+        uncovered_lines="1",
+    )
+
+    call_args = mock_claude.beta.messages.create.call_args
+    assert call_args.kwargs["model"] == "claude-opus-4-7"
+
+
 def test_returns_testable_when_no_uncovered_code(mock_claude):
     result = is_code_untestable(
         file_path="src/app.tsx",
1 change: 1 addition & 0 deletions services/get_fallback_models.py
@@ -10,6 +10,7 @@
 
 # Full fallback chains for resilience — includes non-user-selectable models
 CLAUDE_FALLBACK_MODELS: list[ModelId] = [
+    ClaudeModelId.OPUS_4_7,
     ClaudeModelId.OPUS_4_6,
     ClaudeModelId.OPUS_4_5,
     ClaudeModelId.SONNET_4_6,
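Assuming callers try `CLAUDE_FALLBACK_MODELS` in list order, putting `OPUS_4_7` first makes it the first model attempted, with Opus 4.6 and older entries used only when earlier ones fail. How `get_fallback_models` and its callers actually walk the chain is not shown in this diff, so the loop below is a hypothetical sketch of the usual pattern: `call` stands in for an API request, and `RuntimeError` stands in for whatever errors the real code treats as retryable.

```python
from collections.abc import Callable, Sequence
from typing import TypeVar

T = TypeVar("T")

# Visible prefix of CLAUDE_FALLBACK_MODELS after this PR (the real list is longer).
CLAUDE_FALLBACK_MODELS: tuple[str, ...] = (
    "claude-opus-4-7",
    "claude-opus-4-6",
    "claude-opus-4-5",
    "claude-sonnet-4-6",
)


def call_with_fallback(
    call: Callable[[str], T], models: Sequence[str] = CLAUDE_FALLBACK_MODELS
) -> tuple[str, T]:
    """Try each model in order and return (model, result) from the first success."""
    errors: list[str] = []
    for model in models:
        try:
            return model, call(model)
        except RuntimeError as exc:  # stand-in for whatever errors trigger a fallback
            errors.append(f"{model}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))
```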
26 changes: 25 additions & 1 deletion services/supabase/credits/test_get_credit_price.py
@@ -16,9 +16,33 @@ def test_returns_max_cost_for_none():
     assert get_credit_price(None) == MAX_CREDIT_COST_USD
 
 
-def test_opus_costs_8():
+def test_opus_47_costs_8():
+    assert get_credit_price(ClaudeModelId.OPUS_4_7) == 8
+
+
+def test_opus_46_costs_8():
     assert get_credit_price(ClaudeModelId.OPUS_4_6) == 8
 
 
+def test_sonnet_46_costs_4():
+    assert get_credit_price(ClaudeModelId.SONNET_4_6) == 4
+
+
+def test_opus_45_costs_8():
+    assert get_credit_price(ClaudeModelId.OPUS_4_5) == 8
+
+
+def test_sonnet_45_costs_4():
+    assert get_credit_price(ClaudeModelId.SONNET_4_5) == 4
+
+
+def test_haiku_45_costs_2():
+    assert get_credit_price(ClaudeModelId.HAIKU_4_5) == 2
+
+
+def test_gemini_25_flash_costs_4():
+    assert get_credit_price(GoogleModelId.GEMINI_2_5_FLASH) == 4
+
+
 def test_gemma_costs_2():
     assert get_credit_price(GoogleModelId.GEMMA_4_31B) == 2
1 change: 1 addition & 0 deletions services/supabase/llm_requests/calculate_costs.py
@@ -11,6 +11,7 @@ def calculate_costs(
     # Pricing per 1M tokens (input/output)
     pricing = {
         "claude": {
+            "claude-opus-4-7": {"input": 5.00, "output": 25.00},
             "claude-opus-4-6": {"input": 5.00, "output": 25.00},
             "claude-opus-4-5": {"input": 5.00, "output": 25.00},
             "claude-sonnet-4-6": {"input": 3.00, "output": 15.00},
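Prices in `calculate_costs` are USD per 1M tokens, and the new `claude-opus-4-7` row matches the 4.6 and 4.5 rows, so the upgrade should not change per-request cost at equal token counts. The function's full body is not shown in this diff; this sketch assumes plain input/output pricing (no cache-read or cache-write rates) over a trimmed copy of the visible rows, and the token counts in the example are made up.

```python
# Trimmed copy of the "claude" pricing rows shown in the diff (USD per 1M tokens).
PRICING_PER_MTOK: dict[str, dict[str, float]] = {
    "claude-opus-4-7": {"input": 5.00, "output": 25.00},
    "claude-opus-4-6": {"input": 5.00, "output": 25.00},
    "claude-opus-4-5": {"input": 5.00, "output": 25.00},
    "claude-sonnet-4-6": {"input": 3.00, "output": 15.00},
}


def request_cost_usd(model_id: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one request, assuming simple per-token pricing with no caching rates."""
    rates = PRICING_PER_MTOK[model_id]
    return (input_tokens * rates["input"] + output_tokens * rates["output"]) / 1_000_000
```

For example, 10,000 input and 2,000 output tokens on `claude-opus-4-7` cost 10,000 × $5 / 1M + 2,000 × $25 / 1M = $0.05 + $0.05 = $0.10.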