CLAUDE.md (9 additions, 0 deletions)

web/ → Go HTML templates (HTMX v2), static assets
**Key interfaces:**
- `extractor.Rules` (defined consumer-side in `extractor/readability.go`), implemented by `datastore.RulesDAO`. Mock generated with `//go:generate moq` in extractor package.
- `extractor.Retriever` (defined in `extractor/retriever.go`) — abstracts URL content fetching. Two implementations: `HTTPRetriever` (default, standard HTTP GET with Safari user-agent) and `CloudflareRetriever` (Cloudflare Browser Rendering API for JS-rendered pages). When `UReadability.Retriever` is nil, defaults to `HTTPRetriever`.
- `extractor.AIEvaluator` (defined in `extractor/evaluator.go`) — evaluates extraction quality via OpenAI. Implementation: `OpenAIEvaluator`. Mock generated with `//go:generate moq` as test-only mock (`evaluator_mock_test.go`).

## Content Extraction Flow

4. If rule found → extract via goquery CSS selector; if fails → fall back to general parser
5. If no rule → use `go-readability` general parser
6. Normalize relative links to absolute, extract images concurrently (pick largest as lead image)
7. If `AIEvaluator` is configured and no existing rule for domain (or force mode): evaluate extraction quality via OpenAI, iterate up to `MaxGPTIter` times with suggested CSS selectors, save the best as a new rule

`ExtractAndImprove()` is the force-mode entry point — ignores stored rules, re-extracts with general parser, then evaluates. Used by the `/api/content-parsed-wrong` protected endpoint.

Optional OpenAI flags (setting `--openai-api-key` enables auto-evaluation):
- `--openai-api-key` / `OPENAI_API_KEY` — OpenAI API key
- `--openai-model` / `OPENAI_MODEL` — model for evaluation (default: `gpt-5.4-mini`)
- `--openai-max-iter` / `OPENAI_MAX_ITER` — max evaluation iterations (default: `3`)

## Key Conventions

README.md (16 additions, 0 deletions)
| creds | CREDS | none | credentials for protected calls (POST, DELETE /rules) |
| cf-account-id| CF_ACCOUNT_ID | none | Cloudflare account ID for Browser Rendering API |
| cf-api-token | CF_API_TOKEN | none | Cloudflare API token with Browser Rendering Edit perm |
| openai-api-key | OPENAI_API_KEY | none | OpenAI API key; enables auto-evaluation when set |
| openai-model | OPENAI_MODEL | `gpt-5.4-mini` | OpenAI model for evaluation |
| openai-max-iter | OPENAI_MAX_ITER | `3` | max evaluation iterations per extraction |
| dbg | DEBUG | `false` | debug mode |
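For example, via environment variables (the binary name below is illustrative — adjust to your deployment):

```shell
# enable auto-evaluation; flag equivalents are listed in the table above
export OPENAI_API_KEY="sk-..."       # presence of the key turns evaluation on
export OPENAI_MODEL="gpt-5.4-mini"   # default shown
export OPENAI_MAX_ITER=3             # default shown
./app --dbg                          # hypothetical binary name
```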

### Cloudflare Browser Rendering (optional)
When both `--cf-account-id` and `--cf-api-token` are set, the service uses the Cloudflare Browser Rendering API to fetch JS-rendered pages.

When these flags are not set, the service uses a standard HTTP client (default).

### OpenAI Auto-Evaluation (optional)

When `--openai-api-key` is set, the service automatically evaluates extraction quality using OpenAI. If the extracted content looks poor (missing article body, too short, mostly boilerplate), GPT suggests a CSS selector targeting the main content. The service iterates up to `--openai-max-iter` times, saving the best selector as a rule for future use.

Evaluation only runs for domains without an existing extraction rule. For domains that already have rules, use the force-mode endpoint to re-evaluate:

POST /api/content-parsed-wrong?url=http://example.com/article

This protected endpoint (requires basicAuth credentials) ignores the stored rule, re-extracts with the general parser, and runs the evaluation loop to find a better selector.
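A sketch with `curl` (credentials, host, and port are placeholders):

```shell
# force re-extraction with AI evaluation; requires the credentials configured via --creds
curl -u "user:password" -X POST \
  "http://localhost:8080/api/content-parsed-wrong?url=http://example.com/article"
```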

When OpenAI is not configured, extraction works exactly as before — no GPT calls are made.

### API

GET /api/content/v1/parser?token=secret&url=http://aa.com/blah - extract content (emulate Readability API parse call)
POST /api/extract {url: http://aa.com/blah} - extract content
POST /api/content-parsed-wrong?url=http://aa.com/blah - force re-extraction with AI evaluation (requires basicAuth)

## Development

- Create: `extractor/evaluator_test.go`
- Create: `extractor/mocks/evaluator.go` (generated)

- [x] run `go get github.com/sashabaranov/go-openai@latest && go mod tidy && go mod vendor`
- [x] define `AIEvaluator` interface with `Evaluate(ctx, url, extractedText, htmlBody string) (*EvalResult, error)` method
- [x] define `EvalResult` struct: `Good bool`, `Selector string`
- [x] implement `OpenAIEvaluator` struct with `APIKey`, `Model` fields
- [x] implement `Evaluate` method: build prompt with URL + extracted text (first 2000 chars) + truncated HTML body (first 4000 chars), parse JSON response `{"good": true}` or `{"good": false, "selector": "..."}`
- [x] handle invalid JSON response: retry once, then return `EvalResult{Good: true}` (fail open)
- [x] add `//go:generate moq` directive for `AIEvaluator`, run `go generate` to create mock
- [x] write tests: successful good evaluation, successful bad evaluation with selector, invalid JSON response, OpenAI API error
- [x] run tests — must pass before next task

### Task 2: Wire AIEvaluator into UReadability and add evaluation loop

**Files:**
- Modify: `extractor/readability.go`
- Modify: `extractor/readability_test.go`

- [x] add `AIEvaluator AIEvaluator` and `MaxGPTIter int` fields to `UReadability` struct
- [x] change `extractWithRules` signature to `extractWithRules(ctx, reqURL string, rule *datastore.Rule, force bool)`
- [x] update callers: `Extract()` passes `force=false`, `ExtractByRule()` passes `force=false`
- [x] add `ExtractAndImprove(ctx, url)` public method — calls `extractWithRules(ctx, url, nil, true)`
- [x] add `evaluateAndImprove(ctx, reqURL, htmlBody string, result *Response) *Response` private method
- [x] implement evaluation loop: up to `MaxGPTIter` iterations (default 3); send URL + result.Content + htmlBody to evaluator; try suggested selector on htmlBody via goquery; feed new extraction back to GPT on next iteration; if GPT says good, break
- [x] in `extractWithRules`: after extraction, call `evaluateAndImprove` if: `AIEvaluator != nil` AND (`force` OR no existing rule for domain)
- [x] **force mode semantics**: when `force=true`, pass `nil` as rule to `getContent()` so initial extraction uses the general parser (not the stored rule), then let GPT suggest a new selector
- [x] if better selector found, save rule via `f.Rules.Save()` with domain and selector
- [x] all GPT/evaluation errors logged and swallowed — original result returned unchanged
- [x] write tests: extraction with evaluator (good on first try), extraction with evaluator (bad, improved on retry), extraction without evaluator (unchanged behaviour), GPT error (fail open), force mode ignores existing rules and extracts with general parser
- [x] run tests — must pass before next task

### Task 3: Add CLI flags and wiring in main.go

**Files:**
- Modify: `main.go`

- [x] add `OpenAIKey string` field (`--openai-api-key` / `OPENAI_API_KEY`)
- [x] add `OpenAIModel string` field (`--openai-model` / `OPENAI_MODEL` default `gpt-5.4-mini`)
- [x] add `MaxGPTIter int` field (`--openai-max-iter` / `OPENAI_MAX_ITER` default `3`)
- [x] when `OpenAIKey` is set, create `OpenAIEvaluator` and inject into `UReadability`
- [x] log which mode is active (with/without OpenAI evaluation)
- [x] run tests — must pass before next task

### Task 4: Add REST endpoint for force mode

**Files:**
- Modify: `rest/server.go`
- Modify: `rest/server_test.go`

- [x] add `GET /content-parsed-wrong` route in the protected group within `api.Mount("/api")` (full path: `/api/content-parsed-wrong`, requires basicAuth)
- [x] implement `contentParsedWrong` handler: validate `url` query param, check `AIEvaluator` is configured, call `s.Readability.ExtractAndImprove()`, return JSON result
- [x] write tests: successful call, missing url param, missing OpenAI config (AIEvaluator nil)
- [x] run tests — must pass before next task

### Task 5: Run linter and final checks

- [x] run `gofmt -w` on all modified files
- [x] run `go fix ./...`
- [x] run `golangci-lint run --max-issues-per-linter=0 --max-same-issues=0`
- [x] fix any lint issues
- [x] run tests — must pass before next task

### Task 6: Verify acceptance criteria

- [x] verify `Extract()` without OpenAI configured works exactly as before (existing tests pass)
- [x] verify `Extract()` with OpenAI configured evaluates and improves extraction (test with mock evaluator)
- [x] verify `Extract()` skips evaluation when domain already has a rule (test with mock rules returning a rule)
- [x] verify `ExtractAndImprove()` runs evaluation even when rule exists, using general parser for initial extraction
- [x] verify GPT errors don't break extraction (test with evaluator returning error)
- [x] verify rule is saved when better selector found (test with mock rules verifying Save call)
- [x] run full test suite: `go test -timeout=60s -race ./...`

### Task 7: [Final] Update documentation

- [x] update README.md with OpenAI configuration flags
- [x] update CLAUDE.md with new AIEvaluator interface and extraction flow
- [x] move this plan to `docs/plans/completed/`

## Technical Details

Expand Down
extractor/evaluator.go (150 additions, 0 deletions; new file)
package extractor

import (
	"context"
	"encoding/json"
	"errors"
	"fmt"
	"strings"
	"sync"
	"time"

	log "github.com/go-pkgz/lgr"
	openai "github.com/sashabaranov/go-openai"
)

//go:generate moq -out evaluator_mock_test.go -skip-ensure -fmt goimports . AIEvaluator

// AIEvaluator evaluates extraction quality and suggests CSS selectors for improvement
type AIEvaluator interface {
	Evaluate(ctx context.Context, url, extractedText, htmlBody, prevSelector string) (*EvalResult, error)
}

// EvalResult holds the evaluation outcome from the AI model
type EvalResult struct {
	Good     bool   // true if extraction looks fine
	Selector string // suggested CSS selector (only when Good=false)
}

const (
	maxExtractedTextLen = 2000
	maxHTMLBodyLen      = 4000
	openaiCallTimeout   = 60 * time.Second
)

var errInvalidJSON = errors.New("invalid JSON response from OpenAI")

const systemPrompt = `You are a web content extraction expert. You evaluate whether extracted article text is complete and correct, and suggest CSS selectors when extraction is poor.`

// OpenAIEvaluator uses OpenAI API to evaluate extraction quality
type OpenAIEvaluator struct {
	APIKey       string
	Model        string
	clientConfig *openai.ClientConfig // optional, for testing
	clientOnce   sync.Once
	client       *openai.Client
}

func (e *OpenAIEvaluator) getClient() *openai.Client {
	e.clientOnce.Do(func() {
		if e.clientConfig != nil {
			e.client = openai.NewClientWithConfig(*e.clientConfig)
		} else {
			e.client = openai.NewClient(e.APIKey)
		}
	})
	return e.client
}

// Evaluate sends the extracted text and HTML body to OpenAI for evaluation.
// Returns EvalResult indicating whether extraction is good, or suggests a CSS selector.
func (e *OpenAIEvaluator) Evaluate(ctx context.Context, reqURL, extractedText, htmlBody, prevSelector string) (*EvalResult, error) {
	client := e.getClient()
	userPrompt := buildUserPrompt(reqURL, extractedText, htmlBody, prevSelector)

	callCtx, cancel := context.WithTimeout(ctx, openaiCallTimeout)
	defer cancel()

	result, err := e.callAPI(callCtx, client, userPrompt)
	if err != nil {
		if !errors.Is(err, errInvalidJSON) {
			return nil, err
		}

		// retry once on invalid JSON with a fresh timeout
		log.Printf("[WARN] invalid JSON from OpenAI for %s, retrying once", reqURL)
		retryCtx, retryCancel := context.WithTimeout(ctx, openaiCallTimeout)
		defer retryCancel()
		result, err = e.callAPI(retryCtx, client, userPrompt)
		if err != nil {
			return nil, fmt.Errorf("openai retry for %s: %w", reqURL, err)
		}
	}

	return result, nil
}

// callAPI makes a single API call and parses the response JSON.
// returns errInvalidJSON if the response is not valid JSON.
func (e *OpenAIEvaluator) callAPI(ctx context.Context, client *openai.Client, userPrompt string) (*EvalResult, error) {
	resp, err := client.CreateChatCompletion(ctx, openai.ChatCompletionRequest{
		Model: e.Model,
		Messages: []openai.ChatCompletionMessage{
			{Role: openai.ChatMessageRoleSystem, Content: systemPrompt},
			{Role: openai.ChatMessageRoleUser, Content: userPrompt},
		},
		Temperature: 0,
	})
	if err != nil {
		return nil, fmt.Errorf("openai API error: %w", err)
	}

	if len(resp.Choices) == 0 {
		return nil, errors.New("openai returned no choices")
	}

	content := strings.TrimSpace(resp.Choices[0].Message.Content)
	return parseEvalResponse(content)
}

// parseEvalResponse parses the JSON response from the model.
// Returns errInvalidJSON if JSON is invalid.
func parseEvalResponse(content string) (*EvalResult, error) {
	var raw struct {
		Good     bool   `json:"good"`
		Selector string `json:"selector"`
	}
	if err := json.Unmarshal([]byte(content), &raw); err != nil {
		return nil, errInvalidJSON
	}

	return &EvalResult{Good: raw.Good, Selector: raw.Selector}, nil
}

func buildUserPrompt(reqURL, extractedText, htmlBody, prevSelector string) string {
	if runes := []rune(extractedText); len(runes) > maxExtractedTextLen {
		extractedText = string(runes[:maxExtractedTextLen])
	}
	if runes := []rune(htmlBody); len(runes) > maxHTMLBodyLen {
		htmlBody = string(runes[:maxHTMLBodyLen])
	}

	var sb strings.Builder
	_, _ = fmt.Fprintf(&sb, "I extracted content from this URL: %s\n\n", reqURL)
	_, _ = fmt.Fprintf(&sb, "Extracted text (first 2000 chars):\n---\n%s\n---\n\n", extractedText)
	_, _ = fmt.Fprintf(&sb, "Page HTML structure (first 4000 chars):\n---\n%s\n---\n\n", htmlBody)
	_, _ = fmt.Fprint(&sb, `Is this a good extraction of the article content? Consider:
- Does it contain the main article body (not just navigation/ads/boilerplate)?
- Is it reasonably complete (not truncated or empty)?

Respond in JSON only, no other text:
{"good": true} if extraction is fine
{"good": false, "selector": "article.post-content"} if not, with a CSS selector that targets the main content on this page`)

	if prevSelector != "" {
		_, _ = fmt.Fprintf(&sb, "\n\nPrevious attempt with selector %q was tried but didn't improve. "+
			"Suggest a different selector based on the HTML structure above.", prevSelector)
	}

	return sb.String()
}