Add case_coverage category to quality checklist #2557
Merged
hiroshinishio merged 1 commit into main on Apr 20, 2026
New checks: dimension_enumeration, combinatorial_matrix, explicit_expected_per_cell. Verified end-to-end with Gemma integration tests — grading discriminates a 1-case test from a parametrized matrix on the same source.
Summary
Added a new case_coverage category to the quality checklist with three checks:
- dimension_enumeration: tests identify the function's independent input dimensions
- combinatorial_matrix: all meaningful combinations are covered, not just the happy path
- explicit_expected_per_cell: each case asserts an exact expected result

The checklist is JSON-serialized into the grader's system prompt, so the new category is picked up automatically across all callers. The checklist hash changes, invalidating cached grading results for previously evaluated files.
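
A minimal sketch of why adding a category both reaches the grader and invalidates cached results, assuming the checklist is a plain nested dict that is serialized with `json.dumps` and hashed with SHA-256. Every name here (`BASE_CHECKLIST`, `serialize_checklist`, `checklist_hash`, `build_system_prompt`, the placeholder category, the prompt wording) is hypothetical and not taken from the repository:

```python
import hashlib
import json

# Hypothetical stand-in for the checklist as it looked before this change.
BASE_CHECKLIST = {
    "example_existing_category": {
        "example_existing_check": "Placeholder for a check that predates this change",
    },
}

# The three checks named in this PR; the description strings paraphrase the summary.
CASE_COVERAGE = {
    "dimension_enumeration": "Tests identify the function's independent input dimensions",
    "combinatorial_matrix": "All meaningful combinations are covered, not just the happy path",
    "explicit_expected_per_cell": "Each case asserts an exact expected result",
}


def serialize_checklist(checklist):
    # sort_keys keeps the output stable, so the hash only moves when content changes.
    return json.dumps(checklist, sort_keys=True, indent=2)


def checklist_hash(checklist):
    # Assumed scheme: hash the serialized checklist and use it as a cache key.
    return hashlib.sha256(serialize_checklist(checklist).encode("utf-8")).hexdigest()


def build_system_prompt(checklist):
    # Assumed prompt shape: instructions followed by the serialized checklist.
    return "Grade the test file against this checklist:\n" + serialize_checklist(checklist)


new_checklist = {**BASE_CHECKLIST, "case_coverage": CASE_COVERAGE}
old_hash = checklist_hash(BASE_CHECKLIST)
new_hash = checklist_hash(new_checklist)

# A cache keyed by the old hash no longer matches, so the file is graded again.
cached_results = {old_hash: "previous grading result"}
needs_regrade = new_hash not in cached_results
```

Because every caller goes through the same serialization step in this sketch, no caller needs to know the new category exists; the trade-off is that any checklist edit triggers a full regrade.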
Verified end-to-end with two Gemma integration tests: one confirms the category reaches the model and is graded with valid statuses; the second contrasts a 1-case test against a parametrized matrix on the same source and asserts Gemma marks combinatorial_matrix as fail for the weak test while the strong test fails fewer checks overall. Grading is discriminative, not rubber-stamped.

Social Media Post (GitAuto)
Quality gate now grades tests on case-matrix completeness
Social Media Post (Wes)
Was reviewing a new function this morning and noticed the tests covered one happy path for a function with three independent input dimensions. The existing quality gate passed it. Added a category that grades whether tests enumerate the full matrix. Ran it on a contrived weak-vs-strong pair to prove the grader actually discriminates. Now thin tests fail the gate instead of sliding through.
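
To make the weak-versus-strong contrast concrete, here is an illustrative pair for a function with three independent input dimensions. The function `shipping_fee`, its pricing rules, and both tests are invented for this sketch and are not the pair used in the integration tests. The strong version uses a plain lookup table so it runs without extra dependencies; a parametrized test in a framework such as pytest would carry the same cells.

```python
from itertools import product


def shipping_fee(weight_kg, express, international):
    """Hypothetical function under test with three independent input dimensions."""
    fee = 5 if weight_kg <= 1 else 9
    if express:
        fee += 4
    if international:
        fee *= 2
    return fee


# Weak test: one happy-path case, so seven of the eight combinations go unchecked.
def test_weak():
    assert shipping_fee(1, False, False) == 5


# dimension_enumeration: name each independent dimension and its representative values.
WEIGHTS = (1, 3)  # light parcel, heavy parcel
EXPRESS = (False, True)
INTERNATIONAL = (False, True)

# explicit_expected_per_cell: every cell carries its own exact expected value.
EXPECTED = {
    (1, False, False): 5,
    (1, False, True): 10,
    (1, True, False): 9,
    (1, True, True): 18,
    (3, False, False): 9,
    (3, False, True): 18,
    (3, True, False): 13,
    (3, True, True): 26,
}


def test_strong():
    # combinatorial_matrix: the table must cover the full cross product.
    assert set(EXPECTED) == set(product(WEIGHTS, EXPRESS, INTERNATIONAL))
    for (weight_kg, express, international), expected in EXPECTED.items():
        assert shipping_fee(weight_kg, express, international) == expected
```

Both tests pass against the same source, which is the point: only the second would catch a regression in, say, the express surcharge for international parcels.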