Initial implementation: P1-P4 + audit fixes by SRjoeee · Pull Request #1 · Findit-AI/findit-keyframe

SRjoeee · 2026-04-16T16:13:31Z

First end-to-end implementation of findit-keyframe, the Python reference for per-shot keyframe extraction. The Rust port (out-of-scope for this PR, owned by a teammate) consumes the same JSON fixtures and matches the bit-precision contract recorded here.

What ships

Phase	Task	Modules
P1	Scaffold + types + docs skeleton	`pyproject.toml`, CI, `src/findit_keyframe/types.py`, `docs/algorithm.md`, `docs/rust-porting.md`
P2	Quality metrics + decoder + sampler	`quality.py`, `decoder.py`, `sampler.py`
P3	CLI + benchmark	`cli.py`, `benchmarks/bench_e2e.py`
P4	Apple Vision saliency + doc finalisation	`saliency.py`, refreshed docs
—	Review fixes (narrowed exception, walrus removal, full Args/Returns/Raises)	across modules
—	Audit cosmetics (info lines to stderr, examples script, tightened doc reference)	`cli.py`, `examples/extract_basic.py`, `docs/algorithm.md`

Verification

147 tests pass, 1 skipped (off-Darwin guard test on macOS).
ruff check, ruff format --check, mypy --strict all clean on src/findit_keyframe.
macOS CI matrix exercises the Apple Vision saliency path; Linux CI exercises everything else.

Algorithmic contracts the Rust port replays verbatim

tests/fixtures/quality/canonical.json — bit-precision golden fixture for compute_quality (10 deterministic frames, 6-decimal tolerance).
tests/fixtures/regression/extract_all_synthetic.json — full extract_all snapshot using an in-memory deterministic decoder.
docs/rust-porting.md — full Type Map, dependency map, idiom map.

What's deliberately deferred

Per TASKS.md §7 (P2+ in the Rust phase): MMR-based cross-bin deduplication, SigLIP medoid selection, cross-shot coherence, learned quality models, hardware decode. The decoder already exposes pick_strategy for the dense-shot Sequential path, but only PerShotSeek is implemented today.

How to review

docs/algorithm.md for the algorithm spec
docs/rust-porting.md for the cross-language contract
src/findit_keyframe/sampler.py for the algorithmic heart
tests/test_quality_golden.py and tests/test_sampler_regression.py for the snapshot contracts

* T1 — repo structure: pyproject.toml (hatchling, mypy strict, ruff, pytest), dual Apache-2.0/MIT license, GitHub Actions CI for ruff + mypy + pytest on Linux/macOS for Python 3.11/3.12, .gitignore. * T2 — src/findit_keyframe/types.py: Timebase, Timestamp, ShotRange (semantic equality and ordering via cross-multiply, mirrors scenesdetect's Rust types), Confidence (StrEnum), SamplingConfig (mutable), QualityMetrics (frozen), ExtractedKeyframe (mutable). 31 tests covering the invariants from TASKS.md §2. * T9 skeleton — docs/algorithm.md (algorithm spec + JSON schemas) and docs/rust-porting.md (type/dependency/idiom maps for the Rust port). * Module stubs for decoder, quality, sampler, saliency, cli to be filled in P2-P4.

* T4 (quality.py): rgb_to_luma (BT.601 fixed-point, bit-exact with scenesdetect), laplacian_variance (3x3 kernel, no scipy/cv2), mean_luma, luma_variance (sample, ddof=1), entropy (Shannon over 256-bin histogram), QualityGate, compute_quality. 39 tests. * T3 (decoder.py): VideoDecoder context-manager around PyAV with decode_at (keyframe seek + forward decode to target PTS) and decode_sequential (single linear pass over a shot list, sorts shots internally so original ids survive). DecodedFrame stores width/height alongside the ndarray for 1:1 Rust port. Strategy enum + pick_strategy density heuristic (>0.3 shots/s or >200 shots -> Sequential). 18 tests. * T5 basic (sampler.py): compute_bins (boundary-shrunken equal partition, ceil(D/I) clamped to max), _candidate_times (cell centres), _ordinal_rank, score_bin_candidates (0.6*lap-rank + 0.2*ent-rank + 0.2*saliency), select_from_bin, fallback (expand into neighbours -> Low; force-pick -> Degraded), extract_for_shot, extract_all. 29 tests. * tests/conftest.py: session-scoped tiny_video (uniform ramp, exercises gate-fail / Degraded path) and varied_video (mid-tone noise, exercises happy path) fixtures. Both encoded once per session via libx264. * Re-export VideoDecoder, QualityGate, extract_all, etc. from package root so the README quickstart works as written. Suite: 117 passed, ruff + ruff format + mypy strict all clean.

* T7 (cli.py): findit-keyframe extract VIDEO SHOTS_JSON OUTPUT_DIR. Reads scenesdetect-compatible shot JSON, optional --config for SamplingConfig overrides, --saliency {none,apple} flag (apple stubs out until T6 lands in P4). Per-keyframe baseline JPEG via PyAV's mjpeg + image2 muxer (no Pillow dep), manifest.json with quality dict and confidence string. argparse subclass routes parse errors through exit code 1 instead of argparse's default 2 so the documented exit-code semantics hold: 0 success / 1 input error / 2 extraction error. 16 tests cover JSON parsing, --help, end-to-end JPEG round-trip via PyAV, manifest schema, filename pattern, and all three exit codes. * T8 (benchmarks/bench_e2e.py): standalone CLI script for end-to-end perf baselines. Synthesises uniform 4-second shots if --shots is omitted, accepts a real shot JSON otherwise. Times extract_all, normalises peak RSS across Linux/macOS, appends a row to benchmarks/results.md (auto-creates with header on first run) tagged with git SHA and UTC timestamp. README documents usage and the P3 budget (Kino Demo 1m44s must finish in <30 s on M-series). 2 subprocess smoke tests. Suite: 135 passed, ruff + ruff format + mypy strict all clean.

* T6 (saliency.py): SaliencyProvider runtime-checkable Protocol, NoopSaliencyProvider (always 0.0), AppleVisionSaliencyProvider wrapping VNGenerateAttentionBasedSaliencyImageRequest, and default_saliency_provider() factory. Vision/Quartz are imported lazily inside __init__ so the surrounding module is importable on Linux without pyobjc-framework-Vision present. The Apple provider derives its scalar from salientObjects bounding boxes — sum(area * confidence) clamped to [0, 1] — rather than reading the saliency heatmap CVPixelBuffer. The bounding-box scalar carries the same "is anything attention- grabbing" signal with a much cleaner pyobjc API; the Rust port (objc2-vision) can choose either path. 8 tests: NoopSaliencyProvider, default_saliency_provider on Linux (mocked) and macOS, AppleVisionSaliencyProvider non-Darwin guard, plus three macOS-only tests that exercise the actual Vision request on a centre-white test frame. * Wiring: sampler's select_from_bin, _select_with_fallback, extract_for_shot, and extract_all all accept an optional saliency_provider; the score formula's saliency_mass term is populated only when a provider is configured. _compute_metrics helper avoids the if/else duplication between the native-bin selector and the force-pick fallback. cli's --saliency apple now wires AppleVisionSaliencyProvider end-to-end (was a stub returning input-error in P3); RuntimeError on instantiation (e.g. running on Linux) is mapped to exit code 1. * T9 finalise (docs/): algorithm.md now reflects shipped behaviour exactly (cell-centred sampling, ordinal-rank scoring, three-tier fallback, saliency provider contract, parameter rationale table including gate thresholds). rust-porting.md is fleshed out with every shipping type (DecodedFrame, Strategy, QualityGate, all three SaliencyProvider impls), the actual numpy ops we use (BT.601 fixed-point, Laplacian via slicing, ordinal rank via argsort), and an Apple Vision idiom map for objc2-vision. * __init__.py exports SaliencyProvider, NoopSaliencyProvider, AppleVisionSaliencyProvider, default_saliency_provider so the README quickstart works as written. Suite: 143 passed, 1 skipped (off-Darwin guard test on macOS), ruff + ruff format + mypy strict all clean.

…ises docstrings * Blocker-lite (cli.py): narrow `except Exception` to (av.error.FFmpegError, OSError, ValueError, RuntimeError) so programmer bugs (TypeError, AttributeError, ...) propagate as crashes instead of being mapped to the extraction-error exit code. Verified PyAV 13's exception hierarchy: FFmpegError is the base for both file/format errors (subclass of OSError) and decode errors (subclass of ValueError); explicit listing of OSError/ValueError keeps the contract clear and catches the ones not routed through PyAV's hierarchy (e.g. our own raise). * Major (sampler.py): remove the two walrus assignments in _select_with_fallback and rebind to explicit variables; this was a Rust-translation trap (G2 in TASKS.md). Delete the dead `_ = pick_strategy(shots, decoder.duration_sec)` call from extract_all and rewrite its docstring to match shipped behaviour (the call did nothing; it was leftover from when I thought we'd implement the Sequential branch in P3). * Minor (TASKS.md §5): add full Args/Returns/Raises docstrings to every public function across decoder, quality, sampler, saliency, and cli modules. VideoDecoder.open had ZERO docstring before (worst offender). Touched ~20 functions in total. No behaviour changes. * Side effect: with the pick_strategy import gone, sampler.py's VideoDecoder import is now annotation-only -> moved to the TYPE_CHECKING block (ruff TC001). Suite: 143 passed, 1 skipped, ruff + ruff format + mypy strict all clean.

* tests/fixtures/quality/canonical.json — 10 deterministic frames generated by integer-arithmetic kinds (solid / channel / h_gradient / v_gradient / checker / single_pixel), each with expected QualityMetrics rounded to 6 decimal places. Generators use no PRNG so the Rust port can reproduce them byte-for-byte. * tests/test_quality_golden.py — single test asserts every field of every frame matches expected to 1e-6. The same file carries the regenerator (run via `python tests/test_quality_golden.py`) and the generator semantics in `_build_frame`'s docstring, which is the authoritative spec the Rust port consumes. * quality.py: short-circuit `entropy()` for delta distributions to return clean +0.0 instead of -0.0 (the naive `-sum([p*log2(p)]) where p == 1.0` evaluates to -0.0 in IEEE-754). Tidies the JSON snapshot; `0.0 == -0.0` so no existing test changes. * docs/rust-porting.md: register the new fixture in the test fixture map with explicit instructions for Rust replay. Suite: 144 passed, 1 skipped, ruff + ruff format + mypy strict all clean.

…sion snapshot * tests/conftest.py: add quality_gradient_video fixture — a 20-second 15-fps 96x96 sharp/blur/sharp video. Frames in the middle third are the same noise as the outer thirds smoothed by a 5x5 box filter, so they survive libx264 encoding with measurably lower Laplacian variance. Bundles a small numpy-only _box_blur helper so the fixture has no scipy dependency. * tests/test_sampler.py::TestQualityGradient — TASKS.md T5 verification ("verify selected frames come from sharp regions in each bucket"). Two cases: 1. Asserts that the sampler picks sharp-region timestamps in the two mixed bins (1 and 3) and stays inside the all-sharp / all-blur regions for bins 0, 2, 4. 2. Corroborates the encoder/decoder pipeline by asserting the all-blur bin's Laplacian variance is strictly below both all-sharp bins'. * tests/test_sampler_regression.py — TASKS.md T5 regression fixture (a Kino-Demo stand-in until the real asset is available). Uses an in-memory _FakeDecoder that returns deterministic noise frames keyed by frame index; the sampler's only contact with the decoder is via duck-typed duration_sec/timebase/decode_at, so the snapshot is bit-exact across machines and PyAV/FFmpeg versions. Snapshots every (shot_id, bucket, timestamp_sec, confidence, quality) tuple to tests/fixtures/regression/extract_all_synthetic.json. Carries its own regenerator (run via `python tests/test_sampler_regression.py`). * .gitignore: ignore Claude session metadata under .claude/. Suite: 147 passed, 1 skipped, ruff + ruff format + mypy strict all clean.

* docs/algorithm.md §8: previously named only the MMR-based cross-bin deduplication path under TASKS.md §7 and tagged it "P3+", which mismatches the actual section. §7 ("Out-of-Scope but Noted for Later") tracks five deferred items (MMR + SigLIP medoid + cross-shot coherence + learned quality models + hardware decode), all targeted at P2+ in the Rust phase. Updated wording to match. * cli.py _extract_command: move the two info lines ("wrote N keyframes" and "manifest: ...") from stdout to stderr. Status output should never compete with the program's machine-readable channel; the manifest path is also discoverable on disk at args.output/manifest.json so callers don't depend on parsing it from stdout. No tests asserted on the stdout content, so no test changes needed. * examples/extract_basic.py: real example replacing the .gitkeep placeholder. Shows the full programmatic API (VideoDecoder open + default_saliency_provider + extract_all) wired against any video file passed on the command line. Smoke-tested end-to-end on a synthetic 60-frame video; correctly picks up Apple Vision saliency on macOS and reports per-keyframe bin / timestamp / confidence / laplacian / saliency. Audit items not actioned (deliberately): * _parse_config_json shared default: false alarm — SamplingConfig() returns a fresh instance each call; no class-level mutation. * Dead pick_strategy: already removed in 891a65b. * scenesdetect linkage in rust-porting.md: 6 references already present; reviewer missed them. Suite: 147 passed, 1 skipped, ruff + ruff format + mypy strict all clean.

…yAV pin Three issues surfaced by the automated review loop (all scored 75, below the PR-comment threshold but verified real). Fixing them here so the Rust-port contract is unambiguous and the dependency pin matches the exception hierarchy the CLI relies on. * types.py QualityMetrics.saliency_mass docstring rewrite: the field was described as the "sum of an Apple Vision attention heatmap" — misleading, since saliency.py explicitly rejects the CVPixelBuffer heatmap path in favour of `clamp(sum(area * confidence), 0, 1)` over the request's salientObjects bounding boxes. Docstring now names the actual computation and points to saliency.py for the rationale so a Rust porter gets the same contract. * cli.py module docstring: "Exit codes (per TASKS.md §7)" pointed readers at the "Out-of-Scope but Noted for Later" section. Corrected to §3 → Task 7 (where the exit-code spec actually lives, inside the Task Breakdown). * types.py SamplingConfig docstring: "Defaults are documented in docs/algorithm.md §6" pointed readers at the JSON Schemas section. Corrected to §7 "Parameter Rationale" (where the defaults table actually lives). * pyproject.toml: tighten `av>=12.0,<14.0` to `av>=13.0,<14.0`. The "Review fixes" commit narrowed the CLI except clause based on PyAV 13's exception hierarchy (FFmpegError inheriting from both OSError and ValueError); PyAV 12's hierarchy differs and would silently change which exceptions get caught. TASKS.md §6 Risk Register entry updated to match and explain the coupling. Suite: 147 passed, 1 skipped, ruff + ruff format + mypy strict all clean, PyAV 13.1.0 installed.

SRjoeee added 9 commits April 17, 2026 01:11

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Initial implementation: P1-P4 + audit fixes#1

Initial implementation: P1-P4 + audit fixes#1
SRjoeee wants to merge 9 commits intomainfrom
feature/initial-implementation

SRjoeee commented Apr 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

SRjoeee commented Apr 16, 2026

What ships

Verification

Algorithmic contracts the Rust port replays verbatim

What's deliberately deferred

How to review

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant