Open
Conversation
* T1 — repo structure: pyproject.toml (hatchling, mypy strict, ruff, pytest), dual Apache-2.0/MIT license, GitHub Actions CI for ruff + mypy + pytest on Linux/macOS for Python 3.11/3.12, .gitignore. * T2 — src/findit_keyframe/types.py: Timebase, Timestamp, ShotRange (semantic equality and ordering via cross-multiply, mirrors scenesdetect's Rust types), Confidence (StrEnum), SamplingConfig (mutable), QualityMetrics (frozen), ExtractedKeyframe (mutable). 31 tests covering the invariants from TASKS.md §2. * T9 skeleton — docs/algorithm.md (algorithm spec + JSON schemas) and docs/rust-porting.md (type/dependency/idiom maps for the Rust port). * Module stubs for decoder, quality, sampler, saliency, cli to be filled in P2-P4.
* T4 (quality.py): rgb_to_luma (BT.601 fixed-point, bit-exact with scenesdetect), laplacian_variance (3x3 kernel, no scipy/cv2), mean_luma, luma_variance (sample, ddof=1), entropy (Shannon over 256-bin histogram), QualityGate, compute_quality. 39 tests. * T3 (decoder.py): VideoDecoder context-manager around PyAV with decode_at (keyframe seek + forward decode to target PTS) and decode_sequential (single linear pass over a shot list, sorts shots internally so original ids survive). DecodedFrame stores width/height alongside the ndarray for 1:1 Rust port. Strategy enum + pick_strategy density heuristic (>0.3 shots/s or >200 shots -> Sequential). 18 tests. * T5 basic (sampler.py): compute_bins (boundary-shrunken equal partition, ceil(D/I) clamped to max), _candidate_times (cell centres), _ordinal_rank, score_bin_candidates (0.6*lap-rank + 0.2*ent-rank + 0.2*saliency), select_from_bin, fallback (expand into neighbours -> Low; force-pick -> Degraded), extract_for_shot, extract_all. 29 tests. * tests/conftest.py: session-scoped tiny_video (uniform ramp, exercises gate-fail / Degraded path) and varied_video (mid-tone noise, exercises happy path) fixtures. Both encoded once per session via libx264. * Re-export VideoDecoder, QualityGate, extract_all, etc. from package root so the README quickstart works as written. Suite: 117 passed, ruff + ruff format + mypy strict all clean.
* T7 (cli.py): findit-keyframe extract VIDEO SHOTS_JSON OUTPUT_DIR.
Reads scenesdetect-compatible shot JSON, optional --config for
SamplingConfig overrides, --saliency {none,apple} flag (apple
stubs out until T6 lands in P4). Per-keyframe baseline JPEG via
PyAV's mjpeg + image2 muxer (no Pillow dep), manifest.json with
quality dict and confidence string. argparse subclass routes
parse errors through exit code 1 instead of argparse's default
2 so the documented exit-code semantics hold:
0 success / 1 input error / 2 extraction error.
16 tests cover JSON parsing, --help, end-to-end JPEG round-trip
via PyAV, manifest schema, filename pattern, and all three exit
codes.
* T8 (benchmarks/bench_e2e.py): standalone CLI script for
end-to-end perf baselines. Synthesises uniform 4-second shots
if --shots is omitted, accepts a real shot JSON otherwise.
Times extract_all, normalises peak RSS across Linux/macOS,
appends a row to benchmarks/results.md (auto-creates with
header on first run) tagged with git SHA and UTC timestamp.
README documents usage and the P3 budget (Kino Demo 1m44s
must finish in <30 s on M-series). 2 subprocess smoke tests.
Suite: 135 passed, ruff + ruff format + mypy strict all clean.
* T6 (saliency.py): SaliencyProvider runtime-checkable Protocol, NoopSaliencyProvider (always 0.0), AppleVisionSaliencyProvider wrapping VNGenerateAttentionBasedSaliencyImageRequest, and default_saliency_provider() factory. Vision/Quartz are imported lazily inside __init__ so the surrounding module is importable on Linux without pyobjc-framework-Vision present. The Apple provider derives its scalar from salientObjects bounding boxes — sum(area * confidence) clamped to [0, 1] — rather than reading the saliency heatmap CVPixelBuffer. The bounding-box scalar carries the same "is anything attention- grabbing" signal with a much cleaner pyobjc API; the Rust port (objc2-vision) can choose either path. 8 tests: NoopSaliencyProvider, default_saliency_provider on Linux (mocked) and macOS, AppleVisionSaliencyProvider non-Darwin guard, plus three macOS-only tests that exercise the actual Vision request on a centre-white test frame. * Wiring: sampler's select_from_bin, _select_with_fallback, extract_for_shot, and extract_all all accept an optional saliency_provider; the score formula's saliency_mass term is populated only when a provider is configured. _compute_metrics helper avoids the if/else duplication between the native-bin selector and the force-pick fallback. cli's --saliency apple now wires AppleVisionSaliencyProvider end-to-end (was a stub returning input-error in P3); RuntimeError on instantiation (e.g. running on Linux) is mapped to exit code 1. * T9 finalise (docs/): algorithm.md now reflects shipped behaviour exactly (cell-centred sampling, ordinal-rank scoring, three-tier fallback, saliency provider contract, parameter rationale table including gate thresholds). rust-porting.md is fleshed out with every shipping type (DecodedFrame, Strategy, QualityGate, all three SaliencyProvider impls), the actual numpy ops we use (BT.601 fixed-point, Laplacian via slicing, ordinal rank via argsort), and an Apple Vision idiom map for objc2-vision. * __init__.py exports SaliencyProvider, NoopSaliencyProvider, AppleVisionSaliencyProvider, default_saliency_provider so the README quickstart works as written. Suite: 143 passed, 1 skipped (off-Darwin guard test on macOS), ruff + ruff format + mypy strict all clean.
…ises docstrings * Blocker-lite (cli.py): narrow `except Exception` to (av.error.FFmpegError, OSError, ValueError, RuntimeError) so programmer bugs (TypeError, AttributeError, ...) propagate as crashes instead of being mapped to the extraction-error exit code. Verified PyAV 13's exception hierarchy: FFmpegError is the base for both file/format errors (subclass of OSError) and decode errors (subclass of ValueError); explicit listing of OSError/ValueError keeps the contract clear and catches the ones not routed through PyAV's hierarchy (e.g. our own raise). * Major (sampler.py): remove the two walrus assignments in _select_with_fallback and rebind to explicit variables; this was a Rust-translation trap (G2 in TASKS.md). Delete the dead `_ = pick_strategy(shots, decoder.duration_sec)` call from extract_all and rewrite its docstring to match shipped behaviour (the call did nothing; it was leftover from when I thought we'd implement the Sequential branch in P3). * Minor (TASKS.md §5): add full Args/Returns/Raises docstrings to every public function across decoder, quality, sampler, saliency, and cli modules. VideoDecoder.open had ZERO docstring before (worst offender). Touched ~20 functions in total. No behaviour changes. * Side effect: with the pick_strategy import gone, sampler.py's VideoDecoder import is now annotation-only -> moved to the TYPE_CHECKING block (ruff TC001). Suite: 143 passed, 1 skipped, ruff + ruff format + mypy strict all clean.
* tests/fixtures/quality/canonical.json — 10 deterministic frames generated by integer-arithmetic kinds (solid / channel / h_gradient / v_gradient / checker / single_pixel), each with expected QualityMetrics rounded to 6 decimal places. Generators use no PRNG so the Rust port can reproduce them byte-for-byte. * tests/test_quality_golden.py — single test asserts every field of every frame matches expected to 1e-6. The same file carries the regenerator (run via `python tests/test_quality_golden.py`) and the generator semantics in `_build_frame`'s docstring, which is the authoritative spec the Rust port consumes. * quality.py: short-circuit `entropy()` for delta distributions to return clean +0.0 instead of -0.0 (the naive `-sum([p*log2(p)]) where p == 1.0` evaluates to -0.0 in IEEE-754). Tidies the JSON snapshot; `0.0 == -0.0` so no existing test changes. * docs/rust-porting.md: register the new fixture in the test fixture map with explicit instructions for Rust replay. Suite: 144 passed, 1 skipped, ruff + ruff format + mypy strict all clean.
…sion snapshot
* tests/conftest.py: add quality_gradient_video fixture — a 20-second
15-fps 96x96 sharp/blur/sharp video. Frames in the middle third are
the same noise as the outer thirds smoothed by a 5x5 box filter, so
they survive libx264 encoding with measurably lower Laplacian
variance. Bundles a small numpy-only _box_blur helper so the fixture
has no scipy dependency.
* tests/test_sampler.py::TestQualityGradient — TASKS.md T5
verification ("verify selected frames come from sharp regions in
each bucket"). Two cases:
1. Asserts that the sampler picks sharp-region timestamps in the
two mixed bins (1 and 3) and stays inside the all-sharp /
all-blur regions for bins 0, 2, 4.
2. Corroborates the encoder/decoder pipeline by asserting the
all-blur bin's Laplacian variance is strictly below both
all-sharp bins'.
* tests/test_sampler_regression.py — TASKS.md T5 regression fixture
(a Kino-Demo stand-in until the real asset is available). Uses an
in-memory _FakeDecoder that returns deterministic noise frames
keyed by frame index; the sampler's only contact with the decoder
is via duck-typed duration_sec/timebase/decode_at, so the snapshot
is bit-exact across machines and PyAV/FFmpeg versions. Snapshots
every (shot_id, bucket, timestamp_sec, confidence, quality) tuple
to tests/fixtures/regression/extract_all_synthetic.json. Carries
its own regenerator (run via `python tests/test_sampler_regression.py`).
* .gitignore: ignore Claude session metadata under .claude/.
Suite: 147 passed, 1 skipped, ruff + ruff format + mypy strict all clean.
* docs/algorithm.md §8: previously named only the MMR-based
cross-bin deduplication path under TASKS.md §7 and tagged it
"P3+", which mismatches the actual section. §7 ("Out-of-Scope
but Noted for Later") tracks five deferred items (MMR + SigLIP
medoid + cross-shot coherence + learned quality models +
hardware decode), all targeted at P2+ in the Rust phase.
Updated wording to match.
* cli.py _extract_command: move the two info lines ("wrote N
keyframes" and "manifest: ...") from stdout to stderr. Status
output should never compete with the program's machine-readable
channel; the manifest path is also discoverable on disk at
args.output/manifest.json so callers don't depend on parsing
it from stdout. No tests asserted on the stdout content, so
no test changes needed.
* examples/extract_basic.py: real example replacing the .gitkeep
placeholder. Shows the full programmatic API (VideoDecoder open
+ default_saliency_provider + extract_all) wired against any
video file passed on the command line. Smoke-tested end-to-end
on a synthetic 60-frame video; correctly picks up Apple Vision
saliency on macOS and reports per-keyframe bin / timestamp /
confidence / laplacian / saliency.
Audit items not actioned (deliberately):
* _parse_config_json shared default: false alarm — SamplingConfig()
returns a fresh instance each call; no class-level mutation.
* Dead pick_strategy: already removed in 891a65b.
* scenesdetect linkage in rust-porting.md: 6 references already
present; reviewer missed them.
Suite: 147 passed, 1 skipped, ruff + ruff format + mypy strict all clean.
…yAV pin Three issues surfaced by the automated review loop (all scored 75, below the PR-comment threshold but verified real). Fixing them here so the Rust-port contract is unambiguous and the dependency pin matches the exception hierarchy the CLI relies on. * types.py QualityMetrics.saliency_mass docstring rewrite: the field was described as the "sum of an Apple Vision attention heatmap" — misleading, since saliency.py explicitly rejects the CVPixelBuffer heatmap path in favour of `clamp(sum(area * confidence), 0, 1)` over the request's salientObjects bounding boxes. Docstring now names the actual computation and points to saliency.py for the rationale so a Rust porter gets the same contract. * cli.py module docstring: "Exit codes (per TASKS.md §7)" pointed readers at the "Out-of-Scope but Noted for Later" section. Corrected to §3 → Task 7 (where the exit-code spec actually lives, inside the Task Breakdown). * types.py SamplingConfig docstring: "Defaults are documented in docs/algorithm.md §6" pointed readers at the JSON Schemas section. Corrected to §7 "Parameter Rationale" (where the defaults table actually lives). * pyproject.toml: tighten `av>=12.0,<14.0` to `av>=13.0,<14.0`. The "Review fixes" commit narrowed the CLI except clause based on PyAV 13's exception hierarchy (FFmpegError inheriting from both OSError and ValueError); PyAV 12's hierarchy differs and would silently change which exceptions get caught. TASKS.md §6 Risk Register entry updated to match and explain the coupling. Suite: 147 passed, 1 skipped, ruff + ruff format + mypy strict all clean, PyAV 13.1.0 installed.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
First end-to-end implementation of
findit-keyframe, the Python reference for per-shot keyframe extraction. The Rust port (out-of-scope for this PR, owned by a teammate) consumes the same JSON fixtures and matches the bit-precision contract recorded here.What ships
pyproject.toml, CI,src/findit_keyframe/types.py,docs/algorithm.md,docs/rust-porting.mdquality.py,decoder.py,sampler.pycli.py,benchmarks/bench_e2e.pysaliency.py, refreshed docscli.py,examples/extract_basic.py,docs/algorithm.mdVerification
ruff check,ruff format --check,mypy --strictall clean onsrc/findit_keyframe.Algorithmic contracts the Rust port replays verbatim
tests/fixtures/quality/canonical.json— bit-precision golden fixture forcompute_quality(10 deterministic frames, 6-decimal tolerance).tests/fixtures/regression/extract_all_synthetic.json— fullextract_allsnapshot using an in-memory deterministic decoder.docs/rust-porting.md— full Type Map, dependency map, idiom map.What's deliberately deferred
Per
TASKS.md§7 (P2+ in the Rust phase): MMR-based cross-bin deduplication, SigLIP medoid selection, cross-shot coherence, learned quality models, hardware decode. The decoder already exposespick_strategyfor the dense-shot Sequential path, but onlyPerShotSeekis implemented today.How to review
docs/algorithm.mdfor the algorithm specdocs/rust-porting.mdfor the cross-language contractsrc/findit_keyframe/sampler.pyfor the algorithmic hearttests/test_quality_golden.pyandtests/test_sampler_regression.pyfor the snapshot contracts