Skip to content

Initial implementation: P1-P4 + audit fixes#1

Open
SRjoeee wants to merge 9 commits intomainfrom
feature/initial-implementation
Open

Initial implementation: P1-P4 + audit fixes#1
SRjoeee wants to merge 9 commits intomainfrom
feature/initial-implementation

Conversation

@SRjoeee
Copy link
Copy Markdown
Contributor

@SRjoeee SRjoeee commented Apr 16, 2026

First end-to-end implementation of findit-keyframe, the Python reference for per-shot keyframe extraction. The Rust port (out-of-scope for this PR, owned by a teammate) consumes the same JSON fixtures and matches the bit-precision contract recorded here.

What ships

Phase Task Modules
P1 Scaffold + types + docs skeleton pyproject.toml, CI, src/findit_keyframe/types.py, docs/algorithm.md, docs/rust-porting.md
P2 Quality metrics + decoder + sampler quality.py, decoder.py, sampler.py
P3 CLI + benchmark cli.py, benchmarks/bench_e2e.py
P4 Apple Vision saliency + doc finalisation saliency.py, refreshed docs
Review fixes (narrowed exception, walrus removal, full Args/Returns/Raises) across modules
Audit cosmetics (info lines to stderr, examples script, tightened doc reference) cli.py, examples/extract_basic.py, docs/algorithm.md

Verification

  • 147 tests pass, 1 skipped (off-Darwin guard test on macOS).
  • ruff check, ruff format --check, mypy --strict all clean on src/findit_keyframe.
  • macOS CI matrix exercises the Apple Vision saliency path; Linux CI exercises everything else.

Algorithmic contracts the Rust port replays verbatim

  • tests/fixtures/quality/canonical.json — bit-precision golden fixture for compute_quality (10 deterministic frames, 6-decimal tolerance).
  • tests/fixtures/regression/extract_all_synthetic.json — full extract_all snapshot using an in-memory deterministic decoder.
  • docs/rust-porting.md — full Type Map, dependency map, idiom map.

What's deliberately deferred

Per TASKS.md §7 (P2+ in the Rust phase): MMR-based cross-bin deduplication, SigLIP medoid selection, cross-shot coherence, learned quality models, hardware decode. The decoder already exposes pick_strategy for the dense-shot Sequential path, but only PerShotSeek is implemented today.

How to review

  • docs/algorithm.md for the algorithm spec
  • docs/rust-porting.md for the cross-language contract
  • src/findit_keyframe/sampler.py for the algorithmic heart
  • tests/test_quality_golden.py and tests/test_sampler_regression.py for the snapshot contracts

SRjoeee added 9 commits April 17, 2026 01:11
* T1 — repo structure: pyproject.toml (hatchling, mypy strict, ruff,
  pytest), dual Apache-2.0/MIT license, GitHub Actions CI for ruff
  + mypy + pytest on Linux/macOS for Python 3.11/3.12, .gitignore.
* T2 — src/findit_keyframe/types.py: Timebase, Timestamp, ShotRange
  (semantic equality and ordering via cross-multiply, mirrors
  scenesdetect's Rust types), Confidence (StrEnum), SamplingConfig
  (mutable), QualityMetrics (frozen), ExtractedKeyframe (mutable).
  31 tests covering the invariants from TASKS.md §2.
* T9 skeleton — docs/algorithm.md (algorithm spec + JSON schemas)
  and docs/rust-porting.md (type/dependency/idiom maps for the
  Rust port).
* Module stubs for decoder, quality, sampler, saliency, cli to be
  filled in P2-P4.
* T4 (quality.py): rgb_to_luma (BT.601 fixed-point, bit-exact with
  scenesdetect), laplacian_variance (3x3 kernel, no scipy/cv2),
  mean_luma, luma_variance (sample, ddof=1), entropy (Shannon over
  256-bin histogram), QualityGate, compute_quality. 39 tests.
* T3 (decoder.py): VideoDecoder context-manager around PyAV with
  decode_at (keyframe seek + forward decode to target PTS) and
  decode_sequential (single linear pass over a shot list, sorts
  shots internally so original ids survive). DecodedFrame stores
  width/height alongside the ndarray for 1:1 Rust port. Strategy
  enum + pick_strategy density heuristic (>0.3 shots/s or >200
  shots -> Sequential). 18 tests.
* T5 basic (sampler.py): compute_bins (boundary-shrunken equal
  partition, ceil(D/I) clamped to max), _candidate_times (cell
  centres), _ordinal_rank, score_bin_candidates (0.6*lap-rank +
  0.2*ent-rank + 0.2*saliency), select_from_bin, fallback
  (expand into neighbours -> Low; force-pick -> Degraded),
  extract_for_shot, extract_all. 29 tests.
* tests/conftest.py: session-scoped tiny_video (uniform ramp,
  exercises gate-fail / Degraded path) and varied_video (mid-tone
  noise, exercises happy path) fixtures. Both encoded once per
  session via libx264.
* Re-export VideoDecoder, QualityGate, extract_all, etc. from
  package root so the README quickstart works as written.

Suite: 117 passed, ruff + ruff format + mypy strict all clean.
* T7 (cli.py): findit-keyframe extract VIDEO SHOTS_JSON OUTPUT_DIR.
  Reads scenesdetect-compatible shot JSON, optional --config for
  SamplingConfig overrides, --saliency {none,apple} flag (apple
  stubs out until T6 lands in P4). Per-keyframe baseline JPEG via
  PyAV's mjpeg + image2 muxer (no Pillow dep), manifest.json with
  quality dict and confidence string. argparse subclass routes
  parse errors through exit code 1 instead of argparse's default
  2 so the documented exit-code semantics hold:
    0 success / 1 input error / 2 extraction error.
  16 tests cover JSON parsing, --help, end-to-end JPEG round-trip
  via PyAV, manifest schema, filename pattern, and all three exit
  codes.

* T8 (benchmarks/bench_e2e.py): standalone CLI script for
  end-to-end perf baselines. Synthesises uniform 4-second shots
  if --shots is omitted, accepts a real shot JSON otherwise.
  Times extract_all, normalises peak RSS across Linux/macOS,
  appends a row to benchmarks/results.md (auto-creates with
  header on first run) tagged with git SHA and UTC timestamp.
  README documents usage and the P3 budget (Kino Demo 1m44s
  must finish in <30 s on M-series). 2 subprocess smoke tests.

Suite: 135 passed, ruff + ruff format + mypy strict all clean.
* T6 (saliency.py): SaliencyProvider runtime-checkable Protocol,
  NoopSaliencyProvider (always 0.0), AppleVisionSaliencyProvider
  wrapping VNGenerateAttentionBasedSaliencyImageRequest, and
  default_saliency_provider() factory. Vision/Quartz are imported
  lazily inside __init__ so the surrounding module is importable
  on Linux without pyobjc-framework-Vision present.

  The Apple provider derives its scalar from salientObjects
  bounding boxes — sum(area * confidence) clamped to [0, 1] —
  rather than reading the saliency heatmap CVPixelBuffer. The
  bounding-box scalar carries the same "is anything attention-
  grabbing" signal with a much cleaner pyobjc API; the Rust port
  (objc2-vision) can choose either path.

  8 tests: NoopSaliencyProvider, default_saliency_provider on
  Linux (mocked) and macOS, AppleVisionSaliencyProvider non-Darwin
  guard, plus three macOS-only tests that exercise the actual
  Vision request on a centre-white test frame.

* Wiring: sampler's select_from_bin, _select_with_fallback,
  extract_for_shot, and extract_all all accept an optional
  saliency_provider; the score formula's saliency_mass term is
  populated only when a provider is configured. _compute_metrics
  helper avoids the if/else duplication between the native-bin
  selector and the force-pick fallback. cli's --saliency apple
  now wires AppleVisionSaliencyProvider end-to-end (was a stub
  returning input-error in P3); RuntimeError on instantiation
  (e.g. running on Linux) is mapped to exit code 1.

* T9 finalise (docs/): algorithm.md now reflects shipped behaviour
  exactly (cell-centred sampling, ordinal-rank scoring, three-tier
  fallback, saliency provider contract, parameter rationale table
  including gate thresholds). rust-porting.md is fleshed out with
  every shipping type (DecodedFrame, Strategy, QualityGate, all
  three SaliencyProvider impls), the actual numpy ops we use
  (BT.601 fixed-point, Laplacian via slicing, ordinal rank via
  argsort), and an Apple Vision idiom map for objc2-vision.

* __init__.py exports SaliencyProvider, NoopSaliencyProvider,
  AppleVisionSaliencyProvider, default_saliency_provider so the
  README quickstart works as written.

Suite: 143 passed, 1 skipped (off-Darwin guard test on macOS),
ruff + ruff format + mypy strict all clean.
…ises docstrings

* Blocker-lite (cli.py): narrow `except Exception` to
  (av.error.FFmpegError, OSError, ValueError, RuntimeError) so
  programmer bugs (TypeError, AttributeError, ...) propagate as
  crashes instead of being mapped to the extraction-error exit
  code. Verified PyAV 13's exception hierarchy: FFmpegError is
  the base for both file/format errors (subclass of OSError) and
  decode errors (subclass of ValueError); explicit listing of
  OSError/ValueError keeps the contract clear and catches the
  ones not routed through PyAV's hierarchy (e.g. our own raise).

* Major (sampler.py): remove the two walrus assignments in
  _select_with_fallback and rebind to explicit variables; this
  was a Rust-translation trap (G2 in TASKS.md). Delete the dead
  `_ = pick_strategy(shots, decoder.duration_sec)` call from
  extract_all and rewrite its docstring to match shipped
  behaviour (the call did nothing; it was leftover from when I
  thought we'd implement the Sequential branch in P3).

* Minor (TASKS.md §5): add full Args/Returns/Raises docstrings
  to every public function across decoder, quality, sampler,
  saliency, and cli modules. VideoDecoder.open had ZERO
  docstring before (worst offender). Touched ~20 functions in
  total. No behaviour changes.

* Side effect: with the pick_strategy import gone, sampler.py's
  VideoDecoder import is now annotation-only -> moved to the
  TYPE_CHECKING block (ruff TC001).

Suite: 143 passed, 1 skipped, ruff + ruff format + mypy strict
all clean.
* tests/fixtures/quality/canonical.json — 10 deterministic frames
  generated by integer-arithmetic kinds (solid / channel /
  h_gradient / v_gradient / checker / single_pixel), each with
  expected QualityMetrics rounded to 6 decimal places. Generators
  use no PRNG so the Rust port can reproduce them byte-for-byte.

* tests/test_quality_golden.py — single test asserts every field
  of every frame matches expected to 1e-6. The same file carries
  the regenerator (run via `python tests/test_quality_golden.py`)
  and the generator semantics in `_build_frame`'s docstring,
  which is the authoritative spec the Rust port consumes.

* quality.py: short-circuit `entropy()` for delta distributions
  to return clean +0.0 instead of -0.0 (the naive
  `-sum([p*log2(p)]) where p == 1.0` evaluates to -0.0 in
  IEEE-754). Tidies the JSON snapshot; `0.0 == -0.0` so no
  existing test changes.

* docs/rust-porting.md: register the new fixture in the test
  fixture map with explicit instructions for Rust replay.

Suite: 144 passed, 1 skipped, ruff + ruff format + mypy strict
all clean.
…sion snapshot

* tests/conftest.py: add quality_gradient_video fixture — a 20-second
  15-fps 96x96 sharp/blur/sharp video. Frames in the middle third are
  the same noise as the outer thirds smoothed by a 5x5 box filter, so
  they survive libx264 encoding with measurably lower Laplacian
  variance. Bundles a small numpy-only _box_blur helper so the fixture
  has no scipy dependency.

* tests/test_sampler.py::TestQualityGradient — TASKS.md T5
  verification ("verify selected frames come from sharp regions in
  each bucket"). Two cases:
    1. Asserts that the sampler picks sharp-region timestamps in the
       two mixed bins (1 and 3) and stays inside the all-sharp /
       all-blur regions for bins 0, 2, 4.
    2. Corroborates the encoder/decoder pipeline by asserting the
       all-blur bin's Laplacian variance is strictly below both
       all-sharp bins'.

* tests/test_sampler_regression.py — TASKS.md T5 regression fixture
  (a Kino-Demo stand-in until the real asset is available). Uses an
  in-memory _FakeDecoder that returns deterministic noise frames
  keyed by frame index; the sampler's only contact with the decoder
  is via duck-typed duration_sec/timebase/decode_at, so the snapshot
  is bit-exact across machines and PyAV/FFmpeg versions. Snapshots
  every (shot_id, bucket, timestamp_sec, confidence, quality) tuple
  to tests/fixtures/regression/extract_all_synthetic.json. Carries
  its own regenerator (run via `python tests/test_sampler_regression.py`).

* .gitignore: ignore Claude session metadata under .claude/.

Suite: 147 passed, 1 skipped, ruff + ruff format + mypy strict all clean.
* docs/algorithm.md §8: previously named only the MMR-based
  cross-bin deduplication path under TASKS.md §7 and tagged it
  "P3+", which mismatches the actual section. §7 ("Out-of-Scope
  but Noted for Later") tracks five deferred items (MMR + SigLIP
  medoid + cross-shot coherence + learned quality models +
  hardware decode), all targeted at P2+ in the Rust phase.
  Updated wording to match.

* cli.py _extract_command: move the two info lines ("wrote N
  keyframes" and "manifest: ...") from stdout to stderr. Status
  output should never compete with the program's machine-readable
  channel; the manifest path is also discoverable on disk at
  args.output/manifest.json so callers don't depend on parsing
  it from stdout. No tests asserted on the stdout content, so
  no test changes needed.

* examples/extract_basic.py: real example replacing the .gitkeep
  placeholder. Shows the full programmatic API (VideoDecoder open
  + default_saliency_provider + extract_all) wired against any
  video file passed on the command line. Smoke-tested end-to-end
  on a synthetic 60-frame video; correctly picks up Apple Vision
  saliency on macOS and reports per-keyframe bin / timestamp /
  confidence / laplacian / saliency.

Audit items not actioned (deliberately):
* _parse_config_json shared default: false alarm — SamplingConfig()
  returns a fresh instance each call; no class-level mutation.
* Dead pick_strategy: already removed in 891a65b.
* scenesdetect linkage in rust-porting.md: 6 references already
  present; reviewer missed them.

Suite: 147 passed, 1 skipped, ruff + ruff format + mypy strict all clean.
…yAV pin

Three issues surfaced by the automated review loop (all scored 75,
below the PR-comment threshold but verified real). Fixing them here
so the Rust-port contract is unambiguous and the dependency pin
matches the exception hierarchy the CLI relies on.

* types.py QualityMetrics.saliency_mass docstring rewrite: the field
  was described as the "sum of an Apple Vision attention heatmap"
  — misleading, since saliency.py explicitly rejects the
  CVPixelBuffer heatmap path in favour of
  `clamp(sum(area * confidence), 0, 1)` over the request's
  salientObjects bounding boxes. Docstring now names the actual
  computation and points to saliency.py for the rationale so a
  Rust porter gets the same contract.

* cli.py module docstring: "Exit codes (per TASKS.md §7)" pointed
  readers at the "Out-of-Scope but Noted for Later" section.
  Corrected to §3 → Task 7 (where the exit-code spec actually
  lives, inside the Task Breakdown).

* types.py SamplingConfig docstring: "Defaults are documented in
  docs/algorithm.md §6" pointed readers at the JSON Schemas
  section. Corrected to §7 "Parameter Rationale" (where the
  defaults table actually lives).

* pyproject.toml: tighten `av>=12.0,<14.0` to `av>=13.0,<14.0`.
  The "Review fixes" commit narrowed the CLI except clause based
  on PyAV 13's exception hierarchy (FFmpegError inheriting from
  both OSError and ValueError); PyAV 12's hierarchy differs and
  would silently change which exceptions get caught. TASKS.md
  §6 Risk Register entry updated to match and explain the
  coupling.

Suite: 147 passed, 1 skipped, ruff + ruff format + mypy strict
all clean, PyAV 13.1.0 installed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant