
fix scalefactor delta overflows to comply with AAC spec (ISO 14496-3) #99

Open
nschimme wants to merge 2 commits into knik0:master from nschimme:fix-pns


@nschimme
Contributor

This commit addresses two bugs in BlocQuant()'s scale factor range pass that allowed scalefactor differences to exceed the strict ±60 limit required by the AAC specification:

  1. Fix PNS delta predictor: PNS bands are coded with the `HCB_PNS` codebook, and their scale factors form a separate delta chain that needs its own predictor. Without it, the first PNS band's delta was computed relative to the regular `global_gain`, producing out-of-bounds deltas. A dedicated `lastpns` predictor is now initialized 90 steps below `global_gain` so the first PNS entry fits comfortably within the ±60 constraint.
  2. Enforce limits on all active bands: The quantizer previously clamped deltas only for `HCB_ESC` bands. The condition is now `(book != HCB_ZERO) && (book != HCB_NONE)`, so the ±60 limit is enforced for every active scalefactor band, regardless of the Huffman codebook.

Additionally, this refactors the clamping logic into a centralized `clamp_sf_diff()` inline function and replaces hardcoded scalefactor magic numbers with named constants in `huff2.h`.
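A minimal sketch of the two fixes together, for readers who want the shape of the range pass without opening the diff. It is illustrative only: `SF_DELTA_MAX`, the value chosen for `HCB_NONE`, `sf_range_pass()` and the exact `clamp_sf_diff()` signature are assumptions, not the PR's actual code in `BlocQuant()`. Only the two predictors discussed here are modelled (intensity-stereo bands are left out).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the named constants in huff2.h. */
#define SF_DELTA_MAX  60 /* scalefactor codebook codes deltas in [-60, +60] */
#define SF_PNS_OFFSET 90 /* PNS predictor starts this far below global_gain */

/* ZERO (0) and PNS (13) follow AAC numbering; HCB_NONE's value is made up. */
enum { HCB_ZERO = 0, HCB_PNS = 13, HCB_NONE = -1 };

/* Pull sf toward last so that |sf - last| <= SF_DELTA_MAX. */
static inline int clamp_sf_diff(int sf, int last)
{
    if (sf - last > SF_DELTA_MAX)
        return last + SF_DELTA_MAX;
    if (sf - last < -SF_DELTA_MAX)
        return last - SF_DELTA_MAX;
    return sf;
}

/* Range pass: every active band is clamped, and PNS bands use their own
 * predictor (lastpns) instead of sharing lastsf with regular bands. */
static void sf_range_pass(int *sf, const int *book, int nbands, int global_gain)
{
    int lastsf = global_gain;
    int lastpns = global_gain - SF_PNS_OFFSET;

    for (int i = 0; i < nbands; i++) {
        if (book[i] == HCB_ZERO || book[i] == HCB_NONE)
            continue; /* inactive band: no scalefactor is transmitted */
        int *last = (book[i] == HCB_PNS) ? &lastpns : &lastsf;
        sf[i] = clamp_sf_diff(sf[i], *last);
        assert(abs(sf[i] - *last) <= SF_DELTA_MAX);
        *last = sf[i];
    }
}
```

With `global_gain = 140`, a regular band at 220 is pulled down to 200, while a PNS band at 50 passes untouched because it is compared with `lastpns = 50` rather than with 140.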

This causes a slight MOS regression on speech, where the old code happened to score better, but every other case shows a slight MOS improvement. The cost is 1% CPU throughput, which we need to pay anyway for correctness: https://github.com/nschimme/faac/actions/runs/24264165540

@nschimme
Contributor Author

Context on why SF_PNS_OFFSET was 90:

The AAC Scalefactor Delta Bug

The Bug: "The Broken Bridge"

AAC uses Huffman Book 12 to encode the difference (delta) between band volumes.

  • The Spec: Limits this delta to ±60 (the scalefactor codebook has 121 codewords, one per delta from -60 to +60).
  • The Error: The original code didn't enforce this for all bands and used one "memory" (`lastsf`) for both music and noise (PNS).

The Scenario

  1. Music Band: Volume = 140.
  2. Noise Band (PNS): Volume = 50.
  3. Calculation: $50 - 140 = -90$.
  4. Result: $-90$ is outside the allowed ±60. The resulting codebook index ($-90 + 60 = -30$) is out of bounds, so the encoder writes an invalid codeword, making the file unplayable or "glitchy" on standard players.
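To make the out-of-bounds index concrete: the scalefactor codebook is indexed by delta + 60 and has 121 codewords, so a tiny check shows the scenario's delta has no codeword. `sf_delta_codable()` is a hypothetical helper for illustration, not part of faac.

```c
#include <assert.h>
#include <stdbool.h>

/* AAC's scalefactor Huffman codebook has 121 codewords, indexed by
 * delta + 60, so only deltas in [-60, +60] can be written. */
#define SF_INDEX_OFFSET  60
#define SF_CODEBOOK_SIZE 121

/* Hypothetical helper: stores the codebook index for `delta` in *index and
 * reports whether that index actually has a codeword. */
static bool sf_delta_codable(int delta, int *index)
{
    assert(index);
    *index = delta + SF_INDEX_OFFSET;
    return *index >= 0 && *index < SF_CODEBOOK_SIZE;
}
```

For the scenario above, `sf_delta_codable(50 - 140, &idx)` returns false with `idx == -30`, while the extreme legal deltas -60 and +60 map to indices 0 and 120.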

The Fix: "Two Tracks & Guardrails"

  • The Guardrail: clamp_sf_diff ensures no delta ever exceeds ±60, keeping the bitstream legal.
  • The Second Track: By adding lastpns, noise bands now have their own "memory" separate from music.
  • The "90" Offset: By starting the noise memory 90 steps below the music, you ensure that quiet noise levels can be reached in a single legal step.
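A small sketch comparing the first PNS delta under one shared predictor and under the separate `lastpns` track. It assumes, for simplicity, that the music predictor still equals `global_gain` when the first PNS band is reached; the function names are hypothetical.

```c
#include <assert.h>

#define SF_DELTA_MAX  60 /* largest step the scalefactor codebook can code */
#define SF_PNS_OFFSET 90 /* PNS predictor start below global_gain, per the PR */

/* Old behaviour (simplified): first PNS band coded against the music track. */
static int first_pns_delta_shared(int global_gain, int pns_sf)
{
    return pns_sf - global_gain;
}

/* New behaviour: first PNS band coded against its own track. */
static int first_pns_delta_split(int global_gain, int pns_sf)
{
    int lastpns = global_gain - SF_PNS_OFFSET;
    return pns_sf - lastpns;
}

/* Range of first-band PNS levels reachable in one legal (+/-60) step. */
static void first_pns_range(int global_gain, int *lo, int *hi)
{
    assert(lo && hi);
    int lastpns = global_gain - SF_PNS_OFFSET;
    *lo = lastpns - SF_DELTA_MAX;
    *hi = lastpns + SF_DELTA_MAX;
}
```

With music at 140 and noise at 50, the shared track needs a -90 step, while the split track needs 0. The reachable window for the first PNS band is 140 - 150 = -10 up to 140 - 30 = 110, i.e. anywhere from 30 to 150 steps below the music.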

Why Speech MOS Dropped

In speech, "noise" (consonants like S or F) is often louder than background hiss.

  • If the consonant is only 30 steps below the music but the PNS predictor starts 90 steps below, the first PNS band has to "climb up" +60 steps, which is already the largest legal step.
  • Any consonant louder than that cannot be reached in one step: the ±60 clamp caps the climb, so the consonant loses energy and sounds "muffled" or "lisp-like."
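The climb can be quantified with a one-line model: the first PNS band needs a step of `90 - below` from `lastpns`, and anything above +60 is clipped off. `pns_shortfall()` is a hypothetical illustration that only models the upward (too quiet) direction.

```c
#include <assert.h>

#define SF_DELTA_MAX  60
#define SF_PNS_OFFSET 90

/* Steps of energy the first PNS band loses when the consonant sits `below`
 * steps under the music level but the PNS predictor starts SF_PNS_OFFSET
 * steps under. Only the upward (too quiet) direction is modelled. */
static int pns_shortfall(int below)
{
    assert(below >= 0);
    int needed = SF_PNS_OFFSET - below; /* climb required from lastpns */
    return needed > SF_DELTA_MAX ? needed - SF_DELTA_MAX : 0;
}
```

A consonant 30 steps below the music fits exactly (shortfall 0); one only 20 steps below needs +70, so the clamp leaves it 10 scalefactor steps too quiet.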

@nschimme
Contributor Author

Okay, I discovered that this is partially responsible for #40. I'm working on the remaining fix.

@nschimme
Contributor Author

Okay, this fixes #40 and possibly other cases too. No performance drop: https://github.com/nschimme/faac/actions/runs/24285523096

@nschimme
Contributor Author

Hopefully the above two fixes will finally let me get Pseudo SBR to work. I kept getting blocked 😅

High-energy transients were producing quantized indices > 8191, exceeding
the AAC escape sequence limit and corrupting the bitstream.

- Peak Detection: Tracks bandmaxe (max spectral magnitude) per band.
- Gain Limiting: Proactively caps sfacfix in qlevel if the peak coefficient
  would exceed the representable range.
- Sync-Lock: Floors the scalefactor and re-derives the final gain to ensure
  encoder-decoder reconstruction alignment.