
CI: Introduce FAAC Benchmark Suite for automated regression testing#78

Open
nschimme wants to merge 5 commits into knik0:master from nschimme:benchmark

Conversation

@nschimme
Contributor

@nschimme nschimme commented Mar 4, 2026

This PR introduces the FAAC Benchmark Suite, an automated CI/CD pipeline designed to provide objective data on every change.

Currently, the project lacks a formal regression and testing suite. For a maintainer, this makes merging optimizations or refactors a high-risk activity. This suite aims to act as a "safety net," providing the metrics needed to ensure that new code maintains the project's standards for quality, speed, and size.

The "Golden Triangle" Philosophy

I've designed the benchmarking logic around three pillars critical to the FAAC mission. Note that these are a first draft—I am completely open to adjusting this philosophy or the specific metrics based on what you value most for the project.

  1. Audio Fidelity: Uses the ViSQOL model to predict Mean Opinion Score (MOS). This ensures psychoacoustic changes don't introduce audible artifacts like "metallic" ringing.
  2. Computational Efficiency: Measures normalized throughput. While FAAC targets being fast, this ensures we don't accidentally introduce regressions that impact real-time performance on low-power cores.
  3. Minimal Footprint: Tracks the binary size of libfaac.so. For embedded VSS and IoT targets, binary size is a primary feature, and this suite makes any "bloat" immediately visible.
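As a rough illustration of pillar 2, "normalized throughput" can be expressed as a realtime factor: seconds of audio encoded per second of wall-clock time. This is a hypothetical sketch, not the suite's actual measurement code; the `faac` CLI invocation and file paths are assumptions.

```python
import subprocess
import time
import wave

def normalized_throughput(audio_seconds: float, encode_seconds: float) -> float:
    """Realtime factor: audio duration divided by encode time. Higher is faster;
    a value below 1.0 means the encoder cannot keep up in real time."""
    return audio_seconds / encode_seconds

def realtime_factor(wav_path: str, out_path: str) -> float:
    """Encode one WAV with the faac CLI and return its realtime factor.
    (Illustrative only: assumes `faac` is on PATH and accepts `-o`.)"""
    with wave.open(wav_path, "rb") as w:
        duration = w.getnframes() / w.getframerate()
    start = time.perf_counter()
    subprocess.run(["faac", "-o", out_path, wav_path],
                   check=True, capture_output=True)
    return normalized_throughput(duration, time.perf_counter() - start)
```

Comparing this factor between branches, rather than raw encode times, keeps results stable across CI runners with different clock speeds.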

Implementation

  • GitHub Actions: Runs on every pull request, comparing the PR branch (Candidate) against the Master branch (Baseline).
  • Automated Reporting: Generates a high-signal Markdown report in the PR comments, highlighting regressions (red), wins (green), and bit-identical refactors (verified by comparing MD5 digests of the encoded output).
  • Datasets: Includes scripts to pull speech and music samples from TCD-VoIP and PMLT2014 to test real-world scenarios.

Focus & Feedback Requested

The primary goal of this draft is to establish the metrics. I would value your feedback on:

  • The Thresholds: Currently, a 0.1 MOS drop or a 10% throughput drop triggers a "Failure" icon. Are these the right sensitivities for you?
  • The "Why": If you feel the focus should shift (e.g., more weight on bitrate accuracy vs. throughput), I’m happy to retune the reporting logic.
  • CI Usage: To keep things conservative, we can set this to run only on a manual trigger or specifically for PRs targeting the master branch.
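For concreteness, the threshold logic described above (a 0.1 MOS drop or a 10% throughput drop marks a failure) could be sketched as follows. The function name, the "neutral" bucket, and the exact win condition are illustrative assumptions for discussion, not the shipped reporting code.

```python
def verdict(mos_delta: float, throughput_ratio: float) -> str:
    """Classify one benchmark row.

    mos_delta: candidate MOS minus baseline MOS (negative = quality loss).
    throughput_ratio: candidate throughput / baseline throughput
                      (below 1.0 = slower).
    """
    if mos_delta <= -0.1 or throughput_ratio <= 0.90:
        return "failure"   # red icon: regression beyond threshold
    if mos_delta > 0.0 or throughput_ratio > 1.0:
        return "win"       # green icon: measurable improvement
    return "neutral"       # within noise; no icon
```

Tuning the two constants (0.1 and 0.90) is exactly the feedback being requested: tighter values catch more regressions but generate more false alarms from run-to-run CI noise.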

This is intended to be a collaborative baseline. I want to ensure the metrics we track are the ones that give you the most confidence when reviewing contributions.

Sample Report from this PR: benchmark-report-full.zip

@nschimme
Contributor Author

nschimme commented Mar 4, 2026

Seems we need to tweak some permissions for it to leave a GH comment: https://github.com/knik0/faac/actions/runs/22672563501/job/65726481319?pr=78

We should have seen something like: nschimme#38

@fabiangreffrath
Collaborator

> Seems we need to tweak some permissions for it to leave a GH comment: https://github.com/knik0/faac/actions/runs/22672563501/job/65726481319?pr=78

Frankly, it feels a bit uneasy to introduce a test suite that's about as big as the library itself and that downloads some random samples from somewhere else under a questionable license.

I'll put trust in your justice if you tell me that the changes you suggest will generate output identical to before.

You know, for me this is just a little side project. I'm the last one alive here with commit rights. I haven't written a single line of the actual codec myself.

@nschimme
Contributor Author

nschimme commented Mar 4, 2026

I get the hesitation, but I’m doing this specifically so you don't have to trust my 'justice.' I’ve already verified the changes are 100% bit-identical, and this suite is just the math to prove it to you so you don't have to audit code you didn't write.

On the license/size stuff: the samples aren't in the repo, the CI just pulls them to run the check. It keeps the library clean. If the suite ever becomes a maintenance headache or the 'uneasiness' doesn't go away, just rm -rf tests/ and delete the workflow. I'll be the one maintaining it anyway, so if it breaks, that's on me.

I’d rather have the data than fly blind. How about we run with it, and if it's a pain in the ass, we scrap it?

@nschimme
Contributor Author

nschimme commented Mar 4, 2026

That does beg the question, do you have access to give other people write access? If not, maybe we create a new faac organization and start putting our changes there. This becomes a mirror.

@fabiangreffrath
Collaborator

> That does beg the question, do you have access to give other people write access? If not, maybe we create a new faac organization and start putting our changes there. This becomes a mirror.

I only have commit rights, I cannot change anything about the repository.

My idea is to get the remaining three PRs merged into the code (without the test suite) and release this as 1.40. Then I'd abandon this repository as well and will happily hand over maintenance to a more active fork.

And please don't forget about the brother project faad2.

@nschimme
Contributor Author

nschimme commented Mar 4, 2026

Sounds good. I'll keep maintaining it on my side and leave comments with the results in my PRs.

@nschimme
Contributor Author

nschimme commented Mar 4, 2026

We could be cheeky... I see that https://github.com/FAACD is free 😈

@nschimme
Contributor Author

nschimme commented Mar 5, 2026

I extracted the code out into a repo that I own and exposed it as GitHub action. This PR just uses it now. See the extracted solution at https://github.com/nschimme/faac-benchmark

@nschimme
Contributor Author

nschimme commented Mar 5, 2026

Answering my own question: I think I'll have to tweak the failure and win thresholds a bit (and possibly batch related changes together). I'll post this table here for our reference:

### FAAC Optimization Impact Estimates (2026 Roadmap)

| Task / Feature               | MOS Impact | Throughput (CPU) | Binary Size |
|:-----------------------------|:-----------|:-----------------|:------------|
| Adaptive Rounding (AQR)      | +15-20%    | 0% (Negligible)  | <1%         |
| MDCT-based Psychoacoustic    | +5-10%     | +30-40%          | +2-5%       |
| Stereo Mode Hysteresis       | +5%        | 0%               | <1%         |
| Transient Detection Tuning   | +10%       | 0%               | <1%         |
| ATH Scaling (VoIP/VSS)       | +5%        | 0%               | <1%         |
| Bit Reservoir Control        | +10-15%    | -5% (Overhead)   | +1-2%       |
| Temporal Noise Shaping (TNS) | +8-12%     | -10% (Complexity)| +3-5%       |

---
NOTES:
- Adaptive Rounding addresses the historical "shimmer" issues
- MDCT-PAM targets the 30%+ CPU gain and better masking alignment
- Stereo Hysteresis stabilizes the soundstage image in complex passages

@nschimme nschimme marked this pull request as draft March 6, 2026 14:37
@nschimme nschimme marked this pull request as ready for review March 14, 2026 00:08