soundevents

Production-oriented Rust inference for CED AudioSet sound-event classifiers — load an ONNX model, feed it 16 kHz mono audio, get back ranked RatedSoundEvent predictions with names, ids, and confidences. Long clips are handled via configurable chunking.

Highlights

  • Drop-in CED inference — load any CED AudioSet ONNX model (or use the bundled tiny variant) and run it directly on &[f32] PCM samples. No Python, no preprocessing pipeline.
  • Typed labels, not bare integers — every prediction comes back as an EventPrediction carrying a &'static RatedSoundEvent from soundevents-dataset, so you get the canonical AudioSet name, the /m/... id, the model class index, and the confidence in one struct.
  • Compile-time class-count guarantee — the NUM_CLASSES = 527 constant comes from the rated dataset at codegen time. If a model returns the wrong number of classes you get a typed ClassifierError::UnexpectedClassCount instead of a silent mismatch.
  • Long-clip chunking built in — classify_chunked / classify_all_chunked window the input at a configurable hop, run inference on each chunk, and aggregate the per-chunk confidences with either Mean or Max. Defaults match CED's 10 s training window (160 000 samples at 16 kHz), and fixed-size chunk batches can be packed into one model call.
  • Top-k via a tiny min-heap — classify(samples, k) does not allocate a full 527-element score vector to find the top results.
  • Batch-ready low-level API — predict_raw_scores_batch, predict_raw_scores_batch_flat, predict_raw_scores_batch_into, classify_all_batch, and classify_batch accept equal-length clip batches for service-layer batching.
  • Bring-your-own model or bundle one — load from a path, from in-memory bytes, or enable the bundled-tiny feature to embed models/tiny.onnx directly into your binary.
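The top-k trick mentioned above can be sketched in plain std Rust: keep a min-heap of at most k (score, index) pairs and evict the smallest entry as you scan, so memory stays O(k) instead of O(527). This is a standalone illustration under that description, not the crate's actual internals:

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

// Standalone sketch of top-k selection with a k-sized min-heap; an
// illustration, not the crate's internals. `to_bits` is a valid ordering
// key here because confidences are non-negative finite floats, whose
// IEEE-754 bit patterns sort in the same order as their values.
fn top_k_indices(scores: &[f32], k: usize) -> Vec<usize> {
    let mut heap: BinaryHeap<Reverse<(u32, usize)>> = BinaryHeap::with_capacity(k + 1);
    for (i, &s) in scores.iter().enumerate() {
        heap.push(Reverse((s.to_bits(), i)));
        if heap.len() > k {
            heap.pop(); // evict the smallest of the current top-k
        }
    }
    // `into_sorted_vec` sorts ascending by `Reverse`, i.e. best score first.
    heap.into_sorted_vec()
        .into_iter()
        .map(|Reverse((_, i))| i)
        .collect()
}
```

The same idea generalizes to any fixed-size class vector where k is much smaller than the number of classes.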

Quick start

```toml
[dependencies]
soundevents = "0.2"
```

```rust
use soundevents::Classifier;

fn load_mono_16k_audio(_: &str) -> Result<Vec<f32>, Box<dyn std::error::Error>> {
    Ok(vec![0.0; 16_000])
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut classifier = Classifier::from_file("soundevents/models/tiny.onnx")?;

    // Bring your own decoder/resampler — soundevents expects mono f32
    // samples at 16 kHz, in [-1.0, 1.0].
    let samples: Vec<f32> = load_mono_16k_audio("clip.wav")?;

    // Top-5 predictions for a clip up to ~10 s long.
    for prediction in classifier.classify(&samples, 5)? {
        println!(
            "{:>5.1}%  {:>3}  {}  ({})",
            prediction.confidence() * 100.0,
            prediction.index(),
            prediction.name(),
            prediction.id(),
        );
    }
    Ok(())
}
```

Long clips: chunked inference

Classifier::classify_chunked slides a window over the input and aggregates each chunk's per-class confidences. The defaults (10 s window, 10 s hop, mean aggregation) match CED's training setup; tune them for overlap or peak-pooling.

```rust
use soundevents::{ChunkAggregation, ChunkingOptions, Classifier};

fn load_long_clip() -> Result<Vec<f32>, Box<dyn std::error::Error>> {
    Ok(vec![0.0; 320_000])
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut classifier = Classifier::from_file("soundevents/models/tiny.onnx")?;
    let samples: Vec<f32> = load_long_clip()?;

    let opts = ChunkingOptions::default()
        // 5 s overlap (50%) between adjacent windows
        .with_hop_samples(80_000)
        // Batch up to 4 equal-length windows per session.run()
        .with_batch_size(4)
        // Keep the loudest detection in any window instead of averaging
        .with_aggregation(ChunkAggregation::Max);

    let top3 = classifier.classify_chunked(&samples, 3, opts)?;
    for prediction in top3 {
        println!("{}: {:.2}", prediction.name(), prediction.confidence());
    }
    Ok(())
}
```

Models

The four CED variants are sourced from the mispeech Hugging Face organisation, exported to ONNX, and checked into this repo under soundevents/models/. You should not normally need to download anything — git clone gives you a working classifier out of the box.

| Variant | File | Size | Hugging Face source |
|---------|------|------|---------------------|
| tiny | soundevents/models/tiny.onnx | 6.4 MB | mispeech/ced-tiny |
| mini | soundevents/models/mini.onnx | 10 MB | mispeech/ced-mini |
| small | soundevents/models/small.onnx | 22 MB | mispeech/ced-small |
| base | soundevents/models/base.onnx | 97 MB | mispeech/ced-base |

All four expose the same input/output contract: mono f32 PCM at 16 kHz in, 527-class scores out (the SAMPLE_RATE_HZ and NUM_CLASSES constants). They differ only in parameter count and the accuracy/latency trade-off, so you can swap variants without touching application code.

Note — the four ONNX files together are ~135 MB. If you fork this repo and want to keep the working tree slim, consider tracking soundevents/models/*.onnx with git LFS.
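Because all four variants share one contract, switching models is just a path change. A hypothetical helper (not part of the crate) might map variant names to the checked-in files:

```rust
// Hypothetical helper mapping a variant name to its checked-in ONNX path.
// The crate does not ship this function; it only illustrates that swapping
// variants requires no other code changes.
fn model_path(variant: &str) -> Option<&'static str> {
    match variant {
        "tiny" => Some("soundevents/models/tiny.onnx"),
        "mini" => Some("soundevents/models/mini.onnx"),
        "small" => Some("soundevents/models/small.onnx"),
        "base" => Some("soundevents/models/base.onnx"),
        _ => None,
    }
}
```

A CLI could then feed the resolved path straight to Classifier::from_file.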

Refreshing models from upstream

If upstream releases new weights, or you cloned without the model files, refetch them with:

```sh
# Requires huggingface_hub:  pip install --user huggingface_hub
./scripts/download_models.sh

# Or just one variant
./scripts/download_models.sh tiny
```

The script downloads the *.onnx artifact from each mispeech/ced-* Hugging Face repo and writes it as soundevents/models/<variant>.onnx.

See THIRD_PARTY_NOTICES.md for upstream model sources and attribution details.

Bundled tiny model

Enable the bundled-tiny feature to embed models/tiny.onnx into your binary — useful for CLI tools and self-contained services where you don't want to ship a separate model file.

```toml
[dependencies]
soundevents = { version = "0.2", features = ["bundled-tiny"] }
```

```rust
use soundevents::{Classifier, Options};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // No model file on disk needed — the weights are compiled in.
    let mut classifier = Classifier::tiny(Options::default())?;
    let _ = &mut classifier; // ...classify as usual
    Ok(())
}
```

Features

| Feature | Default | What you get |
|---------|---------|--------------|
| bundled-tiny | No | Embeds models/tiny.onnx into the crate so Classifier::tiny() works without an external file. |

The full input/output contract:

| Constant | Value | Meaning |
|----------|-------|---------|
| SAMPLE_RATE_HZ | 16_000 | Required input sample rate (mono f32). |
| DEFAULT_CHUNK_SAMPLES | 160_000 | Default 10 s window/hop for chunked inference. |
| NUM_CLASSES | 527 | Number of CED output classes — derived at compile time from RatedSoundEvent::events().len(). |

For low-level batching, every clip in predict_raw_scores_batch* / classify_*_batch must be non-empty and have the same sample count. predict_raw_scores_batch_flat returns one row-major Vec<f32>, and predict_raw_scores_batch_into lets callers reuse their own output buffer to avoid per-call result allocations. classify_chunked applies the same equal-length restriction internally when ChunkingOptions::batch_size() > 1; fixed-size windows satisfy it naturally, and the final short tail chunk automatically falls back to a smaller batch.
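The flat batch output can be indexed with simple row-major arithmetic. A sketch, mirroring the crate's NUM_CLASSES value as a local constant:

```rust
// Row-major layout implied by predict_raw_scores_batch_flat: clip `i`'s
// score for class `c` lives at flat[i * NUM_CLASSES + c]. The constant
// mirrors the crate's value; the indexing arithmetic is the point here.
const NUM_CLASSES: usize = 527;

fn score_at(flat: &[f32], clip: usize, class: usize) -> f32 {
    flat[clip * NUM_CLASSES + class]
}
```

This is the layout a caller would also fill when passing a reusable buffer to predict_raw_scores_batch_into.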

Development

Regenerate the dataset from upstream sources:

cargo xtask codegen

Run the test suite:

cargo test

License

soundevents is distributed under the terms of both the MIT license and the Apache License (Version 2.0).

See LICENSE-APACHE and LICENSE-MIT for details. Bundled third-party model attributions and source licenses are documented in THIRD_PARTY_NOTICES.md.

Copyright (c) 2026 FinDIT studio authors.
