44 changes: 41 additions & 3 deletions CHANGELOG.md
@@ -1,7 +1,45 @@
# UNRELEASED
# Changelog

# 0.1.2 (January 6th, 2022)
All notable changes to this workspace will be documented in this file.

FEATURES
## Unreleased

## 0.2.0 - 2026-04-08

### `soundevents`

- Added `predict_raw_scores_batch_flat` and `predict_raw_scores_batch_into` for lower-allocation batched raw-score access.
- Expanded batched inference coverage with regression tests that verify flat and buffer-reuse paths against sequential inference.
- Removed redundant input validation in `classify_batch` while preserving the existing error behavior for invalid batches.
- Tightened crate metadata and docs.rs configuration so feature-gated APIs, including `Classifier::tiny`, render correctly on published docs.
- Added packaged third-party notices for bundled CED model artifacts.

### `soundevents-dataset`

- Packaged the dual-license texts with the published crate and aligned crate metadata for docs.rs and crates.io discovery.
- Kept the crate on its Rust 1.59 / edition 2021 compatibility track while removing the in-source `deny(warnings)` footgun.
- Added packaged third-party notices for bundled AudioSet ontology and label metadata.

### Workspace

- Included license files in published package contents for both crates.
- Upgraded README examples from ignored snippets to compile-checked doctests across the workspace.

## 0.1.0 - 2026-04-08

### `soundevents`

- Initial public release of the ONNX Runtime wrapper for CED AudioSet classifiers.
- Added file, memory, and bundled-model loading paths plus configurable graph optimization.
- Added ranked top-k helpers, raw-score accessors, and chunked inference with mean/max aggregation.
- Added equal-length batch APIs for clip inference and chunked window batching for higher-throughput services.

Comment on lines +32 to +36

Copilot AI Apr 8, 2026


The 0.1.0 changelog entry claims “Added equal-length batch APIs…”, but those APIs are introduced in this PR alongside the 0.2.0 bump. This makes the release notes misleading; consider moving that bullet into 0.2.0 (or removing it from 0.1.0) so each version’s section reflects what shipped in that release.

### `soundevents-dataset`

- Initial public release of the typed AudioSet dataset companion crate.
- Included both the 527-class rated label set and the full 632-entry ontology as `&'static` generated data.
- Kept the crate `no_std`-friendly, allocation-free at runtime, and compatible with Rust 1.59.

### `xtask`

- Added code generation for the rated label set and ontology modules from upstream AudioSet source data.
7 changes: 6 additions & 1 deletion Cargo.toml
@@ -14,7 +14,12 @@ resolver = "3"
thiserror = { version = "2", default-features = false }
serde = "1"

soundevents-dataset = { version = "0.1", path = "soundevents-dataset", default-features = false }
soundevents-dataset = { version = "0.2", path = "soundevents-dataset", default-features = false }

[workspace.package]
license = "MIT OR Apache-2.0"
repository = "https://github.com/findit-ai/soundevents"
homepage = "https://github.com/findit-ai/soundevents"

[profile.bench]
opt-level = 3
2 changes: 1 addition & 1 deletion LICENSE-APACHE
@@ -186,7 +186,7 @@ APPENDIX: How to apply the Apache License to your work.
same "printed page" as the copyright notice for easier
identification within third-party archives.

Copyright [yyyy] [name of copyright owner]
Copyright (c) 2026 The FinDIT studio developers

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
2 changes: 1 addition & 1 deletion LICENSE-MIT
@@ -1,4 +1,4 @@
Copyright (c) 2015 The Rust Project Developers
Copyright (c) 2026 The FinDIT studio developers

Permission is hereby granted, free of charge, to any
person obtaining a copy of this software and associated
38 changes: 32 additions & 6 deletions README.md
@@ -22,20 +22,25 @@ Production-oriented Rust inference for [CED](https://arxiv.org/abs/2308.11957) A
- **Drop-in CED inference** — load any [CED](https://arxiv.org/abs/2308.11957) AudioSet ONNX model (or use the bundled `tiny` variant) and run it directly on `&[f32]` PCM samples. No Python, no preprocessing pipeline.
- **Typed labels, not bare integers** — every prediction comes back as an [`EventPrediction`] carrying a `&'static RatedSoundEvent` from [`soundevents-dataset`](./soundevents-dataset), so you get the canonical AudioSet name, the `/m/...` id, the model class index, and the confidence in one struct.
- **Compile-time class-count guarantee** — the `NUM_CLASSES = 527` constant comes from the rated dataset at codegen time. If a model returns the wrong number of classes you get a typed [`ClassifierError::UnexpectedClassCount`] instead of a silent mismatch.
- **Long-clip chunking built in** — `classify_chunked` / `classify_all_chunked` window the input at a configurable hop, run inference on each chunk, and aggregate the per-chunk confidences with either `Mean` or `Max`. Defaults match CED's 10 s training window (160 000 samples at 16 kHz).
- **Long-clip chunking built in** — `classify_chunked` / `classify_all_chunked` window the input at a configurable hop, run inference on each chunk, and aggregate the per-chunk confidences with either `Mean` or `Max`. Defaults match CED's 10 s training window (160 000 samples at 16 kHz), and fixed-size chunk batches can now be packed into one model call.
- **Top-k via a tiny min-heap** — `classify(samples, k)` does not allocate a full 527-element scores vector to find the top results.
- **Batch-ready low-level API** — `predict_raw_scores_batch`, `predict_raw_scores_batch_flat`, `predict_raw_scores_batch_into`, `classify_all_batch`, and `classify_batch` accept equal-length clip batches for service-layer batching.
- **Bring-your-own model or bundle one** — load from a path, from in-memory bytes, or enable the `bundled-tiny` feature to embed `models/tiny.onnx` directly into your binary.
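The min-heap top-k idea behind `classify(samples, k)` can be sketched in plain std Rust. This is an illustrative, self-contained version, not the crate's actual internals; the `Score` wrapper and `top_k` helper are hypothetical names:

```rust
use std::cmp::{Ordering, Reverse};
use std::collections::BinaryHeap;

// f32 is not `Ord`, so wrap it with a total order for heap use.
#[derive(PartialEq)]
struct Score(f32);
impl Eq for Score {}
impl PartialOrd for Score {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
impl Ord for Score {
    fn cmp(&self, other: &Self) -> Ordering {
        self.0.total_cmp(&other.0)
    }
}

/// Return `(class_index, score)` pairs for the `k` highest scores,
/// keeping at most `k + 1` heap entries instead of materializing and
/// sorting a full 527-element vector.
fn top_k(scores: &[f32], k: usize) -> Vec<(usize, f32)> {
    let mut heap: BinaryHeap<Reverse<(Score, usize)>> = BinaryHeap::with_capacity(k + 1);
    for (i, &s) in scores.iter().enumerate() {
        heap.push(Reverse((Score(s), i)));
        if heap.len() > k {
            // The root of the Reverse-heap is the smallest candidate;
            // evict it so only the current top-k survive.
            heap.pop();
        }
    }
    let mut out: Vec<(usize, f32)> = heap
        .into_iter()
        .map(|Reverse((Score(s), i))| (i, s))
        .collect();
    out.sort_by(|a, b| b.1.total_cmp(&a.1)); // highest confidence first
    out
}
```

The O(n log k) scan plus a final sort of k items is why small-k queries stay cheap even over hundreds of classes.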

## Quick start

```toml
[dependencies]
soundevents = "0.1"
soundevents = "0.2"
```

```rust,ignore
```rust,no_run
use soundevents::{Classifier, Options};

fn load_mono_16k_audio(_: &str) -> Result<Vec<f32>, Box<dyn std::error::Error>> {
Ok(vec![0.0; 16_000])
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
let mut classifier = Classifier::from_file("soundevents/models/tiny.onnx")?;

Copilot AI Apr 8, 2026


The README examples use a repo-relative path ("soundevents/models/tiny.onnx") when calling Classifier::from_file. For crates.io users this path typically won’t exist at runtime, so the snippet is likely to fail when copied. Consider switching the example to a placeholder like "path/to/model.onnx" (and/or pointing to Classifier::tiny behind bundled-tiny) so the quick start is correct outside of a git checkout.

Suggested change
let mut classifier = Classifier::from_file("soundevents/models/tiny.onnx")?;
let mut classifier = Classifier::from_file("path/to/model.onnx")?;


@@ -61,16 +66,22 @@ fn main() -> Result<(), Box<dyn std::error::Error>> {

`Classifier::classify_chunked` slides a window over the input and aggregates each chunk's per-class confidences. The defaults (10 s window, 10 s hop, mean aggregation) match CED's training setup; tune them for overlap or peak-pooling.

```rust,ignore
```rust,no_run
use soundevents::{ChunkAggregation, ChunkingOptions, Classifier};

fn load_long_clip() -> Result<Vec<f32>, Box<dyn std::error::Error>> {
Ok(vec![0.0; 320_000])
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
let mut classifier = Classifier::from_file("soundevents/models/tiny.onnx")?;
let samples: Vec<f32> = load_long_clip()?;

let opts = ChunkingOptions::default()
// 5 s overlap (50%) between adjacent windows
.with_hop_samples(80_000)
// Batch up to 4 equal-length windows per session.run()
.with_batch_size(4)
// Keep the loudest detection in any window instead of averaging
.with_aggregation(ChunkAggregation::Max);

@@ -111,18 +122,29 @@ If upstream releases new weights, or you cloned without the model files, refetch

The script downloads the `*.onnx` artifact from each `mispeech/ced-*` Hugging Face repo and writes it as `soundevents/models/<variant>.onnx`.

See [THIRD_PARTY_NOTICES.md](THIRD_PARTY_NOTICES.md) for upstream model
sources and attribution details.

### Bundled tiny model

Enable the `bundled-tiny` feature to embed `models/tiny.onnx` into your binary — useful for CLI tools and self-contained services where you don't want to ship a separate model file.

```toml
soundevents = { version = "0.1", features = ["bundled-tiny"] }
soundevents = { version = "0.2", features = ["bundled-tiny"] }
```

```rust,ignore
```rust
# #[cfg(feature = "bundled-tiny")]
use soundevents::{Classifier, Options};

# fn main() -> Result<(), Box<dyn std::error::Error>> {
# #[cfg(feature = "bundled-tiny")]
# {
let mut classifier = Classifier::tiny(Options::default())?;
# let _ = &mut classifier;
# }
# Ok(())
# }
```

## Features
@@ -139,6 +161,8 @@ The full input/output contract:
| `DEFAULT_CHUNK_SAMPLES` | `160_000` | Default 10 s window/hop for chunked inference. |
| `NUM_CLASSES` | `527` | Number of CED output classes — derived at compile time from `RatedSoundEvent::events().len()`. |
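The compile-time derivation of the class count can be sketched in plain Rust; the names below (`EVENTS`, `check_output_len`) are illustrative stand-ins, not the crate's generated items:

```rust
// Stand-in for the generated rated label set (the real table has 527 entries).
const EVENTS: [&str; 3] = ["Speech", "Music", "Dog"];

// Array `len()` is a const fn, so the class count is fixed at compile
// time rather than checked at startup.
const NUM_CLASSES: usize = EVENTS.len();

/// Reject model outputs whose length disagrees with the dataset,
/// returning the unexpected length instead of silently mis-indexing.
fn check_output_len(scores: &[f32]) -> Result<(), usize> {
    if scores.len() == NUM_CLASSES {
        Ok(())
    } else {
        Err(scores.len())
    }
}
```

Deriving the constant from the same generated data that names the classes is what makes an `UnexpectedClassCount`-style error possible instead of an out-of-bounds index at lookup time.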

For low-level batching, every clip passed to `predict_raw_scores_batch*` / `classify_*_batch` must be non-empty and share the same sample count. `predict_raw_scores_batch_flat` returns one row-major `Vec<f32>`, and `predict_raw_scores_batch_into` lets callers reuse their own output buffer to avoid per-call result allocations. `classify_chunked` applies the same equal-length restriction internally when `ChunkingOptions::batch_size() > 1`; fixed-size windows satisfy it naturally, and the final short tail chunk automatically falls through to a smaller batch.
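The row-major layout and buffer-reuse conventions can be sketched independently of the crate. Both helpers below are hypothetical illustrations of the contract, not the crate's API, and the "inference" is a trivial stand-in:

```rust
/// Index a flat row-major score buffer of the kind a
/// `predict_raw_scores_batch_flat`-style API returns:
/// scores for clip `clip`, class `class` live at `clip * num_classes + class`.
fn batch_score(flat: &[f32], num_classes: usize, clip: usize, class: usize) -> f32 {
    debug_assert_eq!(flat.len() % num_classes, 0);
    flat[clip * num_classes + class]
}

/// Mirror the `*_into` pattern: clear and refill a caller-owned buffer
/// so repeated calls reuse one allocation. The per-clip "scores" here
/// are just the clip mean repeated, standing in for real inference.
fn fill_scores_into(out: &mut Vec<f32>, batch: &[&[f32]], num_classes: usize) {
    out.clear();
    out.reserve(batch.len() * num_classes);
    for clip in batch {
        let mean = clip.iter().sum::<f32>() / clip.len() as f32;
        out.extend(std::iter::repeat(mean).take(num_classes));
    }
}
```

A service loop that calls `fill_scores_into` with the same `Vec` each tick pays for the output allocation once, which is the point of the `_into` variant.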

## Development

Regenerate the dataset from upstream sources:
@@ -162,6 +186,8 @@ cargo test
Apache License (Version 2.0).

See [LICENSE-APACHE](LICENSE-APACHE), [LICENSE-MIT](LICENSE-MIT) for details.
Bundled third-party model attributions and source licenses are documented in
[THIRD_PARTY_NOTICES.md](THIRD_PARTY_NOTICES.md).

Copyright (c) 2026 FinDIT studio authors.

15 changes: 10 additions & 5 deletions soundevents-dataset/Cargo.toml
@@ -1,12 +1,17 @@
[package]
name = "soundevents-dataset"
version = "0.1.0"
version = "0.2.0"
# Intentionally kept on edition 2021 / MSRV 1.59 so this no_std static-data
# crate remains usable from older toolchains.
edition = "2021"
repository = "https://github.com/findit-ai/soundevents"
homepage = "https://github.com/findit-ai/soundevents"
documentation = "https://docs.rs/soundevents"
documentation = "https://docs.rs/soundevents-dataset"
description = "Audio Set Ontology aims to provide a comprehensive set of categories to describe sound events."
license = "MIT OR Apache-2.0"
license.workspace = true
repository.workspace = true
homepage.workspace = true
readme = "README.md"
keywords = ["audioset", "sound-events", "ontology", "dataset", "no-std"]
categories = ["data-structures", "multimedia::audio", "no-std", "no-std::no-alloc"]
rust-version = "1.59.0"

[features]