0.2.0: support batch processing #3

Merged
al8n merged 7 commits into main from 0.2.0 on Apr 8, 2026
Conversation

al8n (Contributor) commented Apr 8, 2026

No description provided.


Copilot AI left a comment


Pull request overview

Adds batch-oriented inference APIs and batched chunk processing to the soundevents classifier, while also updating crate metadata/licensing and improving doctest coverage across the workspace for the 0.2.0 release.

Changes:

  • Add equal-length batch inference APIs (predict_raw_scores_batch*, classify_*_batch) and use batched model calls for chunked inference via ChunkingOptions::batch_size.
  • Tighten model output validation for batched outputs and add regression tests comparing batched vs sequential inference.
  • Update crate metadata (workspace-inherited fields, docs.rs config, include/readme/keywords/categories), add packaged license files, and convert README snippets to compile-checked doctests.
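The equal-length constraint and contiguous packing the new batch APIs rely on can be sketched as follows. This is an illustrative stand-alone sketch, not the crate's actual code: `ClassifierError` here is a stand-in enum, and `validate_batch_inputs`/`pack_batch` are hypothetical helpers mirroring the behavior described above.

```rust
/// Stand-in error type for this sketch (not the crate's real enum).
#[derive(Debug, PartialEq)]
enum ClassifierError {
    EmptyBatch,
    MismatchedClipLength { expected: usize, actual: usize },
}

/// Validate that every clip in the batch has the same length; return that length.
fn validate_batch_inputs(batch: &[&[f32]]) -> Result<usize, ClassifierError> {
    let first = batch.first().ok_or(ClassifierError::EmptyBatch)?;
    let expected = first.len();
    for clip in &batch[1..] {
        if clip.len() != expected {
            return Err(ClassifierError::MismatchedClipLength {
                expected,
                actual: clip.len(),
            });
        }
    }
    Ok(expected)
}

/// Pack equal-length clips into one contiguous buffer shaped `[batch, len]`,
/// suitable for a single batched model call.
fn pack_batch(batch: &[&[f32]]) -> Result<(Vec<f32>, usize), ClassifierError> {
    let len = validate_batch_inputs(batch)?;
    let mut packed = Vec::with_capacity(batch.len() * len);
    for clip in batch {
        packed.extend_from_slice(clip);
    }
    Ok((packed, len))
}
```

Packing all clips into one buffer is what lets chunked inference amortize per-call model overhead across `ChunkingOptions::batch_size` windows.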

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
soundevents/src/lib.rs Implements batched inference + chunk batching, adds new errors/validation, and adds tests.
soundevents/Cargo.toml Bumps to 0.2.0 and improves publish/docs.rs metadata.
soundevents/LICENSE-MIT Adds MIT license file to crate package contents.
soundevents/LICENSE-APACHE Adds Apache 2.0 license file to crate package contents.
soundevents-dataset/src/lib.rs Removes deny(warnings) from crate-level lint settings.
soundevents-dataset/README.md Updates badges/links and makes examples doctest-friendly.
soundevents-dataset/LICENSE-MIT Adds MIT license file to crate package contents.
soundevents-dataset/LICENSE-APACHE Adds Apache 2.0 license file to crate package contents.
soundevents-dataset/Cargo.toml Bumps to 0.2.0 and improves publish metadata while keeping edition/MSRV intent.
README.md Documents new batching and updates examples to no_run doctests.
CHANGELOG.md Reworks changelog format and adds 0.2.0/0.1.0 entries.
Cargo.toml Introduces [workspace.package] and bumps workspace dependency version for soundevents-dataset.


Comment thread: README.md (Outdated)
@@ -33,9 +34,12 @@ Production-oriented Rust inference for [CED](https://arxiv.org/abs/2308.11957) A
soundevents = "0.1"

Copilot AI Apr 8, 2026


The README still recommends soundevents = "0.1", but this PR bumps the crate to 0.2.0. Using "0.1" will prevent users from getting the new batch APIs and metadata changes. Update the version in the Quick start snippet to 0.2 (or an explicit 0.2.0).

Suggested change:

```diff
-soundevents = "0.1"
+soundevents = "0.2"
```

Comment thread: README.md (Outdated)
@@ -119,10 +128,18 @@ Enable the `bundled-tiny` feature to embed `models/tiny.onnx` into your binary
soundevents = { version = "0.1", features = ["bundled-tiny"] }

Copilot AI Apr 8, 2026


The bundled model example still pins soundevents = { version = "0.1", ... } even though the crate version is now 0.2.0. This will keep pulling the pre-0.2 release. Update this dependency snippet to 0.2/0.2.0.

Suggested change:

```diff
-soundevents = { version = "0.1", features = ["bundled-tiny"] }
+soundevents = { version = "0.2", features = ["bundled-tiny"] }
```

Comment thread: CHANGELOG.md
Comment on lines +29 to +33
- Initial public release of the ONNX Runtime wrapper for CED AudioSet classifiers.
- Added file, memory, and bundled-model loading paths plus configurable graph optimization.
- Added ranked top-k helpers, raw-score accessors, and chunked inference with mean/max aggregation.
- Added equal-length batch APIs for clip inference and chunked window batching for higher-throughput services.


Copilot AI Apr 8, 2026


The 0.1.0 changelog entry claims “Added equal-length batch APIs…”, but those APIs are introduced in this PR alongside the 0.2.0 bump. This makes the release notes misleading; consider moving that bullet into 0.2.0 (or removing it from 0.1.0) so each version’s section reflects what shipped in that release.

Comment thread: soundevents/src/lib.rs (Outdated)
Comment on lines +458 to +461
```rust
let total_samples = batch_size * chunk_len;

self.input_scratch.clear();
self.input_scratch.reserve(total_samples);
```

Copilot AI Apr 8, 2026


total_samples is computed as batch_size * chunk_len, which can overflow usize in release builds (wrapping silently) and lead to incorrect reserve() sizing and potential panics/UB-like behavior during extend_from_slice. Use checked_mul (and ideally also guard the subsequent reserve) and return a structured error if the requested batch is too large to pack safely.

Suggested change:

```diff
-let total_samples = batch_size * chunk_len;
-
-self.input_scratch.clear();
-self.input_scratch.reserve(total_samples);
+let total_samples = batch_size
+    .checked_mul(chunk_len)
+    .ok_or(ClassifierError::BatchTooLarge {
+        batch_size,
+        chunk_len,
+    })?;
+
+self.input_scratch.clear();
+self.input_scratch
+    .try_reserve(total_samples)
+    .map_err(|_| ClassifierError::BatchTooLarge {
+        batch_size,
+        chunk_len,
+    })?;
```

Comment thread: soundevents/LICENSE-MIT (Outdated)
@@ -0,0 +1,25 @@
Copyright (c) 2015 The Rust Project Developers

Copilot AI Apr 8, 2026


The MIT license text here is copied verbatim from the Rust Project template (“Copyright (c) 2015 The Rust Project Developers”), but the crate README elsewhere asserts copyright by the FinDIT authors. If this crate is not actually copyrighted by the Rust Project Developers, update the copyright line (or use a generic MIT template) so the packaged license accurately reflects the project’s copyright holders.

Suggested change:

```diff
-Copyright (c) 2015 The Rust Project Developers
+Copyright (c) [year] [copyright holders]
```

Comment thread: soundevents-dataset/LICENSE-MIT (Outdated)
@@ -0,0 +1,25 @@
Copyright (c) 2015 The Rust Project Developers

Copilot AI Apr 8, 2026


The MIT license text here uses the Rust Project’s copyright header (“Copyright (c) 2015 The Rust Project Developers”). If soundevents-dataset is copyrighted by your project authors (as indicated in the README), this should be replaced with the correct copyright notice (or a placeholder MIT template) before publishing.

Suggested change:

```diff
-Copyright (c) 2015 The Rust Project Developers
+Copyright (c) [year] [copyright holders]
```

codecov bot commented Apr 8, 2026

Welcome to Codecov 🎉

Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests.

ℹ️ You can also turn on project coverage checks and project coverage reporting in pull request comments.

Thanks for integrating Codecov - We've got you covered ☂️


Copilot AI left a comment


Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 1 comment.



Comment thread: soundevents/src/lib.rs
Comment on lines +849 to +856
```rust
std::iter::from_fn(move || {
    let first = chunks.next()?;
    let first_len = first.len();
    let mut batch = Vec::with_capacity(options.batch_size());
    batch.push(first);

    while batch.len() < options.batch_size() {
        let Some(next_len) = chunks.peek().map(|chunk| chunk.len()) else {
```

Copilot AI Apr 8, 2026


chunk_batches preallocates Vec::with_capacity(options.batch_size()). Because batch_size is fully caller-controlled and only validated as non-zero, a very large value can cause a huge allocation (or abort) even for small clips. Consider not preallocating, or capping the capacity to a reasonable upper bound / remaining chunks.
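One way to apply this suggestion is to cap the preallocation while still batching up to the requested size. The sketch below is illustrative, not the crate's actual `chunk_batches`: it omits the equal-length peek check, and the `MAX_PREALLOC` cap of 64 is an arbitrary choice for the example.

```rust
/// Group an iterator of chunks into batches of at most `batch_size`,
/// capping the Vec preallocation so a huge caller-supplied batch_size
/// cannot trigger a large up-front allocation.
fn chunk_batches<I, T>(chunks: I, batch_size: usize) -> Vec<Vec<T>>
where
    I: IntoIterator<Item = T>,
{
    const MAX_PREALLOC: usize = 64; // illustrative cap, not the crate's value
    let mut iter = chunks.into_iter();
    let mut batches = Vec::new();
    loop {
        let Some(first) = iter.next() else { break };
        // Preallocate only up to the cap; the Vec grows normally past it.
        let mut batch = Vec::with_capacity(batch_size.min(MAX_PREALLOC));
        batch.push(first);
        while batch.len() < batch_size {
            match iter.next() {
                Some(next) => batch.push(next),
                None => break,
            }
        }
        batches.push(batch);
    }
    batches
}
```

With this shape, even `batch_size == usize::MAX` allocates only the capped capacity and simply drains the remaining chunks into one batch.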


Copilot AI left a comment


Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.



Comment thread: README.md (Outdated)
@@ -119,10 +130,18 @@ Enable the `bundled-tiny` feature to embed `models/tiny.onnx` into your binary
soundevents = { version = "0.1", features = ["bundled-tiny"] }

Copilot AI Apr 8, 2026


The bundled-tiny dependency snippet still uses version = "0.1", but this PR bumps the crate to 0.2.0 and earlier examples already use soundevents = "0.2". Please update this snippet to 0.2 to avoid readers copying an outdated version.

Suggested change:

```diff
-soundevents = { version = "0.1", features = ["bundled-tiny"] }
+soundevents = { version = "0.2", features = ["bundled-tiny"] }
```

Comment thread: soundevents/src/lib.rs
```rust
std::iter::from_fn(move || {
    let first = chunks.next()?;
    let first_len = first.len();
    let mut batch = Vec::with_capacity(options.batch_size());
```

Copilot AI Apr 8, 2026


chunk_batches preallocates with Vec::with_capacity(options.batch_size()). Since batch_size is user-controlled and only validated as non-zero, an extremely large value can trigger an attempted huge allocation and panic/abort before any inference runs. Consider avoiding prealloc based on batch_size (e.g., start with an empty Vec and let it grow to the actual batch length, or cap the preallocation to a safe upper bound).

Suggested change:

```diff
-let mut batch = Vec::with_capacity(options.batch_size());
+let mut batch = Vec::new();
```

Comment thread: soundevents/src/lib.rs (Outdated)
Comment on lines +557 to +570
```rust
/// Classify a batch of equal-length mono 16 kHz clips and return the top `k` classes for each clip.
pub fn classify_batch(
    &mut self,
    batch_16k: &[&[f32]],
    top_k: usize,
) -> Result<Vec<Vec<EventPrediction>>, ClassifierError> {
    let _ = validate_batch_inputs(batch_16k)?;

    if top_k == 0 {
        return Ok((0..batch_16k.len()).map(|_| Vec::new()).collect());
    }

    self.with_raw_scores_batch(batch_16k, |raw_scores, batch_size| {
        raw_scores
```

Copilot AI Apr 8, 2026


classify_batch calls validate_batch_inputs(batch_16k)? and then immediately calls with_raw_scores_batch, which re-validates the same batch again. This doubles the input scan cost for large batches; consider restructuring so validation happens only once (while still short-circuiting when top_k == 0).
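A restructuring along the lines the reviewer suggests could validate once and hand the validated length to an inner helper that trusts its input. This is a hypothetical sketch with placeholder types and a stubbed "model call", not the crate's actual API.

```rust
/// Validate that every clip shares one length; return that length.
fn validate_batch_inputs(batch: &[&[f32]]) -> Result<usize, &'static str> {
    let first = batch.first().ok_or("empty batch")?;
    if batch.iter().any(|clip| clip.len() != first.len()) {
        return Err("mismatched clip lengths");
    }
    Ok(first.len())
}

/// Inner helper that trusts an already-validated batch, so the input
/// is scanned exactly once. The body stands in for the real model call.
fn raw_scores_prevalidated(batch: &[&[f32]], _clip_len: usize) -> Vec<Vec<f32>> {
    batch.iter().map(|_| vec![0.0]).collect() // placeholder scores
}

fn classify_batch(batch: &[&[f32]], top_k: usize) -> Result<Vec<Vec<f32>>, &'static str> {
    // Validate exactly once, keeping the top_k == 0 short-circuit.
    let clip_len = validate_batch_inputs(batch)?;
    if top_k == 0 {
        return Ok(vec![Vec::new(); batch.len()]);
    }
    Ok(raw_scores_prevalidated(batch, clip_len))
}
```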


Copilot AI left a comment


Pull request overview

Copilot reviewed 16 out of 16 changed files in this pull request and generated 3 comments.



Comment thread: soundevents/src/lib.rs (Outdated)
Comment on lines +833 to +839
```rust
let expected_values = expected_batch_size.saturating_mul(NUM_CLASSES);
if raw_scores.len() != expected_values {
    if raw_scores.len() % expected_batch_size.max(1) == 0 {
        return Err(ClassifierError::UnexpectedClassCount {
            expected: NUM_CLASSES,
            actual: raw_scores.len() / expected_batch_size.max(1),
        });
```

Copilot AI Apr 8, 2026


validate_output uses saturating_mul to compute the expected element count. If expected_batch_size * NUM_CLASSES overflows, this will silently clamp and can yield confusing errors (and potentially accept a nonsensical expected_values). Consider using the existing checked_batch_len(expected_batch_size, NUM_CLASSES) here and returning ClassifierError::BatchTooLarge on overflow, and then use that checked value for the length comparison.
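The checked variant the reviewer describes could look like the sketch below. It is a stand-alone illustration: the `NUM_CLASSES` value and string errors are placeholders, not the crate's real constants or error enum.

```rust
/// Placeholder class count for the sketch (the CED AudioSet head size is
/// whatever the crate defines; 527 is the usual AudioSet label count).
const NUM_CLASSES: usize = 527;

/// Compute the expected flat output length, failing loudly on overflow
/// instead of saturating to a nonsensical value.
fn expected_values(batch_size: usize) -> Result<usize, &'static str> {
    batch_size.checked_mul(NUM_CLASSES).ok_or("batch too large")
}

/// Validate a flat `[batch, classes]` output buffer against the batch size.
fn validate_output(raw_len: usize, batch_size: usize) -> Result<(), &'static str> {
    let expected = expected_values(batch_size)?; // overflow -> error, not clamp
    if raw_len != expected {
        return Err("unexpected output shape");
    }
    Ok(())
}
```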

Comment thread: soundevents/src/lib.rs
Comment on lines +897 to +903
```rust
let first = chunks.next()?;
let first_len = first.len();
let mut batch = Vec::with_capacity(options.batch_size());
batch.push(first);

while batch.len() < options.batch_size() {
    let Some(next_len) = chunks.peek().map(|chunk| chunk.len()) else {
```

Copilot AI Apr 8, 2026


chunk_batches pre-allocates with Vec::with_capacity(options.batch_size()). Since batch_size is user-controlled, very large values can trigger a huge allocation (panic/OOM) even for short inputs. Prefer not pre-allocating from the raw batch size (e.g., start with an empty vec and let it grow), or cap/guard the capacity via a checked/try_reserve path that maps to ClassifierError::BatchTooLarge.

Comment thread: README.md
```rust
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut classifier = Classifier::from_file("soundevents/models/tiny.onnx")?;
```

Copilot AI Apr 8, 2026


The README examples use a repo-relative path ("soundevents/models/tiny.onnx") when calling Classifier::from_file. For crates.io users this path typically won’t exist at runtime, so the snippet is likely to fail when copied. Consider switching the example to a placeholder like "path/to/model.onnx" (and/or pointing to Classifier::tiny behind bundled-tiny) so the quick start is correct outside of a git checkout.

Suggested change:

```diff
-let mut classifier = Classifier::from_file("soundevents/models/tiny.onnx")?;
+let mut classifier = Classifier::from_file("path/to/model.onnx")?;
```


Copilot AI left a comment


Pull request overview

Copilot reviewed 16 out of 16 changed files in this pull request and generated 1 comment.



Comment thread: soundevents/src/lib.rs
```rust
std::iter::from_fn(move || {
    let first = chunks.next()?;
    let first_len = first.len();
    let mut batch = Vec::with_capacity(options.batch_size());
```

Copilot AI Apr 8, 2026


chunk_batches pre-allocates with Vec::with_capacity(options.batch_size()), but batch_size is user-controlled. A very large batch size can trigger an immediate huge allocation / OOM even if only a few chunks are actually present. Consider avoiding capacity preallocation based on batch_size (or capping it to a reasonable maximum) so chunked inference can fail gracefully rather than risking an allocation spike.

Suggested change:

```diff
-let mut batch = Vec::with_capacity(options.batch_size());
+let mut batch = Vec::new();
```

al8n merged commit b655761 into main on Apr 8, 2026
16 checks passed
al8n deleted the 0.2.0 branch on April 8, 2026 at 12:21
al8n pushed a commit that referenced this pull request on Apr 17, 2026
2 participants