Add fuzz testing to JSON and EMF formatters by yulnr · Pull Request #237 · awslabs/metrique

yulnr · 2026-03-16T17:00:32Z

Closes #220

Summary

Adds fuzz testing infrastructure for the JSON and EMF formatters.

fuzz_json: Validates that successful formatting produces exactly one valid, newline-terminated JSON object. Exercises both regular and sampled paths.
fuzz_emf: Validates that successful formatting produces valid newline-delimited JSON objects. Exercises both regular and sampled paths, including EMF-specific flag modes.
fuzz_entry: Shared, format-agnostic entry generator that exercises the EntryWriter API. This is now mostly derive-first (Arbitrary) with a few small shaping choices to avoid spending too much time in known low-signal validation failures.
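The framing checks described for fuzz_json and fuzz_emf can be sketched roughly as below. This is a std-only illustration, not the code in the PR: `check_lines` and its parameters are hypothetical names, and the JSON check is passed in as a closure (a real target would presumably use an actual JSON parser there).

```rust
/// Checks that `output` is newline-terminated and that every line passes
/// `is_json_object`. Returns the number of lines on success.
///
/// Assumption: `is_json_object` stands in for a real JSON parser; it is a
/// parameter only so that this sketch has no external dependencies.
pub fn check_lines(
    output: &[u8],
    exactly_one: bool,
    is_json_object: impl Fn(&str) -> bool,
) -> Result<usize, String> {
    let text = std::str::from_utf8(output).map_err(|e| format!("not UTF-8: {}", e))?;

    // The output must end with a newline; strip it so `split` does not
    // yield a trailing empty line.
    let body = text
        .strip_suffix('\n')
        .ok_or_else(|| "output is not newline-terminated".to_string())?;

    let mut count = 0;
    for line in body.split('\n') {
        if !is_json_object(line) {
            return Err(format!("line {} is not a JSON object: {:?}", count, line));
        }
        count += 1;
    }

    // JSON-style check: exactly one object. EMF-style check: one or more.
    if exactly_one && count != 1 {
        return Err(format!("expected exactly one object, got {}", count));
    }
    Ok(count)
}
```

With `exactly_one = true` this mirrors the fuzz_json description (one newline-terminated object); with `false` it mirrors the newline-delimited case described for fuzz_emf.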

CI

We didn't discuss CI integration, but as a placeholder I'm adding an example setup that runs both targets nightly, with corpus caching and coverage-guided corpus minimization (cargo fuzz cmin).
Edit: we've updated this to do 1-minute runs per target on PRs, and a nightly 5-minute run per target, without corpus caching.

Misc

A simpler version of this setup is what caught this issue earlier, in a 3 min run: #221

🔏 By submitting this pull request

I confirm that I've made a best effort attempt to update all relevant documentation.
I confirm that my contribution is made under the terms of the Apache 2.0 license.

rcoh

There are some ways we can improve this, but nothing that blocks. Thanks for taking this on!

rcoh · 2026-03-16T18:51:34Z

+          # Policy:
+          # - one evolving cache lineage per branch per day
+          # - restore from same-day first, then same week, then branch, then OS-wide fallback
+          # - keep corpus in cache (not committed) to avoid repository bloat
+          key: fuzz-corpus-${{ runner.os }}-${{ github.ref_name }}-${{ steps.corpus_bucket.outputs.day }}-${{ github.run_id }}
+          restore-keys: |
+            fuzz-corpus-${{ runner.os }}-${{ github.ref_name }}-${{ steps.corpus_bucket.outputs.day }}-
+            fuzz-corpus-${{ runner.os }}-${{ github.ref_name }}-${{ steps.corpus_bucket.outputs.week }}-
+            fuzz-corpus-${{ runner.os }}-${{ github.ref_name }}-
+            fuzz-corpus-${{ runner.os }}-


do you think we're getting a lot of value out of these corpuses? (basically, should we just check in something once and avoid relying on the github cache?)

I'm actually not sure. For JSON validation, and given how we're planning to run this (frequent short runs), we might not evolve a very useful corpus. At first I also considered just checking in a seed corpus now and not caching at all. It sounds like you also think that should be good enough, so I'll simplify this and remove this CI machinery.

rcoh · 2026-03-16T18:55:50Z

+
+impl FuzzTimestamp {
+    pub fn to_system_time(&self) -> SystemTime {
+        // Keep values bounded to avoid pathological durations.


what are these pathological durations?

My bad, not a very useful comment. Initially I put the cap there because I was doing UNIX_EPOCH + duration, which would panic on overflow.

I’ve now switched that to checked_add (which I should've done, since I was already using checked_sub). But after removing the cap, I found another issue. It seems like (at least on macOS) unbounded timestamp values can hit a std time edge case: duration_since(UNIX_EPOCH) can stack-overflow inside std (Timespec::sub_timespec) under ASAN. I can reproduce that with a tiny standalone program using -Zsanitizer=address.

I’ll double-check this tomorrow, but we may still need either a cap or to limit secs to u32. (I'll document it more clearly, of course.)

Update: I believe this was a bug in nightly with SystemTime::duration_since, tracked in #146228, fixed by #146556. After updating nightly I can’t reproduce it, so I removed the bound.
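For reference, the overflow-safe conversion discussed in this thread might look roughly like the sketch below. The struct name and field layout are assumptions for illustration (the PR's FuzzTimestamp is not shown in full here), and returning `Option` is one possible choice, not necessarily what the PR does.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Hypothetical fuzz input; the fields are assumed, not taken from the PR.
pub struct FuzzTimestampSketch {
    pub secs: u64,
    pub nanos: u32,
    pub before_epoch: bool,
}

impl FuzzTimestampSketch {
    /// Returns `None` instead of panicking when the value is out of range
    /// for `SystemTime` on the current platform.
    pub fn to_system_time(&self) -> Option<SystemTime> {
        // Keep nanos below one second so `Duration::new` never carries into
        // `secs`; that carry panics if it overflows the seconds counter.
        let offset = Duration::new(self.secs, self.nanos % 1_000_000_000);
        if self.before_epoch {
            UNIX_EPOCH.checked_sub(offset)
        } else {
            // `UNIX_EPOCH + offset` would panic on overflow; `checked_add`
            // reports it as `None`.
            UNIX_EPOCH.checked_add(offset)
        }
    }
}
```

The point of the sketch is only that both directions go through the checked operations, so unbounded fuzzer input cannot trigger the `+` overflow panic mentioned above.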

rcoh · 2026-03-16T18:56:33Z

+}
+
+impl<'a> Arbitrary<'a> for FuzzEntry {
+    fn arbitrary(u: &mut Unstructured<'a>) -> arbitrary::Result<Self> {


is there a reason we can't derive arb for this?

No reason that'd prevent it. Initially I went for manual impls for a lot of things, partly out of necessity (can't derive on foreign types like Unit) and partly because it gave me a chance to add some biases, either towards edge cases or towards valid inputs, in case that saved us some fuzzing cycles.
But tbh I wasn't super confident those biases were helping. I'm now looking at coverage reports, and it seems like we can get away with simplifying by deriving arb for most things (with some caveats, like avoiding too many empty strings, which waste a few minutes of fuzzing on failed validations).
I'll find a good spot and update this tomorrow, it should also make this implementation a lot smaller.
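One way to picture the empty-string caveat is a small shaping step applied to otherwise derived values. The sketch below is std-only and purely illustrative: `shape_name`, `FALLBACK_NAME`, and the `selector` byte are hypothetical and are not taken from the PR, which relies on the `arbitrary` crate's derive.

```rust
/// Hypothetical fallback used when an empty name is replaced.
const FALLBACK_NAME: &str = "k";

/// Shapes a fuzzer-provided name. Empty strings are kept only when
/// `selector` is 0, so the validation-failure path is still reached
/// occasionally but does not dominate the run.
pub fn shape_name(raw: String, selector: u8) -> String {
    if raw.is_empty() && selector != 0 {
        FALLBACK_NAME.to_string()
    } else {
        raw
    }
}
```

If `selector` is drawn uniformly from the fuzzer input, roughly 1 in 256 empty names would pass through unchanged; the exact ratio is an arbitrary choice in this sketch.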

yulnr · 2026-03-17T12:44:35Z

Thanks for the feedback! I've simplified the CI (removed corpus caching) and simplified the FuzzEntry setup, making it derive Arbitrary (similar for other types as well).
Also added a 1-min run (per target) on PRs.

yulnr marked this pull request as ready for review March 16, 2026 17:25

rcoh approved these changes Mar 16, 2026

yulnr added 5 commits March 17, 2026 13:44

Add fuzz testing to JSON and EMF formatters

40eee14

Structural validity fuzzing for both formatter backends.

Add 1-minute fuzz run on PRs

c0bd6a0

Use Vec of entries instead of hardcoded pairs in fuzz targets

66ee8ad

Simplify fuzz generators and CI

ac5cebd

Split pr/nightly fuzz workflows

4bfd69b

yulnr force-pushed the test/json-validity-fuzzing branch from 655f274 to 4bfd69b Compare March 17, 2026 12:44

rcoh merged commit 7acdb62 into awslabs:main Mar 17, 2026
10 checks passed
