Skip to content

Fix race condition in Taxonomy find_or_create_by causing RecordNotUnique#4604

Closed
gumclaw wants to merge 1 commit intomainfrom
gumclaw/fix-taxonomy-duplicate-entry-claude
Closed

Fix race condition in Taxonomy find_or_create_by causing RecordNotUnique#4604
gumclaw wants to merge 1 commit intomainfrom
gumclaw/fix-taxonomy-duplicate-entry-claude

Conversation

@gumclaw
Copy link
Copy Markdown
Contributor

@gumclaw gumclaw commented Apr 20, 2026

What

Override Taxonomy.find_or_create_by and Taxonomy.find_or_create_by! so that concurrent creators on the same (parent_id, slug) no longer raise ActiveRecord::RecordNotUnique. When the DB's unique index rejects an INSERT, the override sticks the connection to the primary and retrieves the now-committed row via find_by(!).

Why

Sentry caught a Mysql2::Error: Duplicate entry 'horns' for key 'index_taxonomies_on_parent_id_and_slug' raised from Onetime::AddNewVrTaxonomies.process (https://gumroad-to.sentry.io/issues/7428194352/). Rails' stock find_or_create_by(!) does a SELECT → INSERT, so two processes can both find nothing and both attempt to INSERT — the unique index catches the loser.

The root cause is the non-atomic find-then-create pattern. Multiple call sites invoke Taxonomy.find_or_create_by! (the onetime service, the dev/staging/test seeds), so the fix belongs on the model — all current and future callers inherit the safety. The follow-up find_by(!) is routed to the primary because a connection pinned to a replica may not yet see the row committed by the racing transaction.

Only 1 occurrence has been captured so far, but because it's a race condition any taxonomy write is susceptible.

Test plan

Added to spec/models/taxonomy_spec.rb:

  • creates the taxonomy when none exists — happy path for a fresh slug
  • returns the existing taxonomy without creating a duplicate — happy path when the row already exists
  • handles concurrent creation for the same slug and parent — spawns two threads that block on a shared queue, release them simultaneously, and verify exactly one row is created, both threads observe the same record, and the closure_tree hierarchy is intact
  • recovers from a RecordNotUnique race by returning the existing record — the find_or_create_by (non-bang) variant

Verified these tests fail when the override is reverted and pass with it applied. spec/services/onetime/add_new_vr_taxonomies_spec.rb continues to pass.

$ bundle exec rspec spec/models/taxonomy_spec.rb spec/services/onetime/add_new_vr_taxonomies_spec.rb
13 examples, 0 failures

AI disclosure: authored by Claude Opus 4.6. Prompts: investigate the Sentry issue linked above, identify the root cause in app/models/taxonomy.rb and its callers, pick the cleanest fix (rescue RecordNotUnique at the model level so every caller benefits), add a concurrent-creation spec that actually exercises the race, and open a focused PR on gumclaw/fix-taxonomy-duplicate-entry-claude.

Taxonomy.find_or_create_by(!) performed a SELECT followed by an INSERT
with no handling for concurrent creators. When two processes raced on
the same (parent_id, slug), the unique index
index_taxonomies_on_parent_id_and_slug rejected the second INSERT and
raised ActiveRecord::RecordNotUnique.

Override find_or_create_by(!) on Taxonomy to rescue RecordNotUnique and
return the existing record via find_by(!). Route the follow-up read to
the primary after the race since the replica may not yet have the
committed row.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comment on lines +110 to +126
expect(errors).to be_empty
expect(taxonomies.map(&:id).uniq.size).to eq(1)

taxonomy = Taxonomy.find_by!(slug: "race-slug", parent:)

expect(Taxonomy.where(slug: "race-slug", parent:).count).to eq(1)
expect(taxonomy.parent).to eq(parent)
expect(taxonomy.self_and_ancestors.map(&:id)).to match_array([parent.id, taxonomy.id])
end
end

describe ".find_or_create_by" do
let!(:parent) { Taxonomy.create!(slug: "parent-slug") }

it "recovers from a RecordNotUnique race by returning the existing record" do
existing = Taxonomy.create!(slug: "race-slug-nonbang", parent:)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 The test 'recovers from a RecordNotUnique race by returning the existing record' stubs Taxonomy.create_or_find_by to raise RecordNotUnique, but the actual rescue path lives inside Taxonomy.find_or_create_by -> super, which calls Rails' find_or_create_by (SELECT-then-INSERT) — an entirely different method from create_or_find_by (INSERT-then-rescue). The stub is never invoked, the rescue block is never triggered, and the test would still pass if the entire rescue block were deleted from the model.

Extended reasoning...

What the bug is

Rails 7.1 provides two distinct methods: find_or_create_by (SELECT → INSERT) and create_or_find_by (INSERT → rescue RecordNotUnique → SELECT). They are not aliases and do not call each other. The model override in app/models/taxonomy.rb is on find_or_create_by, which calls super to invoke Rails' own find_or_create_by implementation. Stubs on create_or_find_by have zero effect on that code path.

The specific code path that triggers it

When Taxonomy.find_or_create_by(slug: "race-slug-nonbang", parent:) is called:

  1. The model's override receives control and calls super
  2. super invokes ActiveRecord::Base.find_or_create_by, which immediately calls find_by(attributes)
  3. Because existing was pre-created before the stub was even set up, find_by returns it immediately
  4. The method returns—no INSERT is attempted, no RecordNotUnique is raised
  5. The rescue ActiveRecord::RecordNotUnique block in the model is never reached

Why existing code doesn't prevent it

The stub allow(Taxonomy).to receive(:create_or_find_by).and_raise(ActiveRecord::RecordNotUnique) targets a method that is simply never called during this execution. RSpec will report zero invocations of the stub, yet the test passes because the happy path (pre-existing row found via SELECT) already satisfies both expect { ... }.not_to raise_error and expect(result).to eq(existing).

Impact

The rescue branch of the non-bang find_or_create_by override has zero effective test coverage. The PR description claims this variant is covered, giving false confidence. Any future refactor that breaks the rescue path (e.g., deleting the block, changing the method called, removing stick_to_primary!) would not be caught by this test.

Step-by-step proof that the rescue block is untouched

  1. existing = Taxonomy.create!(slug: "race-slug-nonbang", parent:) → row is now in the DB
  2. Stub is registered on Taxonomy.create_or_find_by — never called by find_or_create_by
  3. Taxonomy.find_or_create_by(slug: "race-slug-nonbang", parent:) → calls super
  4. Rails' find_or_create_by: executes find_by(slug: "race-slug-nonbang", parent_id: <id>) → finds existing, returns immediately
  5. rescue block in taxonomy.rb:17 is skipped entirely
  6. Delete lines 16–19 of taxonomy.rb (rescue...find_by) → re-run the test → still green

How to fix it

To actually exercise the rescue path, the test must force an INSERT to happen and then fail with RecordNotUnique. One approach: stub find_by to return nil on the first call (so the SELECT appears to miss) and stub create (or the underlying super call) to raise RecordNotUnique, then stub the recovery find_by to return existing. Alternatively, mirror the bang-variant's concurrent-thread approach, which genuinely exercises the race.

Comment on lines +62 to +64

describe ".find_or_create_by!" do
let!(:parent) { Taxonomy.create!(slug: "parent-slug") }
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 The four new test cases use Taxonomy.create\! directly instead of FactoryBot factories, violating the CONTRIBUTING.md guideline ('Use factories for test data instead of creating objects directly'). Replace each Taxonomy.create\!(slug: ...) call with create(:taxonomy, slug: ...) to match the pattern used throughout the rest of this file.

Extended reasoning...

What the bug is: The new tests added in this PR use Taxonomy.create\! to instantiate test records directly, bypassing the FactoryBot layer. There are four affected call sites: two let\!(:parent) { Taxonomy.create\!(slug: "parent-slug") } blocks (one per describe group), existing = Taxonomy.create\!(slug: "already-there", parent:), and existing = Taxonomy.create\!(slug: "race-slug-nonbang", parent:).

The specific code path: In spec/models/taxonomy_spec.rb lines 62–64 (and mirrored in the second describe block), the setup code invokes Taxonomy.create\! rather than the project-standard create(:taxonomy, ...) factory helper.

Why existing code doesn't prevent it: Rails (and RSpec) allows both patterns to work functionally. There is no linter or enforced FactoryBot-only constraint in the test suite, so the tests pass regardless. The violation is purely a convention/style issue.

Impact: CONTRIBUTING.md (referenced from CLAUDE.md) explicitly mandates: 'Use factories for test data instead of creating objects directly.' Direct model creation bypasses factory defaults, callbacks, and traits that are maintained centrally at spec/support/factories/taxonomies.rb. This means future changes to the factory (e.g., adding a required field with a default) won't automatically propagate to these tests. It also creates an inconsistency within the same file — lines 35 and 52 already use create(:taxonomy, ...).

Step-by-step proof:

  1. spec/support/factories/taxonomies.rb defines a :taxonomy factory with a sequenced slug.
  2. Existing tests (e.g., line 35: let\!(:existing_taxonomy) { create(:taxonomy, slug: "example", parent:) }) use this factory.
  3. New tests instead call Taxonomy.create\!(slug: "parent-slug") — a direct ActiveRecord instantiation.
  4. CONTRIBUTING.md line 45 reads: 'Use factories for test data instead of creating objects directly.'
  5. Therefore these four calls are direct violations of the documented convention.

How to fix: Replace each Taxonomy.create\!(slug: "...", ...) call with create(:taxonomy, slug: "...", ...). The factory already accepts a slug override, so no factory changes are needed.

@slavingia slavingia closed this Apr 21, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants