Fix duplicate document error during database migration #157

Open
claudear wants to merge 1 commit into main from fix/handle-duplicate-document-exception

claudear commented Mar 11, 2026

Summary

  • When batch-inserting documents via createDocuments(), a single duplicate document causes the entire batch to fail with "Document already exists"
  • Added a catch (DuplicateException) that falls back to inserting documents one-by-one via createDocument(), skipping any that already exist
  • This handles the case where a migration is retried or documents already exist from a previous partial migration
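
The behaviour described in the bullets above can be sketched in isolation. This is a minimal illustration, not the code from this PR: `InMemoryStore`, `flushBuffer()`, and the local `DuplicateException` class are hypothetical stand-ins for the real database adapter, and the store's all-or-nothing batch insert is an assumption made to reproduce the failure mode.

```php
<?php

// Hypothetical stand-in for the library's DuplicateException.
final class DuplicateException extends RuntimeException
{
}

// Hypothetical in-memory store whose batch insert is all-or-nothing.
final class InMemoryStore
{
    /** @var array<string, array<string, mixed>> */
    private array $documents = [];

    /** @param array<string, mixed> $document */
    public function createDocument(array $document): void
    {
        $id = $document['$id'];
        if (isset($this->documents[$id])) {
            throw new DuplicateException('Document already exists');
        }
        $this->documents[$id] = $document;
    }

    /** @param list<array<string, mixed>> $documents */
    public function createDocuments(array $documents): void
    {
        // Validate everything first so one duplicate rejects the whole batch.
        $seen = [];
        foreach ($documents as $document) {
            $id = $document['$id'];
            if (isset($this->documents[$id]) || isset($seen[$id])) {
                throw new DuplicateException('Document already exists');
            }
            $seen[$id] = true;
        }
        foreach ($documents as $document) {
            $this->documents[$document['$id']] = $document;
        }
    }

    public function count(): int
    {
        return \count($this->documents);
    }
}

/**
 * Try the batch insert; on a duplicate, retry one by one and skip existing IDs.
 *
 * @param list<array<string, mixed>> $buffer
 * @return list<string> IDs that were skipped because they already existed
 */
function flushBuffer(InMemoryStore $store, array $buffer): array
{
    try {
        $store->createDocuments($buffer);
        return [];
    } catch (DuplicateException) {
        $skipped = [];
        foreach ($buffer as $document) {
            try {
                $store->createDocument($document);
            } catch (DuplicateException) {
                // Already present (for example from an earlier partial run): skip it.
                $skipped[] = $document['$id'];
            }
        }
        return $skipped;
    }
}
```

Returning the skipped IDs is a design choice of this sketch only; it gives the caller a way to tell newly inserted documents apart from ones that were already present.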

Test plan

  • Added unit test verifying DuplicateException from batch insert triggers one-by-one fallback
  • Added unit test verifying successful batch insert does not trigger fallback
  • Verified test fails without the fix (TDD)
  • All existing unit tests pass
  • Linter (pint) passes
  • Static analysis (phpstan) passes

Fixes: https://appwrite.sentry.io/issues/7289696520/

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes

    • Enhanced duplicate document handling during data migration to Appwrite. Failed batch operations now gracefully fall back to individual document insertion while skipping duplicates, improving sync reliability.
  • Tests

    • Added comprehensive unit tests for duplicate document scenarios in Appwrite migrations.

When batch inserting documents via createDocuments, a single duplicate
causes the entire batch to fail. This catches DuplicateException and
falls back to one-by-one insertion, skipping documents that already
exist instead of failing the whole batch.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
coderabbitai bot commented Mar 11, 2026

Walkthrough

This pull request introduces a fallback mechanism for batch document insertion in Appwrite. The primary change wraps an existing batch createDocuments call in a try/catch block. If the batch insert succeeds, behavior remains unchanged. If it fails with DuplicateException, the code falls back to inserting documents individually via createDocument, skipping any rows that cause duplicate exceptions. A new unit test file is added to verify both the successful batch path and the fallback path when duplicates are encountered.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~15 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 60.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: adding error handling for duplicate documents during Appwrite batch migration operations. |

coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
tests/Migration/Unit/Destinations/AppwriteTest.php (2)

126-148: Assert that the happy path actually calls createDocuments().

Right now this only proves createDocument() was not used. It would still pass if createRecord() returned early before writing anything, so the batch call itself should be an explicit expectation.

✅ Lock in the batch-success path
-        $dbForDatabases->method('createDocuments')
+        $dbForDatabases->expects($this->once())
+            ->method('createDocuments')
             ->willReturn(0);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/Migration/Unit/Destinations/AppwriteTest.php` around lines 126 - 148,
The test testCreateRecordBatchSucceeds currently only asserts createDocument()
is never called; add an explicit expectation on the $dbForDatabases mock that
createDocuments() is invoked once (or with the expected arguments) when calling
Appwrite::createRecord($row, true) to ensure the batch path is exercised; update
the $dbForDatabases mock setup to expect createDocuments() (instead of only
stubbing its return) and keep the createDocument() never() expectation,
referencing the createRecord method on Appwrite and the
createDocuments/createDocument mock methods.

82-121: This test doesn't prove the batch path was attempted or that fallback continues past a duplicate.

Because createDocuments() is only stubbed, a regression to “always insert one-by-one” would still pass. And with only two rows, a break/early-return on the duplicate would also pass because there is no third row to verify continuation.

✅ Tighten the duplicate-path test
-        $dbForDatabases->method('createDocuments')
+        $dbForDatabases->expects($this->once())
+            ->method('createDocuments')
             ->willThrowException(new DuplicateException('Document already exists'));

         // Fallback createDocument: first succeeds, second throws duplicate (skipped)
         $createDocumentCallCount = 0;
         $dbForDatabases->method('createDocument')
@@
         $row1 = new Row('row1', $table, ['field1' => 'value1']);
         $row2 = new Row('row2', $table, ['field1' => 'value2']);
+        $row3 = new Row('row3', $table, ['field1' => 'value3']);

         // Buffer row1 (not last)
         $result1 = $method->invoke($appwrite, $row1, false);
         $this->assertTrue($result1);

-        // Buffer row2 and flush (isLast=true) - should NOT throw
-        $result2 = $method->invoke($appwrite, $row2, true);
+        $result2 = $method->invoke($appwrite, $row2, false);
         $this->assertTrue($result2);
+
+        // Flush on row3 - should continue after row2 duplicate
+        $result3 = $method->invoke($appwrite, $row3, true);
+        $this->assertTrue($result3);

         // Verify fallback was used
-        $this->assertEquals(2, $createDocumentCallCount);
+        $this->assertEquals(3, $createDocumentCallCount);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/Migration/Unit/Destinations/AppwriteTest.php` around lines 82 - 121,
The test testCreateRecordHandlesDuplicateDocuments should ensure the batch path
was attempted and that fallback continues after a duplicate: update the test to
(1) assert createDocuments was invoked (or set the mock to expect a call) on the
$dbForDatabases mock, and (2) exercise at least three rows (e.g., row1 buffered,
row2 buffered, row3 flush with isLast=true) and adjust the createDocument
callback to throw DuplicateException for only one of the rows so you can assert
the fallback call count continues past the duplicate (e.g., assert
createDocumentCallCount === 3). Ensure you reference the mocked methods
createDocuments and createDocument and the Appwrite::createRecord invocation
when making these changes.
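
The point about needing a third row can be shown with a small self-contained sketch. It does not use the PR's test harness or PHPUnit: `fallbackContinue()`, `fallbackBreak()`, the `$create` closure, and the local `DuplicateException` class are all hypothetical names for illustration. Each function returns how many insert attempts were made, which plays the role of `$createDocumentCallCount` in the test.

```php
<?php

// Hypothetical stand-in for the library's DuplicateException.
final class DuplicateException extends RuntimeException
{
}

/**
 * Intended fallback: skip the duplicate and keep going.
 *
 * @param list<string> $ids
 * @param callable(string): void $create
 */
function fallbackContinue(array $ids, callable $create): int
{
    $attempts = 0;
    foreach ($ids as $id) {
        $attempts++;
        try {
            $create($id);
        } catch (DuplicateException) {
            // Already exists: skip and continue with the next ID.
        }
    }
    return $attempts;
}

/**
 * Regressed fallback: stops at the first duplicate.
 *
 * @param list<string> $ids
 * @param callable(string): void $create
 */
function fallbackBreak(array $ids, callable $create): int
{
    $attempts = 0;
    foreach ($ids as $id) {
        $attempts++;
        try {
            $create($id);
        } catch (DuplicateException) {
            break; // Bug: rows after the duplicate are never attempted.
        }
    }
    return $attempts;
}

// Fake createDocument: 'row2' is treated as already existing.
$create = function (string $id): void {
    if ($id === 'row2') {
        throw new DuplicateException('Document already exists');
    }
};

// Two rows: both variants make 2 attempts, so the regression is invisible.
var_dump(fallbackContinue(['row1', 'row2'], $create) === fallbackBreak(['row1', 'row2'], $create));

// Three rows: the intended variant makes 3 attempts, the regressed one only 2.
var_dump(fallbackContinue(['row1', 'row2', 'row3'], $create));
var_dump(fallbackBreak(['row1', 'row2', 'row3'], $create));
```

With the duplicate placed in the middle of three rows, an assertion on the attempt count distinguishes the two variants, which is what the suggested change to the test relies on.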

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 8d34308c-6939-401d-8cd4-5d295d5289c8

📥 Commits

Reviewing files that changed from the base of the PR and between bbdd8ef and 1e76727.

📒 Files selected for processing (2)
  • src/Migration/Destinations/Appwrite.php
  • tests/Migration/Unit/Destinations/AppwriteTest.php

Comment on lines +1072 to +1089
try {
    $dbForDatabases->skipRelationshipsExistCheck(fn () => $dbForDatabases->createDocuments(
        $collectionId,
        $this->rowBuffer
    ));
} catch (DuplicateException) {
    // Batch insert failed due to a duplicate document.
    // Fall back to inserting one-by-one, skipping duplicates.
    foreach ($this->rowBuffer as $row) {
        try {
            $dbForDatabases->skipRelationshipsExistCheck(fn () => $dbForDatabases->createDocument(
                $collectionId,
                $row
            ));
        } catch (DuplicateException) {
            // Document already exists, skip it
        }
    }

⚠️ Potential issue | 🟠 Major

Fallback duplicates are reported as success, not skipped.

This loop only has buffered UtopiaDocuments, so when a duplicate is skipped here there is no way to update the matching Row resource back to STATUS_SKIPPED like the explicit duplicate branch at Lines 989-994. In practice, the migration cache/result will report these rows as successfully imported even though they were only ignored as pre-existing.

💡 Sketch of a fix
-    /**
-     * `@var` array<UtopiaDocument>
-     */
-    private array $rowBuffer = [];
+    /**
+     * `@var` array<array{resource: Row, document: UtopiaDocument}>
+     */
+    private array $rowBuffer = [];

-        $this->rowBuffer[] = new UtopiaDocument(\array_merge([
-            '$id' => $resource->getId(),
-            '$permissions' => $resource->getPermissions(),
-        ], $data));
+        $this->rowBuffer[] = [
+            'resource' => $resource,
+            'document' => new UtopiaDocument(\array_merge([
+                '$id' => $resource->getId(),
+                '$permissions' => $resource->getPermissions(),
+            ], $data)),
+        ];

-                    $dbForDatabases->skipRelationshipsExistCheck(fn () => $dbForDatabases->createDocuments(
-                        $collectionId,
-                        $this->rowBuffer
-                    ));
+                    $dbForDatabases->skipRelationshipsExistCheck(fn () => $dbForDatabases->createDocuments(
+                        $collectionId,
+                        \array_column($this->rowBuffer, 'document')
+                    ));

-                    foreach ($this->rowBuffer as $row) {
+                    foreach ($this->rowBuffer as $entry) {
                         try {
                             $dbForDatabases->skipRelationshipsExistCheck(fn () => $dbForDatabases->createDocument(
                                 $collectionId,
-                                $row
+                                $entry['document']
                             ));
                         } catch (DuplicateException) {
-                            // Document already exists, skip it
+                            $entry['resource']->setStatus(
+                                Resource::STATUS_SKIPPED,
+                                'Row has already been created'
+                            );
+                            $this->cache->update($entry['resource']);
                         }
                     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Migration/Destinations/Appwrite.php` around lines 1072 - 1089, The
per-row DuplicateException in the fallback loop over $this->rowBuffer currently
just swallows duplicates, causing the migration to report them as successful;
modify the inner catch (DuplicateException) block inside the
skipRelationshipsExistCheck(fn () => $dbForDatabases->createDocument(...)) loop
to mark the corresponding Row resource as STATUS_SKIPPED using the same update
routine/logic used in the explicit duplicate branch (i.e., the code that updates
the Row status to STATUS_SKIPPED when a duplicate is detected earlier), so the
migration cache/results accurately reflect skipped duplicates.
