
SPOC-481: Add DISCARDFILE support for TRANSDISCARD exception handling#393

Open
danolivo wants to merge 1 commit into main from spoc-481

Conversation

@danolivo
Contributor

Summary

  • In TRANSDISCARD mode, instead of re-executing each DML in a
    subtransaction during replay and logging it to exception_log, records
    are now dumped directly to a per-database binary file
    (pg_spock/discard_<dboid>.log). This follows the Oracle GoldenGate
    discard-file approach: on error, skip re-execution and simply record
    what was discarded.
  • discardfile_write() serialises records as length-prefixed JSON with a
    file format version header, using an atomic counter for unique IDs and
    a dedicated LWLock for concurrent append safety. Returns true/false to
    signal success or failure (with WARNING).
  • spock.discard_read() SQL function reads the binary file back as a set
    of jsonb records, e.g.:
    SELECT rec->>'node_name', rec->>'operation', rec->'remote_tuple'
    FROM spock.discard_read() AS rec
    ORDER BY (rec->>'id')::bigint;
  • TRANSDISCARD/SUB_DISABLE paths in handle_insert() are wired to use
    the discard file instead of subtransaction replay.
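The append path above can be sketched out of band. The following is a minimal illustration of the length-prefixed framing only, not the C implementation: the file name, the JSON field set, and native-endian uint32 framing are assumptions based on this description, and the LWLock/atomic-ID machinery is omitted.

```python
import json
import os
import struct
import tempfile

DISCARD_FILE_VERSION = 1  # assumed; the real header value lives in the C code

def discardfile_append(path, record):
    """Append one length-prefixed JSON record ([uint32 len][json bytes]),
    writing the uint32 format-version header first when the file is new."""
    payload = json.dumps(record).encode("utf-8")
    with open(path, "ab") as f:
        if f.tell() == 0:                          # fresh file: write the header
            f.write(struct.pack("=I", DISCARD_FILE_VERSION))
        f.write(struct.pack("=I", len(payload)))
        f.write(payload)

# Hypothetical record shape, mirroring the fields the SQL example selects.
path = os.path.join(tempfile.mkdtemp(), "discard_16384.log")
discardfile_append(path, {"id": 1, "node_name": "n1", "operation": "INSERT"})
discardfile_append(path, {"id": 2, "node_name": "n1", "operation": "INSERT"})
```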

Design details

  • File format: [uint32 version][uint32 len][json bytes]... — version
    header enables future format upgrades, length prefix makes reading
    simple and predictable.
  • JSON field names are defined via DF_* constants shared between writer
    and reader.
  • Shared memory: pg_atomic_uint64 discard_id and LWLock *discard_file_lock
    added to SpockContext.
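A matching reader for the [uint32 version][uint32 len][json bytes]... layout, sketched under the same assumptions (native-endian uint32s, hypothetical names and cap). It also shows the guard one would want before allocating from a length prefix, since a corrupted or truncated file must not drive an oversized read:

```python
import json
import struct

DISCARD_FILE_VERSION = 1   # assumed header value
MAX_RECLEN = 1 << 30       # illustrative sanity cap on a single record

def discardfile_read(path):
    """Return the JSON records from a version-headered, length-prefixed file."""
    records = []
    with open(path, "rb") as f:
        header = f.read(4)
        if len(header) < 4:
            return records                       # empty or truncated header
        (version,) = struct.unpack("=I", header)
        if version != DISCARD_FILE_VERSION:
            raise ValueError("unsupported discard file version %d" % version)
        while True:
            raw = f.read(4)
            if len(raw) < 4:
                break                            # clean end of file
            (reclen,) = struct.unpack("=I", raw)
            if reclen == 0 or reclen > MAX_RECLEN:
                break                            # corrupt length: stop, don't over-allocate
            payload = f.read(reclen)
            if len(payload) < reclen:
                break                            # writer crashed mid-append
            records.append(json.loads(payload))
    return records
```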

What's next

  • Further commits should wire discardfile_write() into handle_update() and
    handle_delete() replay paths.
  • We have some tests in the replication_set test, but more thorough tests should be added later.
  • We may redesign the read interface before the 5.1 release.

@danolivo danolivo requested a review from mason-sharp March 17, 2026 13:36
@danolivo danolivo self-assigned this Mar 17, 2026
@coderabbitai

coderabbitai bot commented Mar 17, 2026

📝 Walkthrough

Walkthrough

Adds a discard-file subsystem: a writer API prototype, a SQL reader function, and an LWLock in SpockContext for concurrent appends (initialized in shmem); integrates discard logging into apply-worker insert handling; and adds test diagnostics that read the discard file.

Changes

  • Public header (include/spock_exception_handler.h): Added an extern bool
    discardfile_write(...); declaration to expose the discard-file write API.
  • Spock context / shmem (include/spock_worker.h, src/spock_shmem.c): Added
    LWLock *discard_file_lock to SpockContext; request and initialize a
    dedicated LWLock tranche and wire up discard_file_lock during
    shared-memory startup.
  • Discard-file implementation & reader (src/spock_exception_handler.c):
    Implemented the discard-file subsystem: directory/path helpers, a
    versioned length-prefixed JSON record format, discardfile_write()
    appending records under an exclusive lock with header handling and
    warnings on I/O errors, and the SQL-callable spock_discard_read()
    returning SETOF jsonb. Added required includes and PG_FUNCTION_INFO_V1.
  • Apply worker (src/spock_apply.c): Refactored INSERT handling when
    use_try_block is true: for TRANSDISCARD/SUB_DISABLE, call
    discardfile_write() instead of performing the subtransaction insert;
    other behaviours use conditional PG_TRY/PG_CATCH with an adjusted
    subtransaction lifecycle and logging.
  • SQL extension (sql/spock--6.0.0-devel.sql): Added CREATE FUNCTION
    spock.discard_read() RETURNS SETOF jsonb ... binding to the C symbol
    spock_discard_read.
  • Tests, diagnostics (tests/regress/sql/replication_set.sql): Added three
    diagnostic blocks that call spock.discard_read() to display discard
    records for verification.

Poem

🐇 I tuck the lost rows in a hidden trail,

I stamp each record with a quiet key,
Locked and numbered in a snug JSON pail,
A rabbit's ledger of what used to be,
Hop — the discard file keeps our memory.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 75.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
✅ Passed checks (2 passed)
  • Title check ✅ Passed: The title accurately and specifically describes the main change: adding DISCARDFILE support for TRANSDISCARD exception handling, which is the primary objective of the PR.
  • Description check ✅ Passed: The description comprehensively explains the DISCARDFILE feature, its design, implementation details, and next steps, all directly related to the changeset.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (1)
src/spock_apply.c (1)

1346-1350: Remove unreachable exception-behaviour branch in DISCARD-only path.

At lines 1346-1348, the TRANSDISCARD/SUB_DISABLE check is dead code because this block only executes in the else branch, where those modes were already excluded.

♻️ Proposed refactor
-			if (!failed)
-			{
-				if (exception_behaviour == TRANSDISCARD ||
-					exception_behaviour == SUB_DISABLE)
-					RollbackAndReleaseCurrentSubTransaction();
-				else
-					ReleaseCurrentSubTransaction();
-			}
+			if (!failed)
+				ReleaseCurrentSubTransaction();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/spock_apply.c` around lines 1346 - 1350, The conditional checking
exception_behaviour for TRANSDISCARD or SUB_DISABLE inside the DISCARD-only
execution path is unreachable; remove the TRANSDISCARD/SUB_DISABLE branch and
simplify the code to always call ReleaseCurrentSubTransaction() in that branch
instead of conditionally calling RollbackAndReleaseCurrentSubTransaction();
update the block containing exception_behaviour,
RollbackAndReleaseCurrentSubTransaction(), and ReleaseCurrentSubTransaction() so
it unconditionally invokes ReleaseCurrentSubTransaction() where the DISCARD-only
path is taken.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@sql/spock--6.0.0-devel.sql`:
- Around line 101-102: The function declaration for spock.discard_read
incorrectly uses STABLE even though its C implementation (spock_discard_read)
reads mutable on-disk state; change the function volatility from STABLE to
VOLATILE in the CREATE OR REPLACE FUNCTION spock.discard_read() statement so the
planner won't assume constant results for identical arguments and to match the
behavior of the C function (spock_discard_read exported via MODULE_PATHNAME).

In `@src/spock_apply.c`:
- Around line 1322-1325: The discardfile_write() return value is ignored and
failed is unconditionally set to false; change the call in spock_apply.c so you
capture its return (e.g., int rc = discardfile_write(...)) and only set failed =
false when rc indicates success, otherwise leave failed true and handle the
error path (log the failure and propagate/return error or jump to the existing
cleanup/error handling path) so discard audit writes are not silently treated as
successful; update the block that references discardfile_write, failed,
local_node->node->name, rel, remote_origin_id, newtup, and remote_xid
accordingly.

In `@src/spock_exception_handler.c`:
- Around line 358-372: The JSON being built with appendStringInfo in
spock_exception_handler.c injects raw values (node_name,
timestamptz_to_str(...), quote_qualified_identifier(...), and operation) and
must be JSON-escaped to avoid breaking records; update the appendStringInfo call
to pass JSON-escaped versions of these strings (create/use a helper like
json_escape_string or escape_json and call it on node_name, the timestamp
string, the result of quote_qualified_identifier(rel->nspname, rel->relname),
and operation before formatting) so the produced JSON is always valid and
parseable by spock_discard_read().


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 70a6fc93-9889-4844-9bf2-701f24c73413

📥 Commits

Reviewing files that changed from the base of the PR and between 0ae4205 and 938f3bf.

⛔ Files ignored due to path filters (1)
  • tests/regress/expected/replication_set.out is excluded by !**/*.out
📒 Files selected for processing (7)
  • include/spock_exception_handler.h
  • include/spock_worker.h
  • sql/spock--6.0.0-devel.sql
  • src/spock_apply.c
  • src/spock_exception_handler.c
  • src/spock_shmem.c
  • tests/regress/sql/replication_set.sql


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (2)
src/spock_apply.c (1)

1324-1326: ⚠️ Potential issue | 🟠 Major

Handle discardfile_write() failure before marking operation successful.

Line 1324 ignores the return value and Line 1326 forces success. If the file write fails, the discard record is silently lost.

Proposed fix
-			discardfile_write(local_node->node->name, rel, remote_origin_id,
-							  "INSERT", NULL, &newtup, remote_xid);
-			failed = false;
+			if (!discardfile_write(local_node->node->name, rel, remote_origin_id,
+								   "INSERT", NULL, &newtup, remote_xid))
+				ereport(ERROR,
+						(errmsg("failed to write discard record for INSERT on %s.%s",
+								rel->nspname, rel->relname)));
+			failed = false;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/spock_apply.c` around lines 1324 - 1326, The discardfile_write call's
return value is ignored and failed is set to false unconditionally; update the
code around discardfile_write (the call using local_node->node->name, rel,
remote_origin_id, "INSERT", NULL, &newtup, remote_xid) to check its return
value, only set failed = false when discardfile_write reports success, and set
failed = true (and log an error via existing logging facilities) when it fails
so the operation isn't marked successful when the discard record couldn't be
written.
src/spock_exception_handler.c (1)

421-422: ⚠️ Potential issue | 🟠 Major

Use fixed-width length + validate before allocation.

The file format is defined as uint32 length-prefixed records, but writes/reads still use int-based flow and no guard before palloc(reclen + 1). Corrupted lengths can trigger unsafe allocations.

Proposed fix
-	int			reclen;
+	uint32		reclen;
...
-	if (write(fd, &buf.len, sizeof(buf.len)) != sizeof(buf.len) ||
-		write(fd, buf.data, buf.len) != buf.len)
+	if (buf.len < 0 || (Size) buf.len > (Size) PG_UINT32_MAX)
+	{
+		CloseTransientFile(fd);
+		LWLockRelease(SpockCtx->discard_file_lock);
+		ereport(WARNING,
+				(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+				 errmsg("discard record too large to serialize")));
+		pfree(buf.data);
+		return false;
+	}
+	reclen = (uint32) buf.len;
+	if (write(fd, &reclen, sizeof(reclen)) != sizeof(reclen) ||
+		write(fd, buf.data, reclen) != (ssize_t) reclen)
 	{
...
-	while (read(fd, &reclen, sizeof(reclen)) == sizeof(reclen))
+	while (read(fd, &reclen, sizeof(reclen)) == sizeof(reclen))
 	{
...
-		rec = palloc(reclen + 1);
+		if (reclen == 0 || reclen > (uint32) (MaxAllocSize - 1))
+		{
+			ereport(WARNING,
+					(errcode(ERRCODE_DATA_CORRUPTED),
+					 errmsg("invalid record length %u in discard file \"%s\"",
+							reclen, path)));
+			break;
+		}
+		rec = palloc((Size) reclen + 1);

Also applies to: 459-460, 494-501

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/spock_exception_handler.c` around lines 421 - 422, The code currently
uses an int-sized length and unsafe allocation (buf.len and palloc(reclen + 1)),
risking huge/negative allocations; change to a fixed-width uint32_t for the
on-disk length (use uint32_t reclen or buf.len), write/read using
sizeof(uint32_t) and validate reclen after converting/endian-handling, then
guard before allocation (ensure reclen > 0 and reclen <= a safe cap such as
MaxAllocSize-1 or a protocol max) before calling palloc(reclen + 1); also check
that write/read return values compare against sizeof(uint32_t) and reclen (use
ssize_t) and apply the same fixes to the other occurrences noted (the related
write/read/allocation sites around the other locations).
🧹 Nitpick comments (1)
src/spock_apply.c (1)

1348-1352: Remove unreachable exception-behaviour branch in DISCARD path.

This branch is inside the else case that already excludes TRANSDISCARD/SUB_DISABLE, so the conditional is dead code.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/spock_apply.c` around lines 1348 - 1352, The conditional checking
exception_behaviour against TRANSDISCARD/SUB_DISABLE inside the else branch is
dead code; remove the if/else and always call ReleaseCurrentSubTransaction() in
that branch (i.e., replace the conditional block with a single call to
ReleaseCurrentSubTransaction()), leaving
RollbackAndReleaseCurrentSubTransaction() only where it is legitimately
reachable.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/spock_exception_handler.c`:
- Around line 320-322: Update the function comment to reflect the actual on-disk
format and behavior: change the phrase "newline-delimited JSON" to indicate the
function writes binary length-prefixed records (i.e., a binary length header
followed by the JSON payload) and keep the note that it is safe to call outside
a transaction and does not access catalog tables; mention the exact
length-prefix semantics used by the implementation (e.g., 32-bit/64-bit
little-endian) and that each call writes one length-prefixed record so readers
must parse the length header to read the JSON payload.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: df2fdda8-30f3-4ed5-8965-acc65c52f586

📥 Commits

Reviewing files that changed from the base of the PR and between 938f3bf and 28ab443.

⛔ Files ignored due to path filters (1)
  • tests/regress/expected/replication_set.out is excluded by !**/*.out
📒 Files selected for processing (7)
  • include/spock_exception_handler.h
  • include/spock_worker.h
  • sql/spock--6.0.0-devel.sql
  • src/spock_apply.c
  • src/spock_exception_handler.c
  • src/spock_shmem.c
  • tests/regress/sql/replication_set.sql
🚧 Files skipped from review as they are similar to previous changes (2)
  • sql/spock--6.0.0-devel.sql
  • src/spock_shmem.c


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (1)
src/spock_apply.c (1)

1311-1326: ⚠️ Potential issue | 🟠 Major

Capture the discard-file result here instead of forcing success.

This branch unconditionally clears failed after calling discardfile_write(). Downstream, the common logging block still calls log_insert_exception(), so a successful append is duplicated into spock.exception_log and a failed append is treated as handled. Track the write outcome, raise on false, and use that flag to skip the later per-row exception_log path when the discard file already has the row.

🛠️ Possible fix
+			if (!(wrote_discard_record =
+					discardfile_write(local_node->node->name, rel, remote_origin_id,
+									  "INSERT", NULL, &newtup, remote_xid)))
+				ereport(ERROR,
+						(errmsg("failed to write discard record for INSERT on %s.%s",
+								rel->nspname, rel->relname)));
-			discardfile_write(local_node->node->name, rel, remote_origin_id,
-							  "INSERT", NULL, &newtup, remote_xid);
 			failed = false;

Add bool wrote_discard_record = false; near the local declarations at the top of handle_insert(), then gate the later log_insert_exception() block on !wrote_discard_record.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/spock_apply.c` around lines 1311 - 1326, In handle_insert(), don’t
unconditionally clear failed after calling discardfile_write(); capture its
boolean return (e.g., bool wrote_discard_record = discardfile_write(...)) and
use that value to: 1) treat a false return as an error (propagate/raise/fail
instead of silently setting failed = false) and 2) skip the later call to
log_insert_exception() when wrote_discard_record is true so you don’t duplicate
successful discard-file writes into spock.exception_log; update the control flow
around discardfile_write(), failed, and the log_insert_exception() call
accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/spock_exception_handler.c`:
- Around line 468-524: spock_discard_read currently reads records without
coordinating with discardfile_write which writes length and payload in separate
syscalls; fix by acquiring the same lock used by discardfile_write in shared
mode (e.g. call the discardfile write-lock API or the module's RWLock in
shared/read mode) before opening/reading the file, or alternatively acquire that
lock, call lseek(fd, 0, SEEK_END) to snapshot EOF while still holding the lock
and then lseek back to start and only read up to that snapshot EOF; ensure you
hold the lock during the loop that reads reclen/rec (the code around fd,
read(fd,&reclen,...), palloc/pfree and tuplestore_putvalues) and release the
lock after finishing to prevent seeing a length header without its payload.
- Around line 443-450: The SQL example in the comment for the SQL-callable
function spock.discard_read() uses an unaliased identifier "rec" which will
fail; update the example to alias the function result (e.g. use FROM
spock.discard_read() AS rec) or reference the actual output column name instead
so the SELECT references a valid identifier; edit the comment block above the
spock.discard_read() declaration in spock_exception_handler.c to show the
corrected example line (SELECT rec->>'node_name', rec->>'operation',
rec->'remote_tuple' FROM spock.discard_read() AS rec;).

In `@tests/regress/sql/replication_set.sql`:
- Around line 217-222: The assertions currently read the entire
spock.discard_read() output and can pick up prior TRANSDISCARD rows; before each
scenario capture a baseline (SELECT max((rec->>'id')::bigint) FROM
spock.discard_read()) and then restrict subsequent checks to rows with
(rec->>'id')::bigint > baseline, or alternatively add explicit filters on
rec->>'relname' and rec->>'operation' in the SELECT/WHERE to only validate the
rows created in that scenario (also add the missing UPDATE check by filtering
for operation = 'UPDATE' when verifying update discard output); use the
spock.discard_read() call and the rec->>'id', rec->>'relname', rec->>'operation'
fields to implement these scoped assertions.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a1fe25dd-f6e0-44bf-854e-6220460e6988

📥 Commits

Reviewing files that changed from the base of the PR and between 28ab443 and 191b8f1.

⛔ Files ignored due to path filters (1)
  • tests/regress/expected/replication_set.out is excluded by !**/*.out
📒 Files selected for processing (7)
  • include/spock_exception_handler.h
  • include/spock_worker.h
  • sql/spock--6.0.0-devel.sql
  • src/spock_apply.c
  • src/spock_exception_handler.c
  • src/spock_shmem.c
  • tests/regress/sql/replication_set.sql
🚧 Files skipped from review as they are similar to previous changes (3)
  • sql/spock--6.0.0-devel.sql
  • src/spock_shmem.c
  • include/spock_worker.h


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
src/spock_apply.c (1)

1324-1326: ⚠️ Potential issue | 🟡 Minor

Handle discardfile_write() failure.

The return value is ignored despite the function returning a boolean success indicator. The TODO comment at lines 1322-1323 acknowledges this needs to be addressed. At minimum, consider propagating the failure to failed so the exception logging path captures the issue.

🛠️ Suggested approach
-			discardfile_write(local_node->node->name, rel, remote_origin_id,
-							  "INSERT", NULL, &newtup, remote_xid);
-			failed = false;
+			if (!discardfile_write(local_node->node->name, rel, remote_origin_id,
+								   "INSERT", NULL, &newtup, remote_xid))
+			{
+				ereport(WARNING,
+						(errmsg("failed to write discard record for INSERT on %s.%s",
+								rel->nspname, rel->relname)));
+				/* Continue anyway - the WARNING from discardfile_write provides details */
+			}
+			failed = false;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/spock_apply.c` around lines 1324 - 1326, The call to
discardfile_write(...) currently ignores its boolean return value; capture that
return (e.g., bool ok = discardfile_write(local_node->node->name, rel,
remote_origin_id, "INSERT", NULL, &newtup, remote_xid)) and if it returns false
set failed = true so the existing exception/logging path picks up the error (and
consider propagating/returning the failure out of the surrounding function if
appropriate); update the code around the discardfile_write call and the failed
variable handling to reflect this change in spock_apply.c.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/spock_apply.c`:
- Around line 1346-1353: The if-check inside the non-exception branch is dead
because this code path only runs when exception_behaviour is already known not
to be TRANSDISCARD or SUB_DISABLE; remove the unreachable conditional and the
RollbackAndReleaseCurrentSubTransaction() path and always call
ReleaseCurrentSubTransaction() in the block. Locate the block using the symbols
exception_behaviour, TRANSDISCARD, SUB_DISABLE,
RollbackAndReleaseCurrentSubTransaction, and ReleaseCurrentSubTransaction and
simplify it by deleting the inner if/else and leaving a single call to
ReleaseCurrentSubTransaction().


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 15b2baf5-2cff-4638-beb6-7d5e343cafa2

📥 Commits

Reviewing files that changed from the base of the PR and between 191b8f1 and 06ee6a3.

⛔ Files ignored due to path filters (1)
  • tests/regress/expected/replication_set.out is excluded by !**/*.out
📒 Files selected for processing (7)
  • include/spock_exception_handler.h
  • include/spock_worker.h
  • sql/spock--6.0.0-devel.sql
  • src/spock_apply.c
  • src/spock_exception_handler.c
  • src/spock_shmem.c
  • tests/regress/sql/replication_set.sql
🚧 Files skipped from review as they are similar to previous changes (4)
  • src/spock_shmem.c
  • include/spock_worker.h
  • sql/spock--6.0.0-devel.sql
  • tests/regress/sql/replication_set.sql


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/spock_apply.c (1)

1355-1360: ⚠️ Potential issue | 🟠 Major

Skip per-row exception_log writes for the DISCARDFILE insert path.

After the TRANSDISCARD/SUB_DISABLE branch above, this still calls log_insert_exception(). That keeps the old per-row exception_log behavior, so discarded INSERTs land in both stores instead of only discard_<dboid>.log.

💡 One way to gate the old logging path
-		{
+		if (!(exception_behaviour == TRANSDISCARD ||
+			  exception_behaviour == SUB_DISABLE))
+		{
 			char	   *error_msg = edata ? edata->message :
 				(exception_log_ptr[my_exception_log_index].initial_error_message[0] ?
 				 exception_log_ptr[my_exception_log_index].initial_error_message : NULL);

 			log_insert_exception(failed, error_msg, rel, NULL, &newtup, "INSERT");
 		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/spock_apply.c` around lines 1355 - 1360, The code unconditionally calls
log_insert_exception(failed, error_msg, rel, NULL, &newtup, "INSERT") even for
the DISCARDFILE insert path, causing discarded INSERTs to be written to both the
discard_<dboid>.log and the per-row exception_log; change the call site in
src/spock_apply.c (after the TRANSDISCARD/SUB_DISABLE handling) to skip calling
log_insert_exception when the current insert is the DISCARDFILE/SUB_DISABLE path
(i.e., detect the same condition/flag used for TRANSDISCARD/SUB_DISABLE) so that
for that branch you do not invoke log_insert_exception with
failed/error_msg/rel/newtup/"INSERT".
♻️ Duplicate comments (1)
src/spock_apply.c (1)

1324-1326: ⚠️ Potential issue | 🟠 Major

Don't treat a failed discard-file append as success.

The return value is still ignored and failed is cleared unconditionally, so an open/write failure drops the row from replay and from the discard audit trail.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/spock_apply.c` around lines 1324 - 1326, The call to
discardfile_write(...) is currently ignored and failed is unconditionally set to
false; change the logic in the block around discardfile_write (the call that
passes local_node->node->name, rel, remote_origin_id, "INSERT", NULL, &newtup,
remote_xid) to check discardfile_write's return value and only clear the local
variable failed when that call succeeds; if discardfile_write fails, preserve
failed=true (and add an appropriate processLogger/elog/error report) so the row
is not dropped from replay or the discard audit trail.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@sql/spock--6.0.0-devel.sql`:
- Around line 101-102: The upgrade script is missing the spock.discard_read()
function declaration, so upgraded databases won't get the function defined in
the base install; add a CREATE OR REPLACE FUNCTION spock.discard_read() ...
RETURNS SETOF jsonb VOLATILE LANGUAGE c AS 'MODULE_PATHNAME',
'spock_discard_read'; statement to the upgrade migration (the 5.0.6→6.0.0-devel
script) so it matches the base install script and ensures the spock_discard_read
symbol is installed during upgrades.

In `@src/spock_exception_handler.c`:
- Around line 424-425: The two separate write() calls that write buf.len then
buf.data can leave a truncated length header on partial failure; replace them
with a single atomic scatter/gather write using writev (prepare an iovec for
&buf.len and buf.data) and loop on EINTR, then verify the return equals
sizeof(buf.len) + buf.len; if it doesn’t, treat it as a full write failure and
handle cleanup/error reporting to avoid leaving a dangling header (this change
touches the location where write(fd, &buf.len, ...) and write(fd, buf.data, ...)
are called in spock_exception_handler.c).

In `@src/spock_shmem.c`:
- Around line 172-175: discard_id is zeroed in spock_shmem.c, which causes
ID reuse after restarts; instead of
pg_atomic_init_u64(&SpockCtx->discard_id, 0), reseed it at startup by scanning
the existing discard_<dboid>.log (or reading the last appended record) to
determine the highest-used id and initialize the atomic to that value + 1;
perform this scan/probe while holding SpockCtx->discard_file_lock (the lock
obtained from GetNamedLWLockTranche("spock discard file lock")) and set the
atomic using the appropriate pg_atomic write/fetch API so IDs remain unique
across postmaster restarts.

---

Outside diff comments:
In `@src/spock_apply.c`:
- Around line 1355-1360: The code unconditionally calls
log_insert_exception(failed, error_msg, rel, NULL, &newtup, "INSERT") even for
the DISCARDFILE insert path, causing discarded INSERTs to be written to both the
discard_<dboid>.log and the per-row exception_log; change the call site in
src/spock_apply.c (after the TRANSDISCARD/SUB_DISABLE handling) to skip calling
log_insert_exception when the current insert is the DISCARDFILE/SUB_DISABLE path
(i.e., detect the same condition/flag used for TRANSDISCARD/SUB_DISABLE) so that
for that branch you do not invoke log_insert_exception with
failed/error_msg/rel/newtup/"INSERT".


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: eb8afade-e97a-4074-a253-1b65ffab3e0a

📥 Commits

Reviewing files that changed from the base of the PR and between 06ee6a3 and 0843312.

⛔ Files ignored due to path filters (1)
  • tests/regress/expected/replication_set.out is excluded by !**/*.out
📒 Files selected for processing (7)
  • include/spock_exception_handler.h
  • include/spock_worker.h
  • sql/spock--6.0.0-devel.sql
  • src/spock_apply.c
  • src/spock_exception_handler.c
  • src/spock_shmem.c
  • tests/regress/sql/replication_set.sql
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/regress/sql/replication_set.sql

Preparation commit.
In TRANSDISCARD mode, instead of executing each DML in a subtransaction
during replay (and logging to exception_log), dump records directly to a
per-database binary file under pg_spock/. This follows the Oracle
GoldenGate approach: on error, skip re-execution and just record what
was discarded.

- Add discardfile_write() to serialize records as length-prefixed JSON
  to pg_spock/discard_<dboid>.log, using an atomic counter for IDs and
  a dedicated LWLock for concurrent append safety.
- Add spock.discard_read() SQL function to read the file back as a table.
- Wire TRANSDISCARD/SUB_DISABLE paths in handle_insert() to use the
  discardfile instead of subtransaction replay.

Further commits should add this behaviour into other handle_* routines.

@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (2)
src/spock_exception_handler.c (2)

88-97: Clarify the distinction between DF_XID and DF_REMOTE_XID.

Both DF_XID (line 88) and DF_REMOTE_XID (line 97) are defined, and both are written in discardfile_write() at lines 359 and 383 with the same remote_xid value. This appears to be redundant—consider removing one or documenting why both are needed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/spock_exception_handler.c` around lines 88 - 97, DF_XID and DF_REMOTE_XID
are both defined and both currently written using the same "remote_xid" value in
discardfile_write(), causing redundancy; update the code so DF_XID represents
the local transaction id and DF_REMOTE_XID represents the remote transaction id
(or remove one if truly unnecessary). Locate the DF_XID and DF_REMOTE_XID macros
and the discardfile_write() calls (references: DF_XID, DF_REMOTE_XID,
discardfile_write) and either: 1) change the value passed for DF_XID to the
correct local XID variable and add a comment documenting the distinction, or 2)
if only one XID is needed, remove the unused macro and its corresponding field
write to eliminate duplication and update any callers and comments accordingly.

508-514: Consider adding a bounds check on reclen before allocation.

Per previous discussion, this is deferred to a TODO, but for reference: reclen is a signed int read directly from the file. A corrupted or malformed file could provide a negative or excessively large value, leading to undefined behavior or huge allocation in palloc(reclen + 1). A simple bounds check (e.g., reclen > 0 && reclen < MaxAllocSize) before the allocation would harden this path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/spock_exception_handler.c` around lines 508 - 514, The loop that reads
reclen and then calls palloc(reclen + 1) must validate reclen first to avoid
negative or huge allocations: in the function containing the while(read(...))
loop (the block that allocates rec from reclen), add a guard that checks reclen
> 0 && reclen < MaxAllocSize (or an appropriate upper bound constant) and handle
failures by logging/erroring and breaking out or returning an error instead of
calling palloc; reference the variables and calls reclen and palloc(reclen + 1)
in spock_exception_handler.c and ensure the error path frees/cleans up
consistently.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: fdc6d8f4-a049-49ba-b007-0dae9c93fd8c

📥 Commits

Reviewing files that changed from the base of the PR and between 06ee6a3 and 4360ca6.

⛔ Files ignored due to path filters (1)
  • tests/regress/expected/replication_set.out is excluded by !**/*.out
📒 Files selected for processing (7)
  • include/spock_exception_handler.h
  • include/spock_worker.h
  • sql/spock--6.0.0-devel.sql
  • src/spock_apply.c
  • src/spock_exception_handler.c
  • src/spock_shmem.c
  • tests/regress/sql/replication_set.sql
🚧 Files skipped from review as they are similar to previous changes (3)
  • include/spock_worker.h
  • src/spock_shmem.c
  • include/spock_exception_handler.h

@danolivo danolivo added the enhancement New feature or request label Mar 18, 2026