MDEV-38968 Redundant FILE_CHECKPOINT writes by dr-m · Pull Request #4747 · MariaDB/server

dr-m · 2026-03-06T15:36:19Z

The Jira issue number for this PR is: MDEV-38968

Description

Concurrent calls to log_checkpoint_low() were possible from multiple threads, and they could cause redundant writes of FILE_CHECKPOINT records. Let us simplify the logic by making the dedicated buf_flush_page_cleaner() thread responsible for checkpoints.

log_t::write_checkpoint(lsn_t end_lsn): Add the parameter checkpoint, which replaces log_sys.next_checkpoint_lsn.

log_sys.checkpoint_pending: Remove. Only the buf_flush_page_cleaner thread will write checkpoints, hence there is no possibility of a race condition.

log_checkpoint_low(), log_checkpoint(): Remove the return value, because there cannot be any concurrent log checkpoint in progress.

buf_flush_wait(): Add special handling for log_sys.check_for_checkpoint() as well as shutdown.

buf_flush_wait_flushed(): Assert that buf_flush_page_cleaner() is available.

log_make_checkpoint(): Delegate all work to the page cleaner.

buf_flush_sync_for_checkpoint(): Update the systemd watchdog. On shutdown, keep flushing until a checkpoint has been written.

buf_flush_page_cleaner(): Revise the shutdown logic so that all changes will be written out.

buf_flush_buffer_pool(): Remove.

buf_flush_wait_flushed(): Require the caller to acquire buf_pool.flush_list_mutex.

logs_empty_and_mark_files_at_shutdown(): Simplify the logic.

fil_names_clear(): Fix an off-by-one error that would prevent removal from fil_system.named_spaces.

Release Notes

InnoDB shutdown was simplified.

How can this PR be tested?

rm -fr mysql-test/var
mkdir /tmp/var
ln -s /tmp/var mysql-test/var
mysql-test/mtr --repeat=10 --parallel=auto innodb.log_corruption_recovery{,,,,,}{,,,}
mysql-test/mtr --force --parallel=auto --big-test

This ensures that despite cmake -DWITH_INNODB_PMEM=ON the pwrite based log write code path will be used.

Basing the PR against the correct MariaDB version

This is a new feature or a refactoring, and the PR is based against the main branch.
This is a bug fix, and the PR is based against the earliest maintained branch in which the bug can be reproduced.

PR quality check

I checked the CODING_STANDARDS.md file and my PR conforms to this where appropriate.
For any trivial modifications to the PR, I am ok with the reviewer making the changes themselves.

CLAassistant · 2026-03-06T15:37:30Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
_{You have signed the CLA already but the status is still pending? Let us recheck it.}

dr-m · 2026-03-07T09:51:46Z

Apparently there are some problems with shutdown that make the tests innodb.table_flags and innodb_zip.restart fail. On i686, many --suite=mariabackup tests are failing.

dr-m · 2026-03-09T10:16:25Z

mysql-test/suite/innodb/t/undo_upgrade_debug.test


 --echo # case 1: Abort after resetting TRX_SYS page rollback segments
-let $restart_parameters=--innodb_undo_tablespaces=4 --debug_dbug="+d,after_rseg_reset_abort";
+let $restart_parameters=--innodb_undo_tablespaces=4 --debug_dbug="+d,after_rseg_reset_abort,ib_log_checkpoint_avoid_hard";


This was an attempt to avoid a test failure on Windows. It did not work out:

CURRENT_TEST: innodb.undo_upgrade_debug --- C:/buildbot/workers/prod/amd64-windows/build/mysql-test/suite/innodb/r/undo_upgrade_debug.result 2026-03-09 10:49:32.801270600 +0100 +++ C:\buildbot\workers\prod\amd64-windows\build\mysql-test\suite\innodb\r\undo_upgrade_debug.reject 2026-03-09 10:52:37.306331700 +0100 @@ -18,26 +18,15 @@ # restart: --innodb_undo_tablespaces=2 --debug_dbug=+d,after_deleting_old_undo_abort,ib_log_checkpoint_avoid_hard # restart: --innodb_undo_tablespaces=2 # Should list 2 undo log tablespaces -undo001 -undo002 # case 3: Abort after successfully deleting the old undo tablespace # restart: --innodb_undo_tablespaces=3 --debug_dbug=+d,after_deleting_old_undo_success # restart: --innodb_undo_tablespaces=3 # Should list 3 undo log tablespaces -undo001 -undo002 -undo003 # case 4: Abort after re-creating new undo tablespaces # restart: --innodb_undo_tablespaces=4 --debug_dbug=+d,after_reinit_undo_abort # restart: --innodb_undo_tablespaces=4 # Should list 4 undo log tablespaces -undo001 -undo002 -undo003 -undo004 # case 5: Abort after re-creating new undo tablespaces successfully # restart: --innodb_undo_tablespaces=2 --debug_dbug=+d,after_reinit_undo_success # restart: --innodb_undo_tablespaces=2 # Should list 2 undo log tablespaces -undo001 -undo002 Result length mismatch

The following code is related to this:

/** Delete the old undo tablespaces present in the undo log directory */ static dberr_t srv_undo_delete_old_tablespaces() { /* Delete the old undo tablespaces*/ for (uint32_t i= 0; i < srv_undo_tablespaces_open; ++i) fil_close_tablespace(srv_undo_space_id_start + i); DBUG_EXECUTE_IF("after_deleting_old_undo_abort", return DB_ERROR;); /* Do checkpoint to get rid of old undo log tablespaces redo logs */ log_make_checkpoint(); DBUG_EXECUTE_IF("after_deleting_old_undo_success", return DB_ERROR;); return DB_SUCCESS; }

On buildbot, this failure only seems to be reproducible on Microsoft Windows. I can reproduce it on Linux by extracting the files from this var.tar.gz and it looks like crash recovery really is broken when there is no checkpoint after deleting the files:

sql/mariadbd --innodb-log-file-size=10m --innodb-page-size=8k --datadir=/tmp/mysql-test/var/log/innodb.undo_upgrade_debug-8k/mysqld.1/data

2026-03-09 12:12:28 0 [Note] InnoDB: Buffered log writes (block size=512 bytes) 2026-03-09 12:12:28 0 [Note] InnoDB: End of log at LSN=117100 2026-03-09 12:12:28 0 [ERROR] InnoDB: Failed to open the undo tablespace undo008 2026-03-09 12:12:28 0 [Note] InnoDB: Retry with innodb_force_recovery=5 2026-03-09 12:12:28 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1484] with error Generic error 2026-03-09 12:12:28 0 [Note] InnoDB: Starting shutdown...

I get the same result for every innodb_page_size. Only the name of the reported missing undo tablespace is occasionally varying. @Thirunarayanan, it seems to me that this is a genuine problem that used to be hidden for some reason.

Thinking aloud, the problem ought to be solved by ensuring that another checkpoint will be completed before the fil_close_tablespace() loop. Possibly, we will need another log_make_checkpoint() call. I will check where the checkpoints are taking place in my environment, where this failure is not reproducible.

On my system, I observe that checkpoints are taking place due to calls to log_make_checkpoint(). The problem appears to be that on Windows, log_make_checkpoint() occasionally fails to trigger and wait for a checkpoint.

In the end, in 26fcc4b I solved this by ensuring that log_make_checkpoint() is called at each critical step of srv_undo_tablespaces_reinit(). Changes of innodb_undo_tablespaces were never fully crash-safe.

Concurrent calls to log_checkpoint_low() were possible from multiple threads, and they could cause redundant writes of FILE_CHECKPOINT records. Let us simplify the logic by making the dedicated buf_flush_page_cleaner() thread responsible for checkpoints. log_t::write_checkpoint(lsn_t end_lsn): Add the parameter checkpoint, which replaces log_sys.next_checkpoint_lsn. log_sys.checkpoint_pending: Remove. Only the buf_flush_page_cleaner thread will write checkpoints, hence there is no possibility of a race condition. log_checkpoint_low(), log_checkpoint(): Remove the return value, because there cannot be any concurrent log checkpoint in progress. buf_flush_wait(): Add a parameter for waiting for a full checkpoint. This function replaces buf_flush_wait_flushed(). log_t::checkpoint_margin(): Replaces log_checkpoint_margin(). log_make_checkpoint(): Delegate all work to buf_flush_page_cleaner(). buf_flush_sync_batch(): Add the parameter bool checkpoint, to wait for an empty FILE_CHECKPOINT record to be written. Outside recovery, pass lsn=0. buf_flush_sync_for_checkpoint(): On shutdown, update the systemd watchdog and keep flushing until a final checkpoint has been written. buf_flush_page_cleaner(): Revise the shutdown logic so that all changes will be written out and a checkpoint with just a FILE_CHECKPOINT record can be written. buf_flush_buffer_pool(): Remove. buf_flush_wait_flushed(): Require the caller to acquire buf_pool.flush_list_mutex. logs_empty_and_mark_files_at_shutdown(): Simplify the logic, and return the shutdown LSN. fil_names_clear(): Fix an off-by-one error that would prevent removal from fil_system.named_spaces. innodb_shutdown(): Always invoke logs_empty_and_mark_files_at_shutdown(). srv_undo_tablespaces_reinit(): Simplify the logic and remove the fault injection point after_reinit_undo_abort, which would cause a failure on Microsoft Windows. Changing innodb_undo_tablespaces is not fully crash-safe.

dr-m self-assigned this Mar 6, 2026

dr-m added the MariaDB Corporation label Mar 6, 2026

dr-m marked this pull request as draft March 7, 2026 09:49

dr-m force-pushed the MDEV-38968 branch 3 times, most recently from 946b8b5 to 338a371 Compare March 9, 2026 09:44

dr-m commented Mar 9, 2026

View reviewed changes

dr-m force-pushed the MDEV-38968 branch 5 times, most recently from 9b9ae8a to 26fcc4b Compare March 9, 2026 14:59

dr-m marked this pull request as ready for review March 9, 2026 15:18

dr-m requested a review from Thirunarayanan March 9, 2026 15:18

dr-m force-pushed the MDEV-38968 branch from 26fcc4b to 2919ebf Compare March 9, 2026 19:50

dr-m force-pushed the MDEV-38968 branch from 2919ebf to 4d3c460 Compare March 9, 2026 20:18

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

MDEV-38968 Redundant FILE_CHECKPOINT writes#4747

MDEV-38968 Redundant FILE_CHECKPOINT writes#4747
dr-m wants to merge 1 commit into10.11from
MDEV-38968

dr-m commented Mar 6, 2026

Uh oh!

CLAassistant commented Mar 6, 2026

Uh oh!

dr-m commented Mar 7, 2026

Uh oh!

dr-m Mar 9, 2026

Uh oh!

dr-m Mar 9, 2026

Uh oh!

dr-m Mar 9, 2026

Uh oh!

dr-m Mar 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

dr-m commented Mar 6, 2026

Description

Release Notes

How can this PR be tested?

Basing the PR against the correct MariaDB version

PR quality check

Uh oh!

CLAassistant commented Mar 6, 2026

Uh oh!

dr-m commented Mar 7, 2026

Uh oh!

dr-m Mar 9, 2026

Choose a reason for hiding this comment

Uh oh!

dr-m Mar 9, 2026

Choose a reason for hiding this comment

Uh oh!

dr-m Mar 9, 2026

Choose a reason for hiding this comment

Uh oh!

dr-m Mar 9, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

2 participants