Skip to content
/ server Public

MDEV-38968 Redundant FILE_CHECKPOINT writes#4747

Open
dr-m wants to merge 1 commit into10.11from
MDEV-38968
Open

MDEV-38968 Redundant FILE_CHECKPOINT writes#4747
dr-m wants to merge 1 commit into10.11from
MDEV-38968

Conversation

@dr-m
Copy link
Contributor

@dr-m dr-m commented Mar 6, 2026

  • The Jira issue number for this PR is: MDEV-38968

Description

Concurrent calls to log_checkpoint_low() were possible from multiple threads, and they could cause redundant writes of FILE_CHECKPOINT records. Let us simplify the logic by making the dedicated buf_flush_page_cleaner() thread responsible for checkpoints.

log_t::write_checkpoint(lsn_t end_lsn): Add the parameter checkpoint, which replaces log_sys.next_checkpoint_lsn.

log_sys.checkpoint_pending: Remove. Only the buf_flush_page_cleaner thread will write checkpoints, hence there is no possibility of a race condition.

log_checkpoint_low(), log_checkpoint(): Remove the return value, because there cannot be any concurrent log checkpoint in progress.

buf_flush_wait(): Add special handling for log_sys.check_for_checkpoint() as well as shutdown.

buf_flush_wait_flushed(): Assert that buf_flush_page_cleaner() is available.

log_make_checkpoint(): Delegate all work to the page cleaner.

buf_flush_sync_for_checkpoint(): Update the systemd watchdog. On shutdown, keep flushing until a checkpoint has been written.

buf_flush_page_cleaner(): Revise the shutdown logic so that all changes will be written out.

buf_flush_buffer_pool(): Remove.

buf_flush_wait_flushed(): Require the caller to acquire buf_pool.flush_list_mutex.

logs_empty_and_mark_files_at_shutdown(): Simplify the logic.

fil_names_clear(): Fix an off-by-one error that would prevent removal from fil_system.named_spaces.

Release Notes

InnoDB shutdown was simplified.

How can this PR be tested?

rm -fr mysql-test/var
mkdir /tmp/var
ln -s /tmp/var mysql-test/var
mysql-test/mtr --repeat=10 --parallel=auto innodb.log_corruption_recovery{,,,,,}{,,,}
mysql-test/mtr --force --parallel=auto --big-test

This ensures that despite cmake -DWITH_INNODB_PMEM=ON the pwrite based log write code path will be used.

Basing the PR against the correct MariaDB version

  • This is a new feature or a refactoring, and the PR is based against the main branch.
  • This is a bug fix, and the PR is based against the earliest maintained branch in which the bug can be reproduced.

PR quality check

  • I checked the CODING_STANDARDS.md file and my PR conforms to this where appropriate.
  • For any trivial modifications to the PR, I am ok with the reviewer making the changes themselves.

@dr-m dr-m self-assigned this Mar 6, 2026
@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@dr-m dr-m marked this pull request as draft March 7, 2026 09:49
@dr-m
Copy link
Contributor Author

dr-m commented Mar 7, 2026

Apparently there are some problems with shutdown that make the tests innodb.table_flags and innodb_zip.restart fail. On i686, many --suite=mariabackup tests are failing.

@dr-m dr-m force-pushed the MDEV-38968 branch 3 times, most recently from 946b8b5 to 338a371 Compare March 9, 2026 09:44

--echo # case 1: Abort after resetting TRX_SYS page rollback segments
let $restart_parameters=--innodb_undo_tablespaces=4 --debug_dbug="+d,after_rseg_reset_abort";
let $restart_parameters=--innodb_undo_tablespaces=4 --debug_dbug="+d,after_rseg_reset_abort,ib_log_checkpoint_avoid_hard";
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was an attempt to avoid a test failure on Windows. It did not work out:

CURRENT_TEST: innodb.undo_upgrade_debug
--- C:/buildbot/workers/prod/amd64-windows/build/mysql-test/suite/innodb/r/undo_upgrade_debug.result	2026-03-09 10:49:32.801270600 +0100
+++ C:\buildbot\workers\prod\amd64-windows\build\mysql-test\suite\innodb\r\undo_upgrade_debug.reject	2026-03-09 10:52:37.306331700 +0100
@@ -18,26 +18,15 @@
 # restart: --innodb_undo_tablespaces=2 --debug_dbug=+d,after_deleting_old_undo_abort,ib_log_checkpoint_avoid_hard
 # restart: --innodb_undo_tablespaces=2
 # Should list 2 undo log tablespaces
-undo001
-undo002
 # case 3: Abort after successfully deleting the old undo tablespace
 # restart: --innodb_undo_tablespaces=3 --debug_dbug=+d,after_deleting_old_undo_success
 # restart: --innodb_undo_tablespaces=3
 # Should list 3 undo log tablespaces
-undo001
-undo002
-undo003
 # case 4: Abort after re-creating new undo tablespaces
 # restart: --innodb_undo_tablespaces=4 --debug_dbug=+d,after_reinit_undo_abort
 # restart: --innodb_undo_tablespaces=4
 # Should list 4 undo log tablespaces
-undo001
-undo002
-undo003
-undo004
 # case 5: Abort after re-creating new undo tablespaces successfully
 # restart: --innodb_undo_tablespaces=2 --debug_dbug=+d,after_reinit_undo_success
 # restart: --innodb_undo_tablespaces=2
 # Should list 2 undo log tablespaces
-undo001
-undo002
Result length mismatch

The following code is related to this:

/** Delete the old undo tablespaces present in the undo log directory */
static dberr_t srv_undo_delete_old_tablespaces()
{
  /* Delete the old undo tablespaces*/
  for (uint32_t i= 0; i < srv_undo_tablespaces_open; ++i)
    fil_close_tablespace(srv_undo_space_id_start + i);

  DBUG_EXECUTE_IF("after_deleting_old_undo_abort", return DB_ERROR;);

  /* Do checkpoint to get rid of old undo log tablespaces redo logs */
  log_make_checkpoint();

  DBUG_EXECUTE_IF("after_deleting_old_undo_success", return DB_ERROR;);

  return DB_SUCCESS;
}

On buildbot, this failure only seems to be reproducible on Microsoft Windows. I can reproduce it on Linux by extracting the files from this var.tar.gz and it looks like crash recovery really is broken when there is no checkpoint after deleting the files:

sql/mariadbd --innodb-log-file-size=10m --innodb-page-size=8k --datadir=/tmp/mysql-test/var/log/innodb.undo_upgrade_debug-8k/mysqld.1/data
2026-03-09 12:12:28 0 [Note] InnoDB: Buffered log writes (block size=512 bytes)
2026-03-09 12:12:28 0 [Note] InnoDB: End of log at LSN=117100
2026-03-09 12:12:28 0 [ERROR] InnoDB: Failed to open the undo tablespace undo008
2026-03-09 12:12:28 0 [Note] InnoDB: Retry with innodb_force_recovery=5
2026-03-09 12:12:28 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1484] with error Generic error
2026-03-09 12:12:28 0 [Note] InnoDB: Starting shutdown...

I get the same result for every innodb_page_size. Only the name of the reported missing undo tablespace is occasionally varying. @Thirunarayanan, it seems to me that this is a genuine problem that used to be hidden for some reason.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking aloud, the problem ought to be solved by ensuring that another checkpoint will be completed before the fil_close_tablespace() loop. Possibly, we will need another log_make_checkpoint() call. I will check where the checkpoints are taking place in my environment, where this failure is not reproducible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On my system, I observe that checkpoints are taking place due to calls to log_make_checkpoint(). The problem appears to be that on Windows, log_make_checkpoint() occasionally fails to trigger and wait for a checkpoint.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the end, in 26fcc4b I solved this by ensuring that log_make_checkpoint() is called at each critical step of srv_undo_tablespaces_reinit(). Changes of innodb_undo_tablespaces were never fully crash-safe.

@dr-m dr-m force-pushed the MDEV-38968 branch 5 times, most recently from 9b9ae8a to 26fcc4b Compare March 9, 2026 14:59
@dr-m dr-m marked this pull request as ready for review March 9, 2026 15:18
@dr-m dr-m requested a review from Thirunarayanan March 9, 2026 15:18
Concurrent calls to log_checkpoint_low() were possible from multiple
threads, and they could cause redundant writes of FILE_CHECKPOINT
records. Let us simplify the logic by making the dedicated
buf_flush_page_cleaner() thread responsible for checkpoints.

log_t::write_checkpoint(lsn_t end_lsn): Add the parameter checkpoint,
which replaces log_sys.next_checkpoint_lsn.

log_sys.checkpoint_pending: Remove. Only the buf_flush_page_cleaner
thread will write checkpoints, hence there is no possibility of a
race condition.

log_checkpoint_low(), log_checkpoint(): Remove the return value,
because there cannot be any concurrent log checkpoint in progress.

buf_flush_wait(): Add a parameter for waiting for a full checkpoint.
This function replaces buf_flush_wait_flushed().

log_t::checkpoint_margin(): Replaces log_checkpoint_margin().

log_make_checkpoint(): Delegate all work to buf_flush_page_cleaner().

buf_flush_sync_batch(): Add the parameter bool checkpoint, to wait
for an empty FILE_CHECKPOINT record to be written. Outside recovery,
pass lsn=0.

buf_flush_sync_for_checkpoint(): On shutdown, update the systemd
watchdog and keep flushing until a final checkpoint has been written.

buf_flush_page_cleaner(): Revise the shutdown logic so that all
changes will be written out and a checkpoint with just a FILE_CHECKPOINT
record can be written.

buf_flush_buffer_pool(): Remove.

buf_flush_wait_flushed(): Require the caller to acquire
buf_pool.flush_list_mutex.

logs_empty_and_mark_files_at_shutdown(): Simplify the logic,
and return the shutdown LSN.

fil_names_clear(): Fix an off-by-one error that would prevent
removal from fil_system.named_spaces.

innodb_shutdown(): Always invoke logs_empty_and_mark_files_at_shutdown().

srv_undo_tablespaces_reinit(): Simplify the logic and remove the
fault injection point after_reinit_undo_abort, which would cause
a failure on Microsoft Windows. Changing innodb_undo_tablespaces is
not fully crash-safe.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

2 participants