[Bug]: I3C controller IT mode hangs: HAL ISRs don't drain Status FIFO #33

@pbwarmu017

Bug Summary

I3C controller HAL DAA & TX ISRs do not drain Status FIFO — peripheral hangs after every frame

Detailed Description

Environment

  • MCU: STM32H563ZIT6
  • Board: NUCLEO-H563ZI (MB1404 Rev C-01) ×2 (one controller, one target)
  • Firmware pack: STM32CubeH5 FW.H5.1.6.0
  • HAL file: Drivers/STM32H5xx_HAL_Driver/Src/stm32h5xx_hal_i3c.c (Copyright 2023)
  • Toolchain: STM32CubeIDE (arm-none-eabi-gcc, -O0 -g3)
  • Reproducer: minimal two-board ENTDAA + private write between two NUCLEO-H563ZI

Summary

HAL_I3C_Ctrl_DynAddrAssign_IT() and HAL_I3C_Ctrl_Transmit_IT() (controller-side, IT mode) cause the I3C peripheral to hang after the first wire-level frame. The completion callbacks (HAL_I3C_CtrlDAACpltCallback, HAL_I3C_CtrlTxCpltCallback) never fire, even though the wire-level protocol completes successfully (the target reports EVENT_ID_DAU and receives data correctly).

Root cause: the H5 I3C peripheral writes a per-frame status entry to the Status FIFO (SR) and pauses indefinitely until the application reads it. The HAL ISRs I3C_Ctrl_DAA_ISR and I3C_Ctrl_TX_ISR read the RX FIFO but never read SR, so the peripheral is permanently blocked.

Reproduction steps

  1. Configure two NUCLEO-H563ZI boards, both with I3C1 on PB8/PB9 (GPIO_AF3_I3C1).
  2. Apply ES0565 §2.15.3 PB8/PB9 internal pull-up workaround on both boards (separately reported / acknowledged in errata).
  3. Add 4.7 kΩ external pull-ups on SDA, SCL to 3.3 V. Connect controller PB8↔target PB8, PB9↔PB9, GND↔GND.
  4. Configure controller bus characteristics: SCLODLowDuration = 0xFF, SCLI2CHighDuration = 0xFF (slow open-drain to give the bench wiring rise-time margin — does not affect the bug, just makes the bus reliable).
  5. Target firmware: call HAL_I3C_ActivateNotification(&hi3c1, NULL, HAL_I3C_IT_DAUPDIE); at boot, and do not call HAL_I3C_Tgt_Receive_IT until after EVENT_ID_DAU arrives.
  6. Controller firmware: call HAL_I3C_Ctrl_DynAddrAssign_IT(&hi3c1, I3C_ONLY_ENTDAA); then poll for i3c_daa_done (set by HAL_I3C_CtrlDAACpltCallback).

Expected behavior

HAL_I3C_CtrlDAACpltCallback fires within ~1 ms of submit. i3c_daa_done flips to 1.

Actual behavior

i3c_daa_done stays 0 forever (tested with 12 s wait). Diagnostic register reads after the wait show:

state = HAL_I3C_STATE_BUSY_DAA (0x24)
EVR   = 0x0000000B   (CFEF | TXFEF | SFNEF — Status FIFO has 1 unread entry)
EVR.FCF = 0          (no Frame Complete event)
SER   = 0            (no errors)
IER   = 0x00000A14   (FCIE/CFNFIE/TXFNFIE/ERRIE all correctly enabled)
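
For readers who want to decode such dumps themselves, here is a minimal host-side sketch. The helper name decode_evr and the flag table are illustrative, not part of the HAL. The bit positions are assumptions inferred from the decode in this report (0x0B as CFEF | TXFEF | SFNEF, and 0xA14 as the CFNF/TXFNF/FC/ERR enables), so check them against the reference manual before relying on them.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical flag table. Bit positions are assumed from the register
 * decode quoted in this report, not copied from the CMSIS header. */
typedef struct {
    uint32_t    mask;
    const char *name;
} flag_name_t;

static const flag_name_t evr_flags[] = {
    { 1u << 0,  "CFEF"   },  /* control FIFO empty      */
    { 1u << 1,  "TXFEF"  },  /* TX FIFO empty           */
    { 1u << 2,  "CFNFF"  },  /* control FIFO not full   */
    { 1u << 3,  "SFNEF"  },  /* status FIFO not empty   */
    { 1u << 4,  "TXFNFF" },  /* TX FIFO not full        */
    { 1u << 5,  "RXFNEF" },  /* RX FIFO not empty       */
    { 1u << 9,  "FCF"    },  /* frame complete          */
    { 1u << 11, "ERRF"   },  /* error                   */
};

/* Writes "A | B | C" for every known flag set in value.
 * Returns the number of flag names written. */
static int decode_evr(uint32_t value, char *out, size_t out_len)
{
    size_t used = 0;
    int count = 0;

    if (out == NULL || out_len == 0) {
        return 0;
    }
    out[0] = '\0';
    for (size_t i = 0; i < sizeof evr_flags / sizeof evr_flags[0]; i++) {
        if ((value & evr_flags[i].mask) == 0u) {
            continue;
        }
        int n = snprintf(out + used, out_len - used, "%s%s",
                         count > 0 ? " | " : "", evr_flags[i].name);
        if (n < 0 || (size_t)n >= out_len - used) {
            out[used] = '\0';  /* drop the partially written name */
            break;
        }
        used += (size_t)n;
        count++;
    }
    return count;
}
```

Passing 0x0000000B produces "CFEF | TXFEF | SFNEF", which matches the EVR line above. Passing the IER value 0x00000A14 through the same table sets the bits named CFNFF, TXFNFF, FCF and ERRF, i.e. the positions of the four enables listed in the dump.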

Callback fire counters confirm:

  • HAL_I3C_TgtReqDynamicAddrCallback fires at +0 ms (target's PID arrives, our SetDynAddr runs, target receives DA).
  • HAL_I3C_CtrlDAACpltCallback never fires.
  • HAL_I3C_ErrorCallback never fires.

The target side reports EVENT_ID_DAU and successfully receives its dynamic address, confirming the wire-level protocol completes through DA assignment.

Root cause

The H5 I3C peripheral writes one entry to the Status FIFO after the ENTDAA frame containing PID/BCR/DCR is processed. Per the H5 reference manual §49, the peripheral is designed to pause subsequent protocol steps until the application drains the Status FIFO; a pending entry is signalled by the SFNEF flag in I3C_EVR.

I3C_Ctrl_DAA_ISR (line ~8201 of stm32h5xx_hal_i3c.c) reads from the RX FIFO via LL_I3C_ReceiveData8 to extract PID, but never reads I3C1->SR. The Status FIFO entry sits unread → peripheral does not generate the trailing Sr+0x7E+R round → no NACK → no STOP → FCF never set → HAL_I3C_CtrlDAACpltCallback never invoked.

Same defect in I3C_Ctrl_TX_ISR after a private write: HAL_I3C_CtrlTxCpltCallback never fires.
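
The stall described above can be illustrated with a toy host-side model. This is a sketch of the hypothesised behaviour only, not of the real peripheral or the HAL: every name in it (toy_i3c_t, toy_hw_step, toy_isr, toy_run) is hypothetical, and the rule "one status entry per frame segment, hardware pauses while an entry is unread" is taken from the root-cause description in this report.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of the behaviour described in this report. */
typedef struct {
    unsigned status_fifo_count;  /* unread Status FIFO entries (SFNEF when > 0) */
    unsigned segments_pending;   /* frame segments still to run on the wire     */
    bool     frame_complete;     /* models EVR.FCF                              */
} toy_i3c_t;

/* One step of the modelled hardware: it makes no progress while a
 * status entry is unread. */
static void toy_hw_step(toy_i3c_t *p)
{
    if (p->frame_complete || p->status_fifo_count > 0u) {
        return;                  /* paused, or already finished */
    }
    if (p->segments_pending > 0u) {
        p->segments_pending--;
        p->status_fifo_count++;  /* one status entry per segment */
    } else {
        p->frame_complete = true; /* trailing round done, FCF set */
    }
}

/* Modelled ISR: optionally drains the Status FIFO, as the suggested fix does. */
static void toy_isr(toy_i3c_t *p, bool drain_status)
{
    if (drain_status) {
        while (p->status_fifo_count > 0u) {
            p->status_fifo_count--;  /* models one read of SR */
        }
    }
}

/* Runs the model for at most max_steps; returns whether FCF was reached. */
static bool toy_run(unsigned segments, bool drain_status, unsigned max_steps)
{
    toy_i3c_t p = { 0u, segments, false };

    for (unsigned i = 0; i < max_steps && !p.frame_complete; i++) {
        toy_hw_step(&p);
        toy_isr(&p, drain_status);
    }
    return p.frame_complete;
}
```

In this model, toy_run(1, false, 1000) never reaches frame completion, while toy_run(1, true, 1000) completes on the second step. That mirrors the observed difference between the stock ISR and one that reads SR.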

Verification of root cause

Adding the following drain to the wait loop in application code resolves the issue completely:

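uint32_t t0 = HAL_GetTick();
/* Wait up to 1 s for HAL_I3C_CtrlDAACpltCallback to set i3c_daa_done. */
while (i3c_daa_done == 0 && (HAL_GetTick() - t0) < 1000) {
    /* Drain the Status FIFO so the peripheral can proceed. */
    while ((I3C1->EVR & I3C_EVR_SFNEF) != 0) {
        (void)I3C1->SR;
    }
}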

With the drain, HAL_I3C_CtrlDAACpltCallback fires within ~1 ms of submit (verified via a tick timestamp captured inside the callback). The same drain pattern resolves the TX-complete callback after HAL_I3C_Ctrl_Transmit_IT. Bench-verified Apr 25 2026 with the two-NUCLEO setup; controller and target both report success and the target receives the expected payload bytes.
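
Since every caller of the controller-side IT API needs this drain, it may be worth packaging the wait loop once. The following is a minimal sketch under stated assumptions: the accessor table i3c_wait_ops_t, the helper i3c_wait_with_sr_drain and the fake_* functions are hypothetical names, and SFNEF_MASK assumes bit 3 of I3C_EVR (on target, use the CMSIS I3C_EVR_SFNEF macro). The accessors are injected so the helper can be exercised on a host; the small fake models a Status FIFO whose last read lets the operation complete.

```c
#include <stdbool.h>
#include <stdint.h>

#define SFNEF_MASK (1u << 3)  /* assumed position of I3C_EVR_SFNEF */

/* Hypothetical accessor table: real register reads on target, fakes on host. */
typedef struct {
    uint32_t (*read_evr)(void *ctx);
    uint32_t (*read_sr)(void *ctx);      /* each call pops one status entry   */
    uint32_t (*get_tick_ms)(void *ctx);  /* e.g. wraps HAL_GetTick()          */
    bool     (*is_done)(void *ctx);      /* e.g. returns i3c_daa_done != 0    */
    void     *ctx;
} i3c_wait_ops_t;

/* Waits for is_done() while draining the Status FIFO.
 * Returns false if timeout_ms elapses first. */
static bool i3c_wait_with_sr_drain(const i3c_wait_ops_t *ops, uint32_t timeout_ms)
{
    uint32_t t0 = ops->get_tick_ms(ops->ctx);

    while (!ops->is_done(ops->ctx)) {
        while ((ops->read_evr(ops->ctx) & SFNEF_MASK) != 0u) {
            (void)ops->read_sr(ops->ctx);
        }
        /* Unsigned subtraction stays correct across tick wrap-around. */
        if ((uint32_t)(ops->get_tick_ms(ops->ctx) - t0) >= timeout_ms) {
            return false;
        }
    }
    return true;
}

/* Host-side fake: the operation completes once the FIFO has been emptied. */
typedef struct {
    unsigned fifo;
    uint32_t tick;
    bool     done;
} fake_t;

static uint32_t fake_evr(void *c)
{
    return (((fake_t *)c)->fifo > 0u) ? SFNEF_MASK : 0u;
}

static uint32_t fake_sr(void *c)
{
    fake_t *f = (fake_t *)c;
    if (f->fifo > 0u && --f->fifo == 0u) {
        f->done = true;
    }
    return 0u;
}

static uint32_t fake_tick(void *c) { return ((fake_t *)c)->tick++; }
static bool     fake_done(void *c) { return ((fake_t *)c)->done; }
```

On target, the accessors would wrap I3C1->EVR, I3C1->SR, HAL_GetTick() and the application's completion flag. The indirection costs a few function calls per iteration, which is negligible next to a bus transfer.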

Diagnostic progression captured during root-cause analysis:

| Test condition | DAA-cplt callback fires at | i3c_daa_done snap |
|---|---|---|
| 5 s wait, no SR drain | never (timed out) | 0 |
| 10 s wait + 200 ms settle, no drain, __WFI in loop | +10200 ms (right at wait expiry, coinciding with the diagnostic SR drain after the wait) | 0 |
| 12 s polling loop (1 s steps), no SR drain | +12096 ms (right at loop exit, coinciding with the same diagnostic drain) | 0 |
| 12 s polling loop with SR drain inside each iteration | +1001 ms (right after first drain) | 1 |
| 1 s tight wait with SR drain | +0 ms | 1 |

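The progression is monotonic in how early SR is drained: the sooner the first drain happens, the sooner the callback fires. In the no-drain rows the callback fired only when the diagnostic code read SR after the wait, which is too late for the i3c_daa_done snapshot. With the drain in the tight wait loop, the cycle completes in well under 2 ms (the limit of HAL_GetTick resolution).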

Suggested upstream fix

Add a Status FIFO drain inside the controller-side ISRs in stm32h5xx_hal_i3c.c:

/* In I3C_Ctrl_DAA_ISR, after the existing TXFNFF/RXFNEF handling: */
while (LL_I3C_IsActiveFlag_SFNE(hi3c->Instance) != 0U)
{
    (void)LL_I3C_GetRxStatus(hi3c->Instance);  /* or equivalent SR read */
}

Same drain belongs in I3C_Ctrl_TX_ISR and likely the RX/CCC ISRs as well — wherever the controller's status FIFO can accumulate per-frame entries.

Alternatively, enabling the SFNEIE interrupt for these IT modes and reading SR inside the ISR on each SFNE event would also work and be more event-driven.

Impact

Any H5 I3C controller application using IT mode will hang on the first ENTDAA broadcast or first private transfer unless the application happens to read SR for unrelated reasons. The DMA mode path in HAL_I3C_Ctrl_DynAddrAssign_DMA may be similarly affected (not tested). The bug is silent: no error code is returned and the wire-level protocol completes correctly, but the HAL state machine never advances. This failure mode is hard to diagnose because the obvious checks (verify that ENTDAA returns OK, verify that the target gets its DA) all pass.

Workaround currently in use

Application-side SR drain in the wait loops (shown above). Works reliably; downside is that every consumer of the controller-side IT API must implement the drain themselves, which is non-obvious and not documented in the HAL header.

Reference

The bench configuration and full reproducer are documented in §S7.1 of the Embedded Mastery Series, Volume 1 (NUCLEO-H563ZI) lab manual. The relevant portions of the reproducer source live in two STM32CubeIDE projects (NUCLEO_H563ZI_Labs for the controller, NUCLEO_H563ZI_I3C_Target for the target), which I am willing to share on request.

Labels: bug, hal, i3c
