Bug Summary
I3C controller HAL DAA & TX ISRs do not drain Status FIFO — peripheral hangs after every frame
Detailed Description
Environment
- MCU: STM32H563ZIT6
- Board: NUCLEO-H563ZI (MB1404 Rev C-01) ×2 (one controller, one target)
- Firmware pack: STM32CubeH5 FW.H5.1.6.0
- HAL file: Drivers/STM32H5xx_HAL_Driver/Src/stm32h5xx_hal_i3c.c (Copyright 2023)
- Toolchain: STM32CubeIDE (arm-none-eabi-gcc, -O0 -g3)
- Reproducer: minimal two-board ENTDAA + private write between two NUCLEO-H563ZI
Summary
HAL_I3C_Ctrl_DynAddrAssign_IT() and HAL_I3C_Ctrl_Transmit_IT() (controller-side, IT mode) cause the I3C peripheral to hang after the first wire-level frame. The completion callbacks (HAL_I3C_CtrlDAACpltCallback, HAL_I3C_CtrlTxCpltCallback) never fire, even though the wire-level protocol completes successfully (the target reports EVENT_ID_DAU and receives data correctly).
Root cause: the H5 I3C peripheral writes a per-frame status entry to the Status FIFO (SR) and pauses indefinitely until the application reads it. The HAL ISRs I3C_Ctrl_DAA_ISR and I3C_Ctrl_TX_ISR read the RX FIFO but never read SR, so the peripheral is permanently blocked.
Reproduction steps
- Configure two NUCLEO-H563ZI boards, both with I3C1 on PB8/PB9 (GPIO_AF3_I3C1).
- Apply ES0565 §2.15.3 PB8/PB9 internal pull-up workaround on both boards (separately reported / acknowledged in errata).
- Add 4.7 kΩ external pull-ups on SDA, SCL to 3.3 V. Connect controller PB8↔target PB8, PB9↔PB9, GND↔GND.
- Configure controller bus characteristics: SCLODLowDuration = 0xFF, SCLI2CHighDuration = 0xFF (slow open-drain timing to give the bench wiring rise-time margin; this does not affect the bug, it just makes the bus reliable).
- Target firmware: HAL_I3C_ActivateNotification(&hi3c1, NULL, HAL_I3C_IT_DAUPDIE); at boot, no HAL_I3C_Tgt_Receive_IT until after EVENT_ID_DAU arrives.
- Controller firmware: call HAL_I3C_Ctrl_DynAddrAssign_IT(&hi3c1, I3C_ONLY_ENTDAA); then poll for i3c_daa_done (set by HAL_I3C_CtrlDAACpltCallback).
Expected behavior
HAL_I3C_CtrlDAACpltCallback fires within ~1 ms of submit. i3c_daa_done flips to 1.
Actual behavior
i3c_daa_done stays 0 forever (tested with 12 s wait). Diagnostic register reads after the wait show:
state = HAL_I3C_STATE_BUSY_DAA (0x24)
EVR = 0x0000000B (CFEF | TXFEF | SFNEF — Status FIFO has 1 unread entry)
EVR.FCF = 0 (no Frame Complete event)
SER = 0 (no errors)
IER = 0x00000A14 (FCIE/CFNFIE/TXFNFIE/ERRIE all correctly enabled)
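To make the EVR snapshot above easier to cross-check, here is a minimal host-side sketch that decodes an I3C_EVR value into flag names. The bit positions are assumptions inferred from the decodes quoted in this report (0x0B as CFEF | TXFEF | SFNEF, and IER = 0xA14 as CFNFIE/TXFNFIE/FCIE/ERRIE) and should be checked against the device header. The names evr_flag_t, k_evr_flags and evr_decode are hypothetical, not HAL symbols.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* One named flag of I3C_EVR. Bit positions below are ASSUMED from the
 * decodes quoted in this report; verify against the reference manual. */
typedef struct {
    uint32_t    mask;
    const char *name;
} evr_flag_t;

static const evr_flag_t k_evr_flags[] = {
    { 1u << 0,  "CFEF"   },  /* control FIFO empty            */
    { 1u << 1,  "TXFEF"  },  /* TX FIFO empty                 */
    { 1u << 2,  "CFNFF"  },  /* control FIFO not full         */
    { 1u << 3,  "SFNEF"  },  /* Status FIFO not empty         */
    { 1u << 4,  "TXFNFF" },  /* TX FIFO not full              */
    { 1u << 9,  "FCF"    },  /* frame complete                */
    { 1u << 11, "ERRF"   },  /* error                         */
};

/* Writes the names of all set flags into out, separated by " | ".
 * Returns the number of known flags that were set. */
static int evr_decode(uint32_t evr, char *out, size_t out_len)
{
    int count = 0;

    assert(out != NULL && out_len > 0u);
    out[0] = '\0';

    for (size_t i = 0; i < sizeof k_evr_flags / sizeof k_evr_flags[0]; i++) {
        if ((evr & k_evr_flags[i].mask) != 0u) {
            size_t used = strlen(out);
            /* snprintf truncates safely if out is too small. */
            snprintf(out + used, out_len - used, "%s%s",
                     (count != 0) ? " | " : "", k_evr_flags[i].name);
            count++;
        }
    }
    return count;
}
```

Passing the observed value 0x0000000B yields three flags, with SFNEF among them and FCF absent, which matches the register dump: a Status FIFO entry is pending and no Frame Complete event has been raised.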
Callback fire counters confirm:
- HAL_I3C_TgtReqDynamicAddrCallback fires at +0 ms (target's PID arrives, our SetDynAddr runs, target receives DA).
- HAL_I3C_CtrlDAACpltCallback never fires.
- HAL_I3C_ErrorCallback never fires.
The target side reports EVENT_ID_DAU and successfully receives its dynamic address, confirming the wire-level protocol completes through DA assignment.
Root cause
The H5 I3C peripheral writes one entry to the Status FIFO after the ENTDAA frame containing PID/BCR/DCR is processed. Per the H5 reference manual §49 the peripheral is designed to pause subsequent protocol steps until the application drains the Status FIFO — this is the SFNEF flag in I3C_EVR.
I3C_Ctrl_DAA_ISR (line ~8201 of stm32h5xx_hal_i3c.c) reads from the RX FIFO via LL_I3C_ReceiveData8 to extract PID, but never reads I3C1->SR. The Status FIFO entry sits unread → peripheral does not generate the trailing Sr+0x7E+R round → no NACK → no STOP → FCF never set → HAL_I3C_CtrlDAACpltCallback never invoked.
Same defect in I3C_Ctrl_TX_ISR after a private write: HAL_I3C_CtrlTxCpltCallback never fires.
Verification of root cause
Adding the following drain to the wait loop in application code resolves the issue completely:
    while (i3c_daa_done == 0 && (HAL_GetTick() - t0) < 1000) {
        /* Pop every pending Status FIFO entry so the peripheral can proceed. */
        while ((I3C1->EVR & I3C_EVR_SFNEF) != 0) {
            (void)I3C1->SR;
        }
    }
With the drain, HAL_I3C_CtrlDAACpltCallback fires within ~1 ms of submit (verified via a tick timestamp captured inside the callback). The same drain pattern resolves the TX-complete callback after HAL_I3C_Ctrl_Transmit_IT. Bench-verified Apr 25 2026 with the two-NUCLEO setup; controller and target both report success and the target receives the expected payload bytes.
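Since the same drain is needed after every controller-side IT call, it may be worth factoring into one helper. The following is a minimal sketch, not HAL code: i3c_wait_ops_t, i3c_wait_draining_sr and the fake_* backend are hypothetical names introduced here, and the SFNEF bit position is assumed to be bit 3 (consistent with EVR = 0x0B). Register and tick access go through callbacks so the loop can be exercised on a host; the fake backend models only the behaviour described in this report (completion arrives once the Status FIFO is empty).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMED bit position of SFNEF in I3C_EVR. */
#define EVR_SFNEF (1u << 3)

/* Accessors, so the same loop runs on target or on a host. */
typedef struct {
    uint32_t (*read_evr)(void *ctx);  /* returns I3C_EVR                    */
    uint32_t (*read_sr)(void *ctx);   /* returns I3C_SR; read pops an entry */
    uint32_t (*tick_ms)(void *ctx);   /* free-running millisecond counter   */
    void     *ctx;
} i3c_wait_ops_t;

/* Polls *done until it is set or timeout_ms elapses, draining the
 * Status FIFO on every pass. Returns true if *done was set. */
static bool i3c_wait_draining_sr(const i3c_wait_ops_t *ops,
                                 const volatile uint8_t *done,
                                 uint32_t timeout_ms)
{
    assert(ops != NULL && done != NULL);
    const uint32_t t0 = ops->tick_ms(ops->ctx);

    while (*done == 0u) {
        /* Unsigned subtraction stays correct across tick wrap-around. */
        if ((ops->tick_ms(ops->ctx) - t0) >= timeout_ms) {
            return false;
        }
        while ((ops->read_evr(ops->ctx) & EVR_SFNEF) != 0u) {
            (void)ops->read_sr(ops->ctx);
        }
    }
    return true;
}

/* ---- Host-side fake backend, for exercising the loop without hardware ---- */
typedef struct {
    unsigned         pending;  /* unread Status FIFO entries          */
    uint32_t         now_ms;   /* fake tick, advances on every query  */
    volatile uint8_t done;     /* stands in for i3c_daa_done          */
} fake_i3c_t;

static uint32_t fake_read_evr(void *ctx)
{
    const fake_i3c_t *f = ctx;
    return (f->pending != 0u) ? EVR_SFNEF : 0u;
}

static uint32_t fake_read_sr(void *ctx)
{
    fake_i3c_t *f = ctx;
    if (f->pending != 0u && --f->pending == 0u) {
        f->done = 1u;  /* model: completion arrives once the FIFO is empty */
    }
    return 0u;
}

static uint32_t fake_tick_ms(void *ctx)
{
    fake_i3c_t *f = ctx;
    return f->now_ms++;
}
```

On target, the three callbacks would simply wrap reads of I3C1->EVR and I3C1->SR and a call to HAL_GetTick(), and done would point at the flag set by the completion callback.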
Diagnostic progression captured during root-cause analysis:
| Test condition | DAA-cplt callback fires at | i3c_daa_done snap |
|---|---|---|
| 5 s wait, no SR drain | never (timed out) | 0 |
| 10 s wait + 200 ms settle, no drain, __WFI in loop | +10200 ms (right at wait expiry; coincides with the diagnostic SR drain after the wait) | 0 |
| 12 s polling loop (1 s steps), no SR drain | +12096 ms (right at loop exit, same coincidence) | 0 |
| 12 s polling loop with SR drain inside each iteration | +1001 ms (right after first drain) | 1 |
| 1 s tight wait with SR drain | +0 ms | 1 |
The progression is monotonic in "amount of SR drain": more drain → faster completion. With drain in the tight wait loop, the cycle completes in well under 2 ms (limit of HAL_GetTick resolution).
Suggested upstream fix
Add a Status FIFO drain inside the controller-side ISRs in stm32h5xx_hal_i3c.c:
    /* In I3C_Ctrl_DAA_ISR, after the existing TXFNFF/RXFNEF handling: */
    while (LL_I3C_IsActiveFlag_SFNE(hi3c->Instance) != 0U)
    {
      /* Reading I3C_SR pops one Status FIFO entry (or use an equivalent LL accessor). */
      (void)READ_REG(hi3c->Instance->SR);
    }
Same drain belongs in I3C_Ctrl_TX_ISR and likely the RX/CCC ISRs as well — wherever the controller's status FIFO can accumulate per-frame entries.
Alternatively, enabling the SFNEIE interrupt for these IT modes and reading SR inside the ISR on each SFNE event would also work and be more event-driven.
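The event-driven alternative can be illustrated with a small host-side model. This is a sketch under stated assumptions, not the HAL implementation: model_i3c_t and its functions are hypothetical, the FIFO depth of 4 is arbitrary, and SFNEF/SFNEIE are assumed to share bit 3 of EVR/IER. It shows only the intended control flow: the handler does nothing while SFNEIE is disabled, and pops entries until SFNEF clears once it is enabled.

```c
#include <assert.h>
#include <stdint.h>

#define EVT_SFNE    (1u << 3)  /* ASSUMED: SFNEF in EVR and SFNEIE in IER */
#define SFIFO_DEPTH 4u         /* arbitrary depth chosen for the model    */

/* Minimal model of the registers involved in the Status FIFO handshake. */
typedef struct {
    uint32_t evr;                 /* modelled I3C_EVR */
    uint32_t ier;                 /* modelled I3C_IER */
    uint32_t sfifo[SFIFO_DEPTH];  /* queued status entries */
    unsigned head;
    unsigned count;
} model_i3c_t;

/* Peripheral side: queue one per-frame status entry and raise SFNEF. */
static void model_push_status(model_i3c_t *p, uint32_t entry)
{
    assert(p->count < SFIFO_DEPTH);
    p->sfifo[(p->head + p->count) % SFIFO_DEPTH] = entry;
    p->count++;
    p->evr |= EVT_SFNE;
}

/* Software side: reading SR pops one entry; SFNEF clears when empty. */
static uint32_t model_read_sr(model_i3c_t *p)
{
    assert(p->count > 0u);
    uint32_t entry = p->sfifo[p->head];
    p->head = (p->head + 1u) % SFIFO_DEPTH;
    if (--p->count == 0u) {
        p->evr &= ~EVT_SFNE;
    }
    return entry;
}

/* Event-driven drain: acts only when SFNE is enabled in IER, then pops
 * entries until SFNEF clears. Returns the number of entries consumed and
 * stores the most recent one in *last_status. */
static unsigned model_i3c_irq(model_i3c_t *p, uint32_t *last_status)
{
    unsigned drained = 0u;

    if ((p->ier & EVT_SFNE) == 0u) {
        return 0u;  /* interrupt source masked: entries stay queued */
    }
    while ((p->evr & EVT_SFNE) != 0u) {
        *last_status = model_read_sr(p);
        drained++;
    }
    return drained;
}
```

Compared with draining inside the existing FIFO handlers, this keeps the SR read tied to the event that requires it, at the cost of one more enabled interrupt source per transfer.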
Impact
Any H5 I3C controller application using IT mode will hang on the first ENTDAA broadcast or first private transfer unless the application happens to read SR for unrelated reasons. The DMA path in HAL_I3C_Ctrl_DynAddrAssign_DMA may be similarly affected; this was not tested. The bug is silent: no error code is returned and the wire-level protocol completes correctly, but the HAL state machine never advances. That makes it especially hard to diagnose, because the obvious checks (verify ENTDAA returns OK, verify the target gets its DA) all pass.
Workaround currently in use
Application-side SR drain in the wait loops (shown above). Works reliably; downside is that every consumer of the controller-side IT API must implement the drain themselves, which is non-obvious and not documented in the HAL header.
Reference
Bench config and full reproducer are documented in §S7.1 of the Embedded Mastery Series — Volume 1 (NUCLEO-H563ZI) lab manual; the relevant portions of the reproducer source live in two STM32CubeIDE projects (NUCLEO_H563ZI_Labs for the controller, NUCLEO_H563ZI_I3C_Target for the target), which I am willing to share on request.
Expected Behavior
No response
Actual Behavior
No response
Environment
No response
Severity
None