[Bug]: I3C controller IT mode hangs: HAL ISRs don't drain Status FIFO

### Bug Summary

I3C controller HAL DAA & TX ISRs do not drain Status FIFO — peripheral hangs after every frame

### Detailed Description

## Environment

- **MCU**: STM32H563ZIT6
- **Board**: NUCLEO-H563ZI (MB1404 Rev C-01) ×2 (one controller, one target)
- **Firmware pack**: STM32CubeH5 FW.H5.1.6.0
- **HAL file**: `Drivers/STM32H5xx_HAL_Driver/Src/stm32h5xx_hal_i3c.c` (Copyright 2023)
- **Toolchain**: STM32CubeIDE (arm-none-eabi-gcc, -O0 -g3)
- **Reproducer**: minimal two-board ENTDAA + private write between two NUCLEO-H563ZI

## Summary

`HAL_I3C_Ctrl_DynAddrAssign_IT()` and `HAL_I3C_Ctrl_Transmit_IT()` (controller-side, IT mode) cause the I3C peripheral to hang after the first wire-level frame. The completion callbacks (`HAL_I3C_CtrlDAACpltCallback`, `HAL_I3C_CtrlTxCpltCallback`) never fire, even though the wire-level protocol completes successfully (the target reports `EVENT_ID_DAU` and receives data correctly).

Root cause: the H5 I3C peripheral writes a per-frame status entry to the Status FIFO (SR) and **pauses indefinitely** until the application reads it. The HAL ISRs `I3C_Ctrl_DAA_ISR` and `I3C_Ctrl_TX_ISR` read the RX FIFO but never read SR, so the peripheral is permanently blocked.

## Reproduction steps

1. Configure two NUCLEO-H563ZI boards, both with `I3C1` on PB8/PB9 (`GPIO_AF3_I3C1`).
2. Apply ES0565 §2.15.3 PB8/PB9 internal pull-up workaround on both boards (separately reported / acknowledged in errata).
3. Add 4.7 kΩ external pull-ups on SDA, SCL to 3.3 V. Connect controller PB8↔target PB8, PB9↔PB9, GND↔GND.
4. Configure controller bus characteristics: `SCLODLowDuration = 0xFF`, `SCLI2CHighDuration = 0xFF` (slow open-drain to give the bench wiring rise-time margin — does not affect the bug, just makes the bus reliable).
5. Target firmware: `HAL_I3C_ActivateNotification(&hi3c1, NULL, HAL_I3C_IT_DAUPDIE);` at boot, no `HAL_I3C_Tgt_Receive_IT` until after `EVENT_ID_DAU` arrives.
6. Controller firmware: call `HAL_I3C_Ctrl_DynAddrAssign_IT(&hi3c1, I3C_ONLY_ENTDAA);` then poll for `i3c_daa_done` (set by `HAL_I3C_CtrlDAACpltCallback`).

## Expected behavior

`HAL_I3C_CtrlDAACpltCallback` fires within ~1 ms of submit. `i3c_daa_done` flips to 1.

## Actual behavior

`i3c_daa_done` stays 0 forever (tested with 12 s wait). Diagnostic register reads after the wait show:

```
state = HAL_I3C_STATE_BUSY_DAA (0x24)
EVR   = 0x0000000B   (CFEF | TXFEF | SFNEF — Status FIFO has 1 unread entry)
EVR.FCF = 0          (no Frame Complete event)
SER   = 0            (no errors)
IER   = 0x00000A14   (FCIE/CFNFIE/TXFNFIE/ERRIE all correctly enabled)
```

Callback fire counters confirm:
- `HAL_I3C_TgtReqDynamicAddrCallback` fires at +0 ms (the target's PID arrives, our `HAL_I3C_Ctrl_SetDynAddr` call runs, and the target receives its DA).
- `HAL_I3C_CtrlDAACpltCallback` never fires.
- `HAL_I3C_ErrorCallback` never fires.

The target side reports `EVENT_ID_DAU` and successfully receives its dynamic address, confirming the wire-level protocol completes through DA assignment.
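For readers who want to double-check the register dump, the decode can be reproduced off-target with a small helper. This is only a sketch: the bit positions in the tables are my assumption from reading the register map (they agree with the decode given in the dump above), and the names `i3c_flag_t`, `decode_flags`, `decode_evr`, `decode_ier` and `sfne_unserviced` are hypothetical, not part of the HAL.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumed bit positions; confirm against the device header before relying on them. */
typedef struct { uint32_t mask; const char *name; } i3c_flag_t;

static const i3c_flag_t evr_flags[] = {
    { 1u << 0, "CFEF" },   { 1u << 1, "TXFEF" },  { 1u << 2, "CFNFF" },
    { 1u << 3, "SFNEF" },  { 1u << 4, "TXFNFF" }, { 1u << 5, "RXFNEF" },
    { 1u << 9, "FCF" },    { 1u << 11, "ERRF" },
};

static const i3c_flag_t ier_flags[] = {
    { 1u << 2, "CFNFIE" }, { 1u << 3, "SFNEIE" }, { 1u << 4, "TXFNFIE" },
    { 1u << 5, "RXFNEIE" }, { 1u << 9, "FCIE" },  { 1u << 11, "ERRIE" },
};

/* Write the names of all set flags into out as "A | B | C" and return out. */
static char *decode_flags(uint32_t value, const i3c_flag_t *table, size_t n,
                          char *out, size_t out_len)
{
    size_t used = 0;
    assert(out != NULL && out_len > 0u);
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        if ((value & table[i].mask) == 0u) {
            continue;
        }
        int w = snprintf(out + used, out_len - used, "%s%s",
                         used ? " | " : "", table[i].name);
        if (w < 0 || (size_t)w >= out_len - used) {
            break;                      /* output buffer too small: stop early */
        }
        used += (size_t)w;
    }
    return out;
}

static char *decode_evr(uint32_t v, char *out, size_t len)
{
    return decode_flags(v, evr_flags, sizeof evr_flags / sizeof evr_flags[0], out, len);
}

static char *decode_ier(uint32_t v, char *out, size_t len)
{
    return decode_flags(v, ier_flags, sizeof ier_flags / sizeof ier_flags[0], out, len);
}

/* 1 if the Status FIFO holds data (SFNEF) but its interrupt (SFNEIE) is off. */
static int sfne_unserviced(uint32_t evr, uint32_t ier)
{
    return ((evr & (1u << 3)) != 0u) && ((ier & (1u << 3)) == 0u);
}
```

Calling `decode_evr(0x0000000B, buf, sizeof buf)` and `decode_ier(0x00000A14, buf, sizeof buf)` with a 64-byte buffer gives the flag lists quoted in the dump, and `sfne_unserviced(0x0B, 0xA14)` highlights the combination this report is about: a non-empty Status FIFO with no interrupt enabled to service it.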

## Root cause

The H5 I3C peripheral writes one entry to the Status FIFO after the ENTDAA frame containing PID/BCR/DCR is processed. Per the H5 reference manual §49, the peripheral is designed to pause subsequent protocol steps until the application drains the Status FIFO; a pending entry is signalled by the `SFNEF` flag in `I3C_EVR`.

`I3C_Ctrl_DAA_ISR` (line ~8201 of `stm32h5xx_hal_i3c.c`) reads from the RX FIFO via `LL_I3C_ReceiveData8` to extract PID, but never reads `I3C1->SR`. The Status FIFO entry sits unread → peripheral does not generate the trailing `Sr+0x7E+R` round → no NACK → no STOP → `FCF` never set → `HAL_I3C_CtrlDAACpltCallback` never invoked.

Same defect in `I3C_Ctrl_TX_ISR` after a private write: `HAL_I3C_CtrlTxCpltCallback` never fires.
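The stall described above can be illustrated with a toy host-side model. It is not the real peripheral or the HAL: `toy_i3c_t`, `toy_hw_step`, `toy_isr` and `toy_run` are hypothetical names, and the model only encodes the behaviour claimed in this report (each protocol step leaves one status entry, and the next step waits until that entry has been read).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    unsigned status_entries; /* unread Status FIFO entries; SFNEF is "set" when > 0 */
    unsigned steps_left;     /* protocol steps remaining before STOP */
    bool     fcf;            /* models the Frame Complete flag */
} toy_i3c_t;

/* One tick of the modelled hardware: it only advances while the
 * Status FIFO is empty, and each step leaves one status entry behind. */
static void toy_hw_step(toy_i3c_t *p)
{
    assert(p != NULL);
    if (p->fcf || p->status_entries > 0u) {
        return;                     /* finished, or stalled on unread status */
    }
    if (p->steps_left > 0u) {
        p->steps_left--;
        p->status_entries++;
    }
    if (p->steps_left == 0u) {
        p->fcf = true;
    }
}

/* Modelled ISR: optionally drains the Status FIFO. */
static void toy_isr(toy_i3c_t *p, bool drain_status)
{
    assert(p != NULL);
    while (drain_status && p->status_entries > 0u) {
        p->status_entries--;        /* stands in for one read of SR */
    }
}

/* Run hardware and ISR alternately; report whether FCF was reached. */
static bool toy_run(unsigned steps, bool isr_drains, unsigned max_iterations)
{
    toy_i3c_t p = { 0u, steps, false };
    for (unsigned i = 0; i < max_iterations && !p.fcf; i++) {
        toy_hw_step(&p);
        toy_isr(&p, isr_drains);
    }
    return p.fcf;
}
```

With two steps (address assignment, then the trailing round that ends in STOP), `toy_run(2, false, 1000)` never reaches frame-complete, while `toy_run(2, true, 1000)` does. That mirrors the observed difference between the stock ISR and an ISR that reads SR.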

## Verification of root cause

Adding the following drain to the wait loop in application code resolves the issue completely:

```c
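uint32_t t0 = HAL_GetTick();
/* Wait up to 1 s for the DAA-complete callback to set i3c_daa_done. */
while (i3c_daa_done == 0 && (HAL_GetTick() - t0) < 1000) {
    /* Read SR until the Status FIFO reports empty (SFNEF clear). */
    while ((I3C1->EVR & I3C_EVR_SFNEF) != 0) {
        (void)I3C1->SR;
    }
}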
```

With the drain, `HAL_I3C_CtrlDAACpltCallback` fires within ~1 ms of submit (verified via tick timestamp captured inside the callback). The same drain pattern resolves the TX-complete callback after `HAL_I3C_Ctrl_Transmit_IT`. Bench-verified Apr 25 2026 with the two-NUCLEO setup; controller and target both report success and the target receives the expected payload bytes.

Diagnostic progression captured during root-cause analysis:

| Test condition | DAA-cplt callback fires at | i3c_daa_done snap |
|---|---|---|
| 5 s wait, no SR drain | never (timed out) | 0 |
| 10 s wait + 200 ms settle, no drain, `__WFI` in loop | +10200 ms (right at wait expiry — coincidence with diagnostic SR drain after the wait) | 0 |
| 12 s polling loop (1 s steps), no SR drain | +12096 ms (right at loop exit, same coincidence) | 0 |
| 12 s polling loop with SR drain inside each iteration | **+1001 ms** (right after first drain) | **1** |
| 1 s tight wait with SR drain | **+0 ms** | **1** |

The results track how soon SR is drained: the earlier the Status FIFO is read, the earlier the frame completes. With the drain in the tight wait loop, the callback timestamp reads +0 ms, meaning the whole cycle completes within the resolution of `HAL_GetTick`.

## Suggested upstream fix

Add a Status FIFO drain inside the controller-side ISRs in `stm32h5xx_hal_i3c.c`:

```c
/* In I3C_Ctrl_DAA_ISR, after the existing TXFNFF/RXFNEF handling: */
while (LL_I3C_IsActiveFlag_SFNE(hi3c->Instance) != 0U)
{
    (void)READ_REG(hi3c->Instance->SR);  /* direct SR read; or an equivalent LL accessor */
}
```

Same drain belongs in `I3C_Ctrl_TX_ISR` and likely the RX/CCC ISRs as well — wherever the controller's status FIFO can accumulate per-frame entries.

Alternatively, enabling the SFNEIE interrupt for these IT modes and reading SR inside the ISR on each SFNE event would also work and be more event-driven.
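Whichever variant is chosen, a drain loop that runs in interrupt context may be safer with an upper bound, so that a flag that never clears cannot pin the CPU inside the ISR. The following is a minimal sketch under stated assumptions: `i3c_regs_sketch_t` is a hypothetical stand-in for the two registers involved (on target the real peripheral struct from the device header would be used), `SKETCH_EVR_SFNEF` assumes SFNEF sits at bit 3, and `i3c_drain_status_bounded` is not an existing HAL or LL function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two registers the drain touches. */
typedef struct {
    volatile uint32_t EVR;
    volatile uint32_t SR;
} i3c_regs_sketch_t;

#define SKETCH_EVR_SFNEF (1u << 3)   /* assumed bit position of SFNEF */

/* Read SR while SFNEF is set, but never more than max_reads times.
 * Returns the number of reads performed. */
static unsigned i3c_drain_status_bounded(i3c_regs_sketch_t *regs, unsigned max_reads)
{
    unsigned reads = 0;
    assert(regs != NULL);
    while ((regs->EVR & SKETCH_EVR_SFNEF) != 0u && reads < max_reads) {
        (void)regs->SR;              /* on hardware, the read consumes one entry */
        reads++;
    }
    return reads;
}
```

On a host build the mock registers do not change when SR is read, so a stuck `SFNEF` exercises exactly the case the bound exists for: the function returns after `max_reads` reads instead of spinning forever. A suitable bound on real hardware would depend on the Status FIFO depth, which I have not verified.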

## Impact

Any H5 I3C controller application using IT mode will hang on the first ENTDAA broadcast or first private transfer unless the application happens to read SR for unrelated reasons. The DMA path (`HAL_I3C_Ctrl_DynAddrAssign_DMA`) may be similarly affected but was not tested. The bug is silent: no error code is returned and the wire-level protocol completes correctly, but the HAL state machine never advances. That makes it hard to diagnose, because the obvious checks (the ENTDAA submit call returns OK, the target gets its DA) all pass.

## Workaround currently in use

Application-side SR drain in the wait loops (shown above). Works reliably; downside is that every consumer of the controller-side IT API must implement the drain themselves, which is non-obvious and not documented in the HAL header.
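Until the HAL is fixed, the drain can at least be kept in one place rather than copied into every wait loop. Below is a rough sketch of such a helper. The names `wait_done_with_drain`, `tick_fn_t` and `drain_fn_t` are hypothetical; on target, `now` would typically be `HAL_GetTick` and `drain` a small function that reads SR while `SFNEF` is set. The `fake_*` items are host-side stand-ins included only so the sketch can be exercised off-target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*tick_fn_t)(void);  /* millisecond tick source */
typedef void (*drain_fn_t)(void);     /* drains the Status FIFO once */

/* Poll *done until it becomes non-zero or timeout_ms elapses, calling
 * drain() on every pass. Unsigned subtraction keeps the elapsed-time
 * check correct across tick counter wrap-around. */
static bool wait_done_with_drain(const volatile uint32_t *done, uint32_t timeout_ms,
                                 tick_fn_t now, drain_fn_t drain)
{
    assert(done != NULL && now != NULL && drain != NULL);
    const uint32_t t0 = now();
    while (*done == 0u) {
        if ((uint32_t)(now() - t0) >= timeout_ms) {
            return false;             /* timed out */
        }
        drain();
    }
    return true;
}

/* Host-side stand-ins (not for target use). */
static uint32_t          fake_ms;
static volatile uint32_t fake_done;
static unsigned          fake_drains_needed;  /* 0 means "never completes" */

static uint32_t fake_tick(void) { return fake_ms++; }

static void fake_drain(void)
{
    if (fake_drains_needed > 0u && --fake_drains_needed == 0u) {
        fake_done = 1u;               /* models the completion callback firing */
    }
}
```

On target the call would look roughly like `wait_done_with_drain(&i3c_daa_done, 1000, HAL_GetTick, my_sr_drain)`, where `my_sr_drain` is an application function (an assumption here) and `i3c_daa_done` is declared as a `volatile uint32_t`.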

## Reference

The bench configuration and full reproducer are documented in §S7.1 of the *Embedded Mastery Series — Volume 1 (NUCLEO-H563ZI)* lab manual. The relevant portions of the reproducer source live in two STM32CubeIDE projects (`NUCLEO_H563ZI_Labs` for the controller, `NUCLEO_H563ZI_I3C_Target` for the target), which I am willing to share on request.


### Expected Behavior

_No response_

### Actual Behavior

_No response_

### Environment

_No response_

### Severity

None
