InMemoryOrchestrationService: External events not delivered when durable timer is pending in Task.WhenAny #713

@send2vinnie

Description

When using DurableTaskTestHost (from Microsoft.DurableTask.InProcessTestHost), external events raised via RaiseEventAsync are not properly delivered to an orchestration that is waiting on Task.WhenAny(timerTask, externalEventTask). The orchestration hangs indefinitely instead of completing when the event arrives before the timer fires.

This makes it impossible to write integration tests for any orchestrator pattern that combines durable timers with external events (e.g., wait for approval with timeout).

Versions affected

  • Microsoft.DurableTask.InProcessTestHost 0.2.1-preview.1 through 0.2.3-preview.1 (latest as of this report)

Root cause analysis

The issue is in InMemoryInstanceStore.AddMessage() in InMemoryOrchestrationService.cs.

When an orchestrator creates a durable timer, the resulting TimerFiredEvent message has a future scheduledTime. In AddMessage(), this triggers the delayed path:

if (TryGetScheduledTime(message, out TimeSpan delay))
{
    // Not ready for this message yet - delay the enqueue
    _ = Task.Delay(delay).ContinueWith(t => this.AddMessage(message));
    return;  // exits without adding to MessagesJson
}

The sequence that causes the hang:

  1. Orchestrator runs, creates a TimerCreatedEvent, and suspends. ReleaseLock sets state.IsLoaded = false.
  2. The TimerFiredEvent message enters AddMessage() but is deferred via Task.Delay — it returns immediately without adding to MessagesJson.
  3. An external event (EventRaisedEvent) arrives via RaiseEventAsync, which calls AddMessage(). Since state.IsLoaded == false and the message has no scheduled delay, the event is added to MessagesJson and the instance is scheduled to run.
  4. The orchestrator replays, sees the EventRaisedEvent in the new messages, but the TimerFiredEvent hasn't arrived yet. The Durable Task Framework's Task.WhenAny implementation needs both tasks to be representable in the history to resolve. Since the timer hasn't fired, the framework re-suspends the orchestrator.
  5. ReleaseLock is called — state.IsLoaded = false, MessagesJson was cleared (line 572), so the event message is gone.
  6. When the timer finally fires and AddMessage is called again, the orchestrator replays but the external event message was already consumed and cleared. The orchestrator suspends again waiting for an event that was already delivered and discarded.
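Steps 2 and 3 can be illustrated with a toy model. This is a minimal sketch, not the InMemoryInstanceStore code: ToyMessage, DeferringToyStore, and PendingKinds are hypothetical names invented for illustration, and only the Task.Delay + ContinueWith branch mirrors the snippet quoted above.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-in for a queued orchestration message.
public sealed record ToyMessage(string Kind, DateTime VisibleAtUtc);

// Hypothetical store that mimics only the deferred-enqueue branch.
public sealed class DeferringToyStore
{
    private readonly object sync = new();
    private readonly List<ToyMessage> pending = new();

    public void AddMessage(ToyMessage message)
    {
        TimeSpan delay = message.VisibleAtUtc - DateTime.UtcNow;
        if (delay > TimeSpan.Zero)
        {
            // Deferred path: nothing is recorded until the delay elapses.
            _ = Task.Delay(delay).ContinueWith(t => this.AddMessage(message));
            return;
        }

        lock (this.sync)
        {
            this.pending.Add(message);
        }
    }

    // Snapshot of the message kinds currently visible in the store.
    public IReadOnlyList<string> PendingKinds()
    {
        lock (this.sync)
        {
            return this.pending.ConvertAll(m => m.Kind);
        }
    }
}
```

If a "TimerFired" message scheduled five minutes ahead is added and an "EventRaised" message is added right after it, PendingKinds() lists only the event. Until the delay elapses, the store holds no record that a timer is outstanding.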

Reproduction

[Fact]
public async Task ExternalEvent_WithTimer_ShouldComplete()
{
    await using var host = await DurableTaskTestHost.StartAsync(tasks =>
    {
        tasks.AddOrchestrator("TestOrchestrator", async ctx =>
        {
            // CreateTimer returns the timer Task directly, so no wrapper object is needed.
            Task timerTask = ctx.CreateTimer(TimeSpan.FromMinutes(5), CancellationToken.None);
            Task<string> eventTask = ctx.WaitForExternalEvent<string>("MyEvent");
            var winner = await Task.WhenAny(eventTask, timerTask);
            return winner == eventTask ? "event" : "timeout";
        });
    });

    string instanceId = "test-instance";
    await host.Client.ScheduleNewOrchestrationInstanceAsync(
        "TestOrchestrator",
        new StartOrchestrationOptions { InstanceId = instanceId });

    await Task.Delay(TimeSpan.FromSeconds(3));
    await host.Client.RaiseEventAsync(instanceId, "MyEvent", "hello");

    using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(30));
    var metadata = await host.Client.WaitForInstanceCompletionAsync(
        instanceId, getInputsAndOutputs: true, cts.Token);

    // FAILS: OperationCanceledException — orchestration never completes
    Assert.Equal(OrchestrationRuntimeStatus.Completed, metadata.RuntimeStatus);
    Assert.Equal("event", metadata.ReadOutputAs<string>());
}

Expected behavior

The orchestration should complete immediately when the external event arrives, with the event task winning the Task.WhenAny.

Actual behavior

The orchestration hangs indefinitely. After the 30-second cancellation token fires, WaitForInstanceCompletionAsync throws OperationCanceledException.

Suggested fix

When an external event arrives for an instance that has pending deferred timer messages, the instance should still be able to process the event and complete. One possible approach: instead of deferring timer messages with Task.Delay + re-enqueue, add them to MessagesJson immediately and let the orchestrator dispatcher check scheduledTime before delivering them to the orchestrator.
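One way the suggested approach could look, as a minimal sketch under stated assumptions: ScheduledMessage, ImmediateToyStore, TakeDueMessages, and QueuedCount are hypothetical names, and the real store's message type and locking are not shown in this report. The point is only that every message is recorded immediately and the scheduledTime check moves to the moment the dispatcher asks for work.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical message shape; the real message type is not shown in this report.
public sealed record ScheduledMessage(string Kind, DateTime ScheduledTimeUtc);

// Hypothetical store: every message is recorded at once, and the
// visibility check happens when the dispatcher asks for work.
public sealed class ImmediateToyStore
{
    private readonly object sync = new();
    private readonly List<ScheduledMessage> messages = new();

    public void AddMessage(ScheduledMessage message)
    {
        lock (this.sync)
        {
            this.messages.Add(message);
        }
    }

    // Removes and returns the messages that are due at nowUtc;
    // messages scheduled for later stay queued.
    public IReadOnlyList<ScheduledMessage> TakeDueMessages(DateTime nowUtc)
    {
        lock (this.sync)
        {
            List<ScheduledMessage> due =
                this.messages.Where(m => m.ScheduledTimeUtc <= nowUtc).ToList();
            this.messages.RemoveAll(m => m.ScheduledTimeUtc <= nowUtc);
            return due;
        }
    }

    public int QueuedCount
    {
        get
        {
            lock (this.sync)
            {
                return this.messages.Count;
            }
        }
    }
}
```

With a timer message scheduled five minutes ahead and an event message due now, TakeDueMessages(now) returns only the event while the timer stays queued, and a later call with now + 5 minutes returns the timer. A real dispatcher would presumably also need a wake-up when the earliest queued message becomes due; that part is left out of this sketch.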
