Fix LLM agent stability: conditional execution, error handling, tool registration by ModerRAS · Pull Request #268 · ModerRAS/TelegramSearchBot

ModerRAS · 2026-04-17T06:39:52Z

Summary

Fixes multiple stability issues in the LLM agent process separation implementation.

Changes

1. Conditional BackgroundService execution (prevents crash when agent mode disabled)

- `TelegramTaskConsumer`, `ChunkPollingService`, and `AgentRegistryService` now early-return from `ExecuteAsync()` when `EnableLLMAgentProcess=false` (the default)
- Previously these services would make Redis BRPOP/HGETALL/ListRange calls every loop iteration regardless of config, causing `RedisTimeoutException` → `StopHost` → app crash
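A minimal sketch of the early-return guard described above, written against the BCL only so it compiles on its own. `GuardedLoop`, `agentModeEnabled`, and `pollOnce` are hypothetical stand-ins for the hosted service, `Env.EnableLLMAgentProcess`, and the per-iteration Redis call; the real services derive from `BackgroundService` and may be structured differently.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a hosted service's ExecuteAsync loop.
public static class GuardedLoop
{
    // Returns the number of poll cycles that ran; 0 means the guard short-circuited.
    public static async Task<int> RunAsync(
        bool agentModeEnabled,                  // stands in for Env.EnableLLMAgentProcess
        Func<CancellationToken, Task> pollOnce, // stands in for the Redis call made each iteration
        TimeSpan interval,
        CancellationToken stoppingToken)
    {
        if (!agentModeEnabled)
        {
            // Feature disabled: return before the first Redis call is ever issued.
            return 0;
        }

        var cycles = 0;
        while (!stoppingToken.IsCancellationRequested)
        {
            await pollOnce(stoppingToken);
            cycles++;
            try
            {
                await Task.Delay(interval, stoppingToken);
            }
            catch (OperationCanceledException)
            {
                break; // normal shutdown
            }
        }
        return cycles;
    }
}
```

With the flag off, `pollOnce` is never invoked, so a missing or slow Redis instance cannot raise a timeout from this loop.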

2. Agent BRPOP timeout fix

- Changed `BRPopAsync` timeout from 5s → 2s in the agent loop, matching the main-process fix from PR #267 (Fix Redis BRPOP timeout crashing background services)
- Prevents race condition with SE.Redis 5s async timeout
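The constraint behind this change is that the server-side blocking timeout must stay below the client's async timeout, otherwise both can expire at about the same moment. The PR simply hard-codes 2s; the helper below is only an illustrative sketch of the same rule, and `BlockingTimeout`, `ChooseSeconds`, and the `margin` parameter are hypothetical names, not part of the project or of StackExchange.Redis.

```csharp
using System;

public static class BlockingTimeout
{
    // Picks a blocking timeout in whole seconds (kept integral here for simplicity)
    // that stays below the client's async timeout by at least `margin`.
    public static int ChooseSeconds(TimeSpan clientAsyncTimeout, TimeSpan margin, int desiredSeconds)
    {
        if (margin < TimeSpan.Zero) throw new ArgumentOutOfRangeException(nameof(margin));
        // A BRPOP timeout of 0 means "block indefinitely", so 1 is the smallest useful value.
        if (desiredSeconds < 1) throw new ArgumentOutOfRangeException(nameof(desiredSeconds));

        var budget = clientAsyncTimeout - margin;
        var maxSeconds = (int)Math.Floor(budget.TotalSeconds);
        if (maxSeconds < 1)
            throw new ArgumentException("Client timeout leaves no room for a blocking call.");

        return Math.Min(desiredSeconds, maxSeconds);
    }
}
```

For example, with a 5s client timeout and a 3s margin, a requested 5s block is reduced to 2s.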

3. Agent error handling

- `AgentLoopService` main loop now catches `RedisException` with 1s retry delay
- Heartbeat loop catches `OperationCanceledException` and `RedisException`
- Prevents single transient Redis failures from crashing the agent process
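The catch-and-retry pattern in this section might look roughly like the sketch below. It is not the code from `AgentLoopService`: `TransientStoreException` is a hypothetical stand-in for `RedisException` so the example compiles without the StackExchange.Redis package, and `ResilientLoop`, `step`, and `log` are illustrative names.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for StackExchange.Redis.RedisException.
public sealed class TransientStoreException : Exception
{
    public TransientStoreException(string message) : base(message) { }
}

public static class ResilientLoop
{
    // Runs `step` until cancellation; a transient failure is logged and retried after `retryDelay`.
    // Returns how many transient failures were absorbed.
    public static async Task<int> RunAsync(
        Func<CancellationToken, Task> step,
        TimeSpan retryDelay,
        Action<Exception> log,
        CancellationToken stoppingToken)
    {
        var failures = 0;
        while (!stoppingToken.IsCancellationRequested)
        {
            try
            {
                await step(stoppingToken);
            }
            catch (OperationCanceledException) when (stoppingToken.IsCancellationRequested)
            {
                break; // shutdown requested: leave quietly
            }
            catch (TransientStoreException ex)
            {
                failures++;
                log(ex);
                try { await Task.Delay(retryDelay, stoppingToken); }
                catch (OperationCanceledException) { break; }
            }
        }
        return failures;
    }
}
```

Any other exception type still propagates, so only failures known to be transient are retried.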

4. Agent tool registration

- Changed `McpToolHelper.EnsureInitialized` to use two-assembly overload, registering both `AgentToolService` and LLM project tools
- Added `FileToolService` (read/write/edit/search/list files) and `BashToolService` (shell execution) to agent DI
- Agent now has 8+ useful tools instead of only 3 trivial ones (echo/calculator/send_message)

Testing

All 413 tests pass (223 + 186 + 4)
Build: 0 errors

Summary by CodeRabbit

New Features
- Added file and bash tool capabilities to the LLM agent.
Bug Fixes
- Improved service resilience with graceful error recovery and exception handling.
- Enhanced cancellation handling for more responsive background operations.
Improvements
- Service execution now respects LLM agent process configuration settings.

…registration

- BackgroundServices (TelegramTaskConsumer, ChunkPollingService, AgentRegistryService) now early-return when EnableLLMAgentProcess is disabled, preventing unnecessary Redis calls and potential crashes from RedisTimeoutException
- Agent process BRPOP timeout reduced from 5s to 2s to avoid race with SE.Redis async timeout (same fix as PR #267 for the main process)
- AgentLoopService main loop now catches RedisException with retry delay, preventing a single transient Redis failure from crashing the agent process
- Heartbeat loop catches OperationCanceledException and RedisException, preventing unobserved exceptions during shutdown or transient failures
- Agent process now registers LLM project tools (FileToolService, BashToolService) via two-assembly McpToolHelper.EnsureInitialized, giving the agent access to file operations and shell execution instead of only echo/calculator/send_message

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

coderabbitai · 2026-04-17T06:40:07Z

📝 Walkthrough

Changes add MCP tool service integration (file and bash tools) to the agent startup, implement conditional execution guards based on Env.EnableLLMAgentProcess across multiple services, improve Redis exception handling with logging and graceful retry patterns in agent loops, and reduce BRPOP blocking timeout from 5 to 2 seconds.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Tool Service Integration `TelegramSearchBot.LLMAgent/LLMAgentProgram.cs` | Initializes MCP/tools support by expanding assembly discovery to include `FileToolService`, registers new scoped bindings for `IFileToolService` and `IBashToolService` in dependency injection. |
| Redis Resilience & Timeouts `TelegramSearchBot.LLMAgent/Service/AgentLoopService.cs` | Wraps Redis blocking pop and session save operations in `try/catch` blocks to handle `RedisException` gracefully with logging and retry logic; reduces BRPOP wait timeout from 5 to 2 seconds; adds `OperationCanceledException` handling for heartbeat shutdown. |
| Feature Flag Guards `TelegramSearchBot/Service/AI/LLM/AgentRegistryService.cs`, `TelegramSearchBot/Service/AI/LLM/ChunkPollingService.cs`, `TelegramSearchBot/Service/AI/LLM/TelegramTaskConsumer.cs` | Adds early exit checks for `Env.EnableLLMAgentProcess` in `ExecuteAsync` methods; wraps polling cycles in exception handling to tolerate `RedisException` while respecting `OperationCanceledException` for proper shutdown. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Fix Redis BRPOP timeout crashing background services #267: Modifies the same agent/consumer services with similar Redis exception handling patterns, reduces BRPOP timeout identically, and improves background-service error resilience.

Poem

🐰 Hop, hop! New tools await,
With redis guards that seal the gate,
Feature flags now spring so true,
Two-second waits, no five—we flew!
Errors caught like carrots sweet, 🥕
Making services complete!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title directly and comprehensively summarizes the three main categories of changes: conditional execution guards, error handling improvements, and tool registration enhancements. |

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

TelegramSearchBot/Service/AI/LLM/ChunkPollingService.cs (2)

55-57: Consider logging the disabled-mode early return for parity with sibling services.

TelegramTaskConsumer and AgentRegistryService both emit a LogDebug when returning early; this service silently returns, making it harder to tell from logs whether the polling loop was intentionally skipped. ChunkPollingService currently has no ILogger injected — injecting one would also let you log the swallowed RedisException on line 64.

Suggested change

-        public ChunkPollingService(IConnectionMultiplexer redis) {
+        private readonly ILogger<ChunkPollingService> _logger;
+
+        public ChunkPollingService(IConnectionMultiplexer redis, ILogger<ChunkPollingService> logger) {
             _redis = redis;
+            _logger = logger;
         }
@@
-            if (!Env.EnableLLMAgentProcess) {
-                return;
-            }
+            if (!Env.EnableLLMAgentProcess) {
+                _logger.LogDebug("LLM agent process mode disabled – ChunkPollingService will not start");
+                return;
+            }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@TelegramSearchBot/Service/AI/LLM/ChunkPollingService.cs` around lines 55 -
57, ChunkPollingService currently returns silently when
Env.EnableLLMAgentProcess is false and swallows RedisException; inject an
ILogger<ChunkPollingService> into the constructor (store as _logger), use
_logger.LogDebug(...) to emit a message when the early return occurs in the
Start/loop entry (the if (!Env.EnableLLMAgentProcess) block), and update the
catch that currently swallows RedisException (around the RedisException at line
~64) to call _logger.LogError(ex, "Redis error in ChunkPollingService") so the
exception is visible in logs while preserving current behavior.

64-66: Silently swallowing RedisException hides transient failures.

Without a logger, repeated Redis outages produce no signal and there's also no back-off delay before the next RunPollCycleAsync attempt — the loop only delays via Task.Delay on line 69, which still runs. That's probably fine given the poll interval, but at minimum a LogWarning would help diagnose flapping Redis issues. (Related to the logger-injection suggestion above.)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@TelegramSearchBot/Service/AI/LLM/ChunkPollingService.cs` around lines 64 -
66, In ChunkPollingService's RunPollCycleAsync catch block that currently
swallows RedisException, log the exception (use the injected logger instance on
the class) with a warning message that includes the exception details and
context (e.g., "Transient Redis failure during RunPollCycleAsync"); optionally
add a short back-off delay before retrying (e.g., a small Task.Delay or
incremental back-off) so repeated transient Redis outages are visible in logs
and give Redis a moment before the next attempt.
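The "incremental back-off" mentioned in this suggestion could be computed with a small helper like the one below. This is a sketch under assumptions, not code from the PR or the review: `Backoff`, `DelayFor`, and the doubling policy are illustrative choices, and the base and cap values are left to the caller.

```csharp
using System;

public static class Backoff
{
    // Delay before retry number `attempt` (1-based): baseDelay * 2^(attempt - 1), capped at maxDelay.
    public static TimeSpan DelayFor(int attempt, TimeSpan baseDelay, TimeSpan maxDelay)
    {
        if (attempt < 1) throw new ArgumentOutOfRangeException(nameof(attempt));

        var exponent = Math.Min(attempt - 1, 30); // bounds the multiplier at 2^30
        var ms = baseDelay.TotalMilliseconds * Math.Pow(2, exponent);

        return ms >= maxDelay.TotalMilliseconds ? maxDelay : TimeSpan.FromMilliseconds(ms);
    }
}
```

A caller would count consecutive `RedisException` failures, pass that count as `attempt` to `Task.Delay(Backoff.DelayFor(...), stoppingToken)`, and reset the counter after a successful poll cycle.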

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9b85fde7-5009-4c67-a150-7bd1755e5257

📥 Commits

Reviewing files that changed from the base of the PR and between cf98863 and 3c468eb.

📒 Files selected for processing (5)

TelegramSearchBot.LLMAgent/LLMAgentProgram.cs
TelegramSearchBot.LLMAgent/Service/AgentLoopService.cs
TelegramSearchBot/Service/AI/LLM/AgentRegistryService.cs
TelegramSearchBot/Service/AI/LLM/ChunkPollingService.cs
TelegramSearchBot/Service/AI/LLM/TelegramTaskConsumer.cs

coderabbitai · 2026-04-17T06:43:07Z

+            if (!Env.EnableLLMAgentProcess) {
+                _logger.LogDebug("LLM agent process mode disabled – AgentRegistryService will not start");
+                return;
+            }


⚠️ Potential issue | 🟡 Minor

Disabled-mode shutdown branch in RunMaintenanceOnceAsync becomes unreachable from the background loop.

With this early return, ExecuteAsync never calls RunMaintenanceOnceAsync, so the disabled-mode branch at lines 149–153 (which iterates _knownSessions and calls RequestShutdownAsync("agent mode disabled")) no longer fires from the hosted service. In practice this is mostly fine because EnsureAgentAsync throws when the flag is false so _knownSessions shouldn't grow, but if the flag is toggled from true→false at runtime, previously-running agents will no longer be gracefully asked to shut down from here.

Consider either:

Documenting that toggling requires a restart, or

Running one final RunMaintenanceOnceAsync pass (or just the shutdown-known-sessions block) before returning, so in-flight sessions get a graceful shutdown request.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@TelegramSearchBot/Service/AI/LLM/AgentRegistryService.cs` around lines 186 - 189, The early return when Env.EnableLLMAgentProcess is false prevents ExecuteAsync from ever calling RunMaintenanceOnceAsync and leaves previously-known agents running; update the AgentRegistryService start path so that before returning when Env.EnableLLMAgentProcess is false you perform one final shutdown pass: either call await RunMaintenanceOnceAsync(CancellationToken.None) or iterate over _knownSessions and call RequestShutdownAsync("agent mode disabled") for each (awaiting the tasks) so existing sessions are asked to terminate gracefully; keep references to Env.EnableLLMAgentProcess, ExecuteAsync, RunMaintenanceOnceAsync, _knownSessions and RequestShutdownAsync when making the change.
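
The second option above, a final shutdown pass before the early return, might be sketched as follows. The `IAgentSession` interface and `DisabledModeShutdown` class are hypothetical: the review does not show the type stored in `_knownSessions` or the exact signature of `RequestShutdownAsync`, so both are assumptions made only to keep the example self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical session handle; the real type behind _knownSessions is not shown in the review.
public interface IAgentSession
{
    Task RequestShutdownAsync(string reason);
}

public static class DisabledModeShutdown
{
    // Asks every known session to stop and returns how many were asked.
    public static async Task<int> RunAsync(IReadOnlyCollection<IAgentSession> knownSessions)
    {
        var requests = knownSessions
            .Select(session => session.RequestShutdownAsync("agent mode disabled"))
            .ToArray(); // start all requests before awaiting any of them

        await Task.WhenAll(requests);
        return requests.Length;
    }
}
```

In `ExecuteAsync`, such a pass would run inside the `if (!Env.EnableLLMAgentProcess)` block just before `return`, so the guard still skips the polling loop while in-flight sessions receive one shutdown request.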

github-actions · 2026-04-17T06:44:45Z

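🔍 PR Check Report

📋 Check Overview

PR: Fix LLM agent stability: conditional execution, error handling, tool registration #268
Branch: fix/llm-agent-stability → master
Trigger event: pull_request
Commit: 88bd8d4

🧪 Test Results

| Platform | Status | Details |
|---|---|---|
| Ubuntu | 🔴 Failed | Test results unavailable |
| Windows | 🔴 Failed | Test results unavailable |

📊 Code Quality

✅ Code formatting check
✅ Security vulnerability scan
✅ Dependency analysis
✅ Code coverage collection

📁 Test Artifacts

Test result files have been uploaded as artifacts
Code coverage has been uploaded to Codecov

🔗 Related Links

This report was generated automatically by GitHub Actions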

coderabbitai bot reviewed Apr 17, 2026

ModerRAS merged commit 2680c95 into master Apr 17, 2026
8 checks passed

ModerRAS deleted the fix/llm-agent-stability branch April 17, 2026 22:16

coderabbitai bot mentioned this pull request Apr 18, 2026

fix: MCP tool refresh, final message delivery, and overflow retry #270

Merged
