Fix LLM agent stability: conditional execution, error handling, tool registration#268
Fix LLM agent stability: conditional execution, error handling, tool registration#268
Conversation
…registration - BackgroundServices (TelegramTaskConsumer, ChunkPollingService, AgentRegistryService) now early-return when EnableLLMAgentProcess is disabled, preventing unnecessary Redis calls and potential crashes from RedisTimeoutException - Agent process BRPOP timeout reduced from 5s to 2s to avoid race with SE.Redis async timeout (same fix as PR #267 for the main process) - AgentLoopService main loop now catches RedisException with retry delay, preventing a single transient Redis failure from crashing the agent process - Heartbeat loop catches OperationCanceledException and RedisException, preventing unobserved exceptions during shutdown or transient failures - Agent process now registers LLM project tools (FileToolService, BashToolService) via two-assembly McpToolHelper.EnsureInitialized, giving the agent access to file operations and shell execution instead of only echo/calculator/send_message Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
📝 WalkthroughWalkthroughChanges add MCP tool service integration (file and bash tools) to the agent startup, implement conditional execution guards based on Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
TelegramSearchBot/Service/AI/LLM/ChunkPollingService.cs (2)
55-57: Consider logging the disabled-mode early return for parity with sibling services.
TelegramTaskConsumerandAgentRegistryServiceboth emit aLogDebugwhen returning early; this service silently returns, making it harder to tell from logs whether the polling loop was intentionally skipped.ChunkPollingServicecurrently has noILoggerinjected — injecting one would also let you log the swallowedRedisExceptionon line 64.Suggested change
- public ChunkPollingService(IConnectionMultiplexer redis) { + private readonly ILogger<ChunkPollingService> _logger; + + public ChunkPollingService(IConnectionMultiplexer redis, ILogger<ChunkPollingService> logger) { _redis = redis; + _logger = logger; } @@ - if (!Env.EnableLLMAgentProcess) { - return; - } + if (!Env.EnableLLMAgentProcess) { + _logger.LogDebug("LLM agent process mode disabled – ChunkPollingService will not start"); + return; + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@TelegramSearchBot/Service/AI/LLM/ChunkPollingService.cs` around lines 55 - 57, ChunkPollingService currently returns silently when Env.EnableLLMAgentProcess is false and swallows RedisException; inject an ILogger<ChunkPollingService> into the constructor (store as _logger), use _logger.LogDebug(...) to emit a message when the early return occurs in the Start/loop entry (the if (!Env.EnableLLMAgentProcess) block), and update the catch that currently swallows RedisException (around the RedisException at line ~64) to call _logger.LogError(ex, "Redis error in ChunkPollingService") so the exception is visible in logs while preserving current behavior.
64-66: Silently swallowingRedisExceptionhides transient failures.Without a logger, repeated Redis outages produce no signal and there's also no back-off delay before the next
RunPollCycleAsyncattempt — the loop only delays viaTask.Delayon line 69, which still runs. That's probably fine given the poll interval, but at minimum aLogWarningwould help diagnose flapping Redis issues. (Related to the logger-injection suggestion above.)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@TelegramSearchBot/Service/AI/LLM/ChunkPollingService.cs` around lines 64 - 66, In ChunkPollingService's RunPollCycleAsync catch block that currently swallows RedisException, log the exception (use the injected logger instance on the class) with a warning message that includes the exception details and context (e.g., "Transient Redis failure during RunPollCycleAsync"); optionally add a short back-off delay before retrying (e.g., a small Task.Delay or incremental back-off) so repeated transient Redis outages are visible in logs and give Redis a moment before the next attempt.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@TelegramSearchBot/Service/AI/LLM/AgentRegistryService.cs`:
- Around line 186-189: The early return when Env.EnableLLMAgentProcess is false
prevents ExecuteAsync from ever calling RunMaintenanceOnceAsync and leaves
previously-known agents running; update the AgentRegistryService start path so
that before returning when Env.EnableLLMAgentProcess is false you perform one
final shutdown pass: either call await
RunMaintenanceOnceAsync(CancellationToken.None) or iterate over _knownSessions
and call RequestShutdownAsync("agent mode disabled") for each (awaiting the
tasks) so existing sessions are asked to terminate gracefully; keep references
to Env.EnableLLMAgentProcess, ExecuteAsync, RunMaintenanceOnceAsync,
_knownSessions and RequestShutdownAsync when making the change.
---
Nitpick comments:
In `@TelegramSearchBot/Service/AI/LLM/ChunkPollingService.cs`:
- Around line 55-57: ChunkPollingService currently returns silently when
Env.EnableLLMAgentProcess is false and swallows RedisException; inject an
ILogger<ChunkPollingService> into the constructor (store as _logger), use
_logger.LogDebug(...) to emit a message when the early return occurs in the
Start/loop entry (the if (!Env.EnableLLMAgentProcess) block), and update the
catch that currently swallows RedisException (around the RedisException at line
~64) to call _logger.LogError(ex, "Redis error in ChunkPollingService") so the
exception is visible in logs while preserving current behavior.
- Around line 64-66: In ChunkPollingService's RunPollCycleAsync catch block that
currently swallows RedisException, log the exception (use the injected logger
instance on the class) with a warning message that includes the exception
details and context (e.g., "Transient Redis failure during RunPollCycleAsync");
optionally add a short back-off delay before retrying (e.g., a small Task.Delay
or incremental back-off) so repeated transient Redis outages are visible in logs
and give Redis a moment before the next attempt.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 9b85fde7-5009-4c67-a150-7bd1755e5257
📒 Files selected for processing (5)
TelegramSearchBot.LLMAgent/LLMAgentProgram.csTelegramSearchBot.LLMAgent/Service/AgentLoopService.csTelegramSearchBot/Service/AI/LLM/AgentRegistryService.csTelegramSearchBot/Service/AI/LLM/ChunkPollingService.csTelegramSearchBot/Service/AI/LLM/TelegramTaskConsumer.cs
| if (!Env.EnableLLMAgentProcess) { | ||
| _logger.LogDebug("LLM agent process mode disabled – AgentRegistryService will not start"); | ||
| return; | ||
| } |
There was a problem hiding this comment.
Disabled-mode shutdown branch in RunMaintenanceOnceAsync becomes unreachable from the background loop.
With this early return, ExecuteAsync never calls RunMaintenanceOnceAsync, so the disabled-mode branch at lines 149–153 (which iterates _knownSessions and calls RequestShutdownAsync("agent mode disabled")) no longer fires from the hosted service. In practice this is mostly fine because EnsureAgentAsync throws when the flag is false so _knownSessions shouldn't grow, but if the flag is toggled from true→false at runtime, previously-running agents will no longer be gracefully asked to shut down from here.
Consider either:
- Documenting that toggling requires a restart, or
- Running one final
RunMaintenanceOnceAsyncpass (or just the shutdown-known-sessions block) before returning, so in-flight sessions get a graceful shutdown request.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@TelegramSearchBot/Service/AI/LLM/AgentRegistryService.cs` around lines 186 -
189, The early return when Env.EnableLLMAgentProcess is false prevents
ExecuteAsync from ever calling RunMaintenanceOnceAsync and leaves
previously-known agents running; update the AgentRegistryService start path so
that before returning when Env.EnableLLMAgentProcess is false you perform one
final shutdown pass: either call await
RunMaintenanceOnceAsync(CancellationToken.None) or iterate over _knownSessions
and call RequestShutdownAsync("agent mode disabled") for each (awaiting the
tasks) so existing sessions are asked to terminate gracefully; keep references
to Env.EnableLLMAgentProcess, ExecuteAsync, RunMaintenanceOnceAsync,
_knownSessions and RequestShutdownAsync when making the change.
🔍 PR检查报告📋 检查概览
🧪 测试结果
📊 代码质量
📁 测试产物
🔗 相关链接此报告由GitHub Actions自动生成 |
Summary
Fixes multiple stability issues in the LLM agent process separation implementation.
Changes
1. Conditional BackgroundService execution (prevents crash when agent mode disabled)
2. Agent BRPOP timeout fix
3. Agent error handling
4. Agent tool registration
Testing
Summary by CodeRabbit
New Features
Bug Fixes
Improvements