
fix: Fix chunked prefill #37

Merged
YdrMaster merged 2 commits into InfiniTensor:llama.cu-dev from pwhMass:fix_chunked_prefill
Jun 11, 2025

Conversation

Contributor

@pwhMass commented Jun 11, 2025

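Fix chunked prefill so that it supports a limit on the total number of tokens per batch.
However, because sessions in engine_manager are stored in a BTreeMap,
sessions are currently processed in ascending session id order: a session with a larger id is not processed until every session with a smaller id has been fully processed.
Changing this processing priority would require switching to a different data structure.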
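To illustrate the ordering consequence described above, here is a minimal sketch of chunked prefill under a per-batch token budget, assuming sessions live in a `BTreeMap<u64, _>` keyed by session id. The names `Session`, `remaining_prefill`, `schedule_batch`, and `max_tokens` are hypothetical and are not the actual llama.cu engine_manager API; decode steps are ignored for simplicity.

```rust
use std::collections::BTreeMap;

/// Hypothetical session state: prompt tokens that still need prefilling.
struct Session {
    remaining_prefill: usize,
}

/// Hypothetical scheduler: builds one batch under a total token budget.
/// BTreeMap iterates in ascending key order, so smaller session ids are
/// always served first; a session may receive only part of its prompt
/// (a "chunk") when the budget runs out.
/// Returns (session_id, tokens_taken) pairs for this batch.
fn schedule_batch(sessions: &mut BTreeMap<u64, Session>, max_tokens: usize) -> Vec<(u64, usize)> {
    assert!(max_tokens > 0, "a zero budget would never make progress");
    let mut budget = max_tokens;
    let mut batch = Vec::new();
    for (&id, session) in sessions.iter_mut() {
        if budget == 0 {
            // Larger ids wait until smaller ids leave room in the budget.
            break;
        }
        if session.remaining_prefill == 0 {
            continue;
        }
        let take = session.remaining_prefill.min(budget);
        session.remaining_prefill -= take;
        budget -= take;
        batch.push((id, take));
    }
    // Drop sessions whose prefill has completed (decode is not modeled here).
    sessions.retain(|_, s| s.remaining_prefill > 0);
    batch
}

fn main() {
    let mut sessions = BTreeMap::new();
    sessions.insert(7, Session { remaining_prefill: 5 });
    sessions.insert(3, Session { remaining_prefill: 10 });
    sessions.insert(9, Session { remaining_prefill: 2 });

    let mut step = 0;
    while !sessions.is_empty() {
        let batch = schedule_batch(&mut sessions, 6);
        println!("batch {step}: {batch:?}");
        step += 1;
    }
}
```

With a budget of 6 tokens this prints `batch 0: [(3, 6)]`, `batch 1: [(3, 4), (7, 2)]`, `batch 2: [(7, 3), (9, 2)]`: session 9 is not scheduled until session 3 has finished and session 7 has been partly served, even though session 9 needs only 2 tokens. Serving sessions in another order (for example, by arrival time) would mean replacing the BTreeMap with a structure that iterates in that order.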

This PR also fixes a bug introduced by the commit "Compatible with the DeepSeek reasoning model protocol".

pwhMass added 2 commits June 12, 2025 04:00
@YdrMaster merged commit b11aa15 into InfiniTensor:llama.cu-dev on Jun 11, 2025
2 checks passed