fix: split token-aware into default exclusions + optional --economic mode #106
omerarslan0 wants to merge 2 commits into HKUDS:main from
Conversation
|
Looks nice! Can we make token-aware a new mode? Also, the added content about binary contents made me wonder whether we could have a hacker mode for reverse engineering binary software. We can tag that as experimental if we plan to implement it. |
|
I'm working on it. |
|
Thanks for the feedback!
Token-aware as a mode: I'd lean towards keeping token budget as the default behavior rather than a separate mode. The whole point of this fix (#105) is that without it, large codebases silently overflow the context window. If we make it opt-in, agents will hit the same issue unless they know to enable it. That said, I can add a
Hacker mode for binary RE: Love the idea. Decompiler output analysis (Ghidra/IDA/radare2) could open up closed-source software for harness generation. I think this deserves its own PR though, since the implementation is substantially different from source code analysis. I'll open an issue tagged |
|
For Token-aware as a mode: Thanks! I see your point. I do agree that large codebases overflow the context window silently, but to me that is more of an economic/model-capability issue than a problem with the design of CLI-Anything. Generally when I think about this, I break things into several points:
Let me know if these make sense and how you think! |
|
That makes sense. I think you're right that the token budget rules are too prescriptive for a general-purpose harness. The 50K limit and 100KB file cap made sense as a safety net, but they'd age badly as context windows grow. Here's what I'll do:
Keep in default mode (all modes):
Move to a separate
This way the default mode still skips files that are genuinely useless (binaries, build cache, media) but doesn't artificially cap how much source code the agent can read. The economic mode is there for users on smaller context windows or who want faster/cheaper runs. I'll update the PR with this split. Does that align with what you had in mind? @yuh-yang |
|
That'd make a lot of sense to me! Thanks for the discussion. |
|
Updated. Default mode: binary detection, binary/build/media/vendor exclusion, scan-before-read principle.
Ready for review. |
|
@yuh-yang check this. |
|
Hmm, I only see the criteria for token-aware mode? And they directly overwrite the current file. |
8cfc9de to eba2516
|
Updated — split is done now. Default Phase 1 only has binary detection and exclusion rules (binaries, build artifacts, media, vendored, generated). Token budget stuff (50K cap, 100KB file limit, priority ordering, summarize-don't-dump) moved to a separate --economic section after Phase 7. cli-anything.md and refine.md cleaned up too, just reference --economic flag instead of hardcoding numbers. |
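The --economic constraints named in this comment (50K cap, 100KB file limit, priority ordering) can be illustrated with a small sketch. The PR itself edits Markdown guidance for the agent rather than code, so everything below is hypothetical: the function name `select_files`, the priority numbers, the reading of "50K" as tokens, and the rough four-characters-per-token estimate are all assumptions made for illustration.

```python
from typing import Iterable

MAX_FILE_BYTES = 100 * 1024  # "100KB file limit" from the thread
TOKEN_BUDGET = 50_000        # "50K cap" from the thread; assumed to mean tokens
CHARS_PER_TOKEN = 4          # assumption: crude heuristic, not a real tokenizer


def select_files(
    files: Iterable[tuple[str, int, int]], budget: int = TOKEN_BUDGET
) -> list[str]:
    """Pick files to read in full under a token budget.

    Each entry is (path, size_in_bytes, priority); lower priority is read first.
    """
    chosen: list[str] = []
    remaining = budget
    for path, size, _priority in sorted(files, key=lambda f: (f[2], f[0])):
        if size > MAX_FILE_BYTES:
            continue  # over the per-file cap: a candidate for summarizing instead
        cost = -(-size // CHARS_PER_TOKEN)  # ceiling division: estimated tokens
        if cost > remaining:
            continue  # does not fit in what is left of the budget
        chosen.append(path)
        remaining -= cost
    return chosen


candidates = [
    ("README.md", 4_000, 0),
    ("src/core.py", 120_000, 1),
    ("src/cli.py", 8_000, 1),
    ("tests/test_cli.py", 400_000, 2),
]
print(select_files(candidates))
```

Files skipped by the per-file cap are the ones a "summarize, don't dump" rule would apply to; how that summary is produced is outside this sketch.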
Summary
Splits the token-aware codebase analysis changes into two tiers based on maintainer feedback:
Default mode (always active):
- Binary detection (.app, .exe, .AppImage, .msi, .deb, .rpm, .flatpak) with abort message
- Binaries (.o, .so, .dylib, .a, .exe, .dll, .wasm, .pyc, .class, .jar)
- Build artifacts (build/, dist/, __pycache__/, node_modules/, .git/, target/, cmake-build*/)
- Generated files (*_generated.*, *.pb.go, *.pb.cc)
- Vendored code (vendor/, third_party/)
--economic mode (opt-in):
For users on smaller context windows or who want faster/cheaper generation runs.
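As a rough illustration of the default-mode rules listed above, here is a minimal Python sketch. The PR changes Markdown instructions rather than code, so this is not the project's implementation: the patterns are copied from the Summary, while the function names `is_binary_only_input` and `is_excluded` and the matching strategy are assumptions.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Patterns copied from the Summary; names and structure here are hypothetical.
BINARY_ONLY_INPUT = {".app", ".exe", ".appimage", ".msi", ".deb", ".rpm", ".flatpak"}
COMPILED_SUFFIXES = {".o", ".so", ".dylib", ".a", ".exe", ".dll", ".wasm",
                     ".pyc", ".class", ".jar"}
EXCLUDED_DIRS = ["build", "dist", "__pycache__", "node_modules", ".git",
                 "target", "cmake-build*", "vendor", "third_party"]
GENERATED_GLOBS = ["*_generated.*", "*.pb.go", "*.pb.cc"]


def is_binary_only_input(path: str) -> bool:
    """True if the target itself is a packaged binary (the abort case)."""
    return PurePosixPath(path).suffix.lower() in BINARY_ONLY_INPUT


def is_excluded(path: str) -> bool:
    """True if a file inside a source tree should be skipped during analysis."""
    p = PurePosixPath(path)
    # Any parent directory matching an excluded directory pattern.
    if any(fnmatchcase(part, pat) for part in p.parts[:-1] for pat in EXCLUDED_DIRS):
        return True
    # Compiled artifacts, by file extension.
    if p.suffix.lower() in COMPILED_SUFFIXES:
        return True
    # Generated sources, by file-name glob.
    return any(fnmatchcase(p.name, pat) for pat in GENERATED_GLOBS)
```

A caller would presumably check `is_binary_only_input` once on the target and abort with a message, then filter the file listing with `is_excluded` before reading anything, in line with the scan-before-read principle mentioned in the thread.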
Changes
- HARNESS.md: Split Phase 1 into default exclusions + new --economic section
- cli-anything.md: Remove token budget numbers from default Phase 1, add --economic reference
- refine.md: Remove token budget numbers from Step 2, add --economic reference
Test plan
- --economic section contains all token budget constraints
- --economic flag
Closes #105