Inspiration
Standing on the shoulders of sharp thinkers.
Reuven Cohen
Agentic Engineer · Founder @ Cognitum.One
“The going rate for a single developer running Claude Code using a swarm style development is around $2.5k/day or $75k/month via Anthropic enterprise API.”
“The biggest cost in agentic development isn't the model. It's the constant replay of context. Most autonomous coding systems keep resending the same architecture documents, ADRs, source files, tool definitions, and conversation history over and over. That's where the money goes.”
“We're not optimizing models. We're optimizing information flow.”

Reuven's post articulated the problem precisely. Agent Booster is our open-source implementation of that insight — AST-level symbol routing, semantic vector search, and MCP integration built directly into your coding workflow. The concept of operating at the AST and semantic level rather than treating code as raw text comes directly from this framing.
Quickstart
Up and running in two commands.
Step 1 — Install
# includes embeddings + file watcher
pip install agent-booster[full]
Step 2 — Start
# detects Claude/Cursor/Codex, wires hooks, indexes, starts daemon
booster start
Detects which AI tools are present (Claude Code, Cursor, Windsurf, Codex), wires each one automatically, indexes the project, and starts a background daemon that keeps the model warm and auto-re-indexes on every file save. Fully reversible with booster remove claude.
That's it — then track savings
booster gain
Compatibility
Works with every major AI coding tool.
Claude Code
booster init claudeCursor
booster init cursorWindsurf
booster init windsurfOpenAI Codex
booster init codexEach command shows exactly what files will change and asks for confirmation before writing anything. Run booster remove <platform> to cleanly undo.
What's new
v0.2.16 – v0.2.25Daemon, verbosity modes, and output token tracking.
Three releases shipped together. The result: booster start is the only command you need, search is instant after the first run, and re-indexing costs nothing on unchanged files.
Output token tracking
booster-stop.py fires on every Claude Code session end and captures actual output tokens from the stop event. Stores baseline vs. actual in .booster/stats.db. booster gain now shows real savings — not estimates.
Verbosity modes
booster verbosity lite|full|ultra injects a conciseness block into CLAUDE.md, AGENTS.md, .cursorrules, and .windsurfrules. booster verbosity off removes it. Cuts output token count by 30–75% across all AI coding tools.
Memory compression
booster compress rewrites every file in memory/ through claude-haiku to strip filler and cut token count by ~60%. booster compress --dry-run previews savings without writing. Keeps project memory lean as it grows.
Background daemon
booster start launches a persistent Unix socket process that keeps the embedding model loaded. search_context drops from 2–3 s cold-start to ~50 ms. Daemon survives editor restarts — it's not tied to any terminal.
File watcher
watchdog monitors the project for writes. Changed files are re-indexed within 2 seconds of a save — no manual booster index during a coding session. Daemon handles this automatically.
Delta indexing
SHA-256 hash and mtime stored per file in the SQLite index. Full re-index skips unchanged files entirely. Large repos that took seconds now finish in milliseconds. Use --force to override.
Asymmetric embeddings
Index-time vectors use a passage: prefix; query-time vectors use query:. Follows the E5 paper's asymmetric retrieval approach. Retrieval accuracy improves meaningfully over symmetric embeddings, especially for short function names.
booster start does everything
One command bootstraps the full stack: detects installed AI tools (Claude Code, Cursor, Windsurf, Codex), wires each one that isn't already wired, indexes the project, builds embeddings, and starts the daemon. On subsequent runs it just wakes the daemon.
Full changelog
Every commit, diff, and release note lives in the GitHub repo. PRs welcome.
Under the hood
How it's actually built.
We're not optimizing models.
We're optimizing information flow.
A workflow that costs $2,500/day with brute-force context replay can often be reduced by several multiples — while maintaining comparable output quality. Every token you don't send is a token you don't pay for.
pip install agent-booster[full]Embeddings, file watcher, and daemon all included in the [full] extra.
