Context Engineering: Making AI Work in Real Codebases
In the previous post, we talked about why Chuchu exists—making AI coding assistance affordable. Now let’s talk about how to actually make it work in production codebases.
The Real Problem
The Stanford study on AI’s impact on developer productivity found something concerning:
- A lot of “extra code” shipped by AI tools ends up getting reworked the next week
- AI works well for greenfield projects but struggles with large established codebases
Sound familiar? The common responses are:
- “Too much slop”
- “Doesn’t work in big repos”
- “Maybe someday when models are smarter…”
But here’s the thing: You can get really far with today’s models if you embrace core context engineering principles.
What’s Actually Possible
Recent experiments show that proper context management enables AI to handle:
- 300k+ LOC codebases
- Complex system changes (cancellation support, WASM compilation)
- Week-long features shipped in a day
- Code that passes expert review
This isn’t about smarter models. It’s about context engineering—the art of managing what information the LLM sees and when.
Understanding Context Windows
LLMs are stateless functions. The only thing affecting output quality (without training new models) is input quality.
At any given turn, a coding agent is:
Context Window In → Next Step Out
That’s it. The contents of your context window are the only lever you have.
What Eats Context?
- Searching for files
- Understanding code flow
- Applying edits
- Test/build logs
- Large JSON responses from tools
All of these flood the context window with noise.
Optimize For
- Correctness: No wrong information
- Completeness: All relevant information
- Compactness: Minimal noise
Or as one equation:
Output Quality ∝ (Correctness × Completeness) / Noise
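Read literally, the proportionality is just this (a toy illustration with made-up 0–1 scores, not a real metric):

```python
def quality_score(correctness: float, completeness: float, noise: float) -> float:
    # Output Quality ∝ (Correctness × Completeness) / Noise; tiny floor avoids /0
    return (correctness * completeness) / max(noise, 1e-9)

# Same information, five times the noise: a much worse expected outcome.
print(quality_score(0.9, 0.8, 0.1) > quality_score(0.9, 0.8, 0.5))  # True
```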
The Golden Rule
You only have ~170k tokens of context. Use as little as possible. The more you use, the worse the outcomes.
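A rough way to keep yourself honest about the budget (this assumes the common ~4-characters-per-token heuristic for English text and code; for exact counts you'd use the model's actual tokenizer):

```python
# Heuristic: English prose/code averages roughly 4 characters per token.
def estimate_tokens(text: str) -> int:
    return len(text) // 4

def fits_budget(pieces, budget=170_000, target_utilization=0.6):
    """Would these context pieces stay inside the target share of the window?"""
    total = sum(estimate_tokens(p) for p in pieces)
    return total <= budget * target_utilization

# A 500 KB build log is ~125k tokens: it blows a 60% utilization target on its own.
print(fits_budget(["x" * 500_000]))  # False
```

Checking a file or log dump against the budget *before* pasting it in is cheaper than recovering a polluted session afterward.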
The Naive Approach (Don’t Do This)
Most people use AI coding tools like a chatbot:
- Chat back and forth
- Vibe your way through
- Hit context limit or give up
- Start over with “try again but use XYZ approach”
This fills context with noise and gets you stuck in loops.
Better: Intentional Compaction
Compaction means distilling context into structured artifacts.
When context fills up, pause and ask:
> Write everything we did so far to progress.md. Note:
> - The end goal
> - The approach we're taking
> - Steps completed
> - Current state/blockers
Start a fresh session with this compact summary.
What Good Compaction Looks Like
```markdown
## Goal
Add user authentication with JWT tokens

## Approach
1. Create User model with bcrypt password hashing
2. Add JWT generation/validation middleware
3. Protect routes with auth middleware

## Progress
- [x] User model created with tests
- [x] Password hashing working
- [ ] Currently: JWT middleware failing validation

## Current Issue
Token signature verification fails with RS256.
Need to check if we're using correct public key format.
```
That's about a dozen lines instead of 1,000+ lines of chat history.
Even Better: Frequent Intentional Compaction
Design your entire workflow around context management.
Keep utilization in the 40-60% range. Split work into phases:
1. Research
Understand the codebase and problem:
- Which files are relevant?
- How does information flow?
- What are potential solutions?
Output: Compact research document with key findings.
2. Plan
Create precise implementation steps:
- Exact files to edit
- Specific changes per file
- Testing/verification at each phase
Output: Step-by-step plan with acceptance criteria.
3. Implement
Execute the plan phase by phase:
- One phase at a time
- Verify before moving on
- Compact progress back into plan
Output: Working, tested code.
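The three phases above can be sketched as a hand-off of compact artifacts (illustrative Python, not Chuchu's actual code; the artifact names `research.md`, `plan.md`, and `progress.md` are hypothetical stand-ins):

```python
# Each phase reads only the compact artifact from the previous phase,
# never the full transcript.
artifacts: dict = {}

def run_phase(name: str, reads, writes: str) -> None:
    # A real phase would do LLM work here; this stub records just the hand-off.
    upstream = reads or "fresh context"
    artifacts[writes] = f"{name} output (built from: {upstream})"

run_phase("Research", None, "research.md")
run_phase("Plan", "research.md", "plan.md")
run_phase("Implement", "plan.md", "progress.md")
print(artifacts["plan.md"])  # Plan output (built from: research.md)
```

Because each phase's input is a small document rather than the accumulated chat, any phase can be reviewed, corrected, or restarted in isolation.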
Why This Works in Chuchu
Chuchu’s multi-agent architecture is designed around this principle:
Router Agent (8B model)
- Fast intent classification (~840 TPS)
- Minimal context needed for routing
- Routes to appropriate specialized agent
Query Agent (reasoning model)
- Research and codebase analysis[^1]
- Reads files, searches patterns
- Compacts findings into structured output
- Fresh context for each analysis
Editor Agent (code-specialized model)
- Receives focused context from query
- Implements changes incrementally
- Can use larger context models when needed
Research Agent (with web tools)
- External documentation lookup
- API reference search
- Summarizes findings separately from main work
- Keeps noise out of implementation context
Key insight: Each agent starts with a clean, focused context containing only what it needs for its specific task. No agent sees the full chat history—only relevant information.
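A toy sketch of that dispatch pattern (illustrative Python only; the agent names mirror the list above, and the keyword stub stands in for the fast 8B classifier):

```python
# Each agent receives a freshly built, task-specific context — never the transcript.
def route(intent: str) -> str:
    # In the real system a small fast model classifies intent; a keyword stub here.
    if "search" in intent or "docs" in intent:
        return "research"
    if "edit" in intent or "fix" in intent:
        return "editor"
    return "query"

def build_context(agent: str, artifacts: dict) -> str:
    """Assemble only what this agent needs; no chat history is carried over."""
    return f"Agent: {agent}\n" + "\n".join(artifacts.get(agent, []))

artifacts = {"editor": ["plan.md: step 3, change the auth middleware"]}
agent = route("fix the JWT validation bug")
print(agent)  # editor
print(build_context(agent, artifacts))
```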
Human Leverage: Where to Focus
- A bad line of code = 1 bad line
- A bad line in a plan = 100s of bad lines
- A bad line in research = 1000s of bad lines
Focus human review on high-leverage artifacts:
- Review research documents (highest leverage)
- Review implementation plans (medium leverage)
- Review code (lowest leverage, but still important)
With this approach:
- You can’t read 2000 lines of code daily
- But you can read 200 lines of a plan
- And you can steer research to focus on what matters
Mental Alignment
The biggest problem with AI-generated code isn’t correctness—it’s losing touch with your codebase.
When AI ships 2000-line PRs daily, you start losing mental alignment with:
- What your product does
- How systems work
- Why decisions were made
Research/Plan/Implement artifacts solve this:
- Plans keep everyone aligned on changes
- Research documents explain the “why”
- You can quickly learn unfamiliar parts of the codebase
Practical Tips for Chuchu
Start With Focused Commands
```bash
# Research phase - understand the codebase
chu research "how does user auth work in this codebase"
# Read the output, steer if needed

# Plan phase - create structured plan
chu plan "add password reset via email"
# Review the plan before implementing

# Implement phase - execute the plan
chu implement ~/.chuchu/plans/2024-11-15-password-reset.md
# Note: Implementation reads the plan and executes phase by phase
```
Each command starts with fresh context, avoiding the context pollution of long chat sessions.
Use Different Models for Different Tasks
Chuchu lets you assign specialized models to each agent role:
```yaml
backend:
  groq:
    agent_models:
      router: llama-3.1-8b-instant    # Speed: 840 TPS
      query: llama-3.3-70b-versatile  # Reasoning: 70B params
      editor: llama-3.3-70b-versatile # Coding: versatile
      research: groq/compound         # Tools: web search
```
Why this works:
- Router needs speed, not depth → use small/fast model
- Query needs comprehension → use reasoning model
- Editor needs code quality → use specialized coding model
- Research needs tools → use model with web search
Each agent gets the right tool for its job, not one-size-fits-all.
Keep Context Tight
If you notice responses getting worse or repetitive:
- Save your progress: Write summary to a file
- Exit current session: Start fresh
- Resume with context: Load the compact summary
Chuchu’s command-based workflow naturally encourages this:
- `chu research` → outputs findings
- `chu plan` → reads findings, outputs plan
- `chu implement` → reads plan, outputs code
Each step is independently verifiable and resumable.
Incremental Verification
Don’t try to do everything in one go:
```bash
# Step 1: Understand what needs to change
chu research "payment processing flow"

# Step 2: Create detailed plan
chu plan "add Stripe webhook handling"
# Review plan - does it make sense?

# Step 3: Implement incrementally
chu implement plan.md
# Review changes - does code match plan?
```
This workflow gives you multiple checkpoints to catch issues early, when they’re cheap to fix.
This Is Not Magic
You still need to:
- Engage deeply with the task
- Review research and plans
- Steer when things go wrong
- Understand the changes
There’s no magic prompt that solves everything. But proper context engineering makes AI actually useful for hard problems.
What Works
With this approach, Chuchu can:
- Work in brownfield codebases (not just toys)
- Solve complex problems (not just CRUD)
- Produce quality code (not slop)
- Maintain mental alignment (not black box)
And do it affordably:
- Groq: $2-5/month typical usage
- Ollama: $0/month (fully local)
What’s Next
In future posts we’ll cover:
- Optimal model configurations for different project sizes
- Setting up local Ollama for zero-cost coding
- Advanced prompting techniques for TDD
But the foundation is always the same: manage your context window like your productivity depends on it—because it does.
Have questions about context engineering? Join the discussion in GitHub Discussions.

References

[^1]: Lewis, P., Perez, E., Piktus, A., et al. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. NeurIPS 2020. https://arxiv.org/abs/2005.11401