Dependency Graph & Context Optimization
Chuchu builds a dependency graph of your codebase to intelligently select the most relevant files for each query.
Overview
The dependency graph feature:
- Parses imports across multiple languages (Go, Python, JS/TS, Ruby, Rust)
- Builds a graph where nodes are files and edges are dependencies
- Ranks files using the PageRank algorithm (originally developed for Google Search)
- Optimizes context by selecting the most relevant files for your query
- Caches results for fast subsequent queries
Benefits:
- 5x token reduction (100k → 20k typical)
- Better LLM responses (focused context)
- Cost savings (fewer tokens = lower API costs)
- Faster responses (less to process)
How It Works
1. Import Detection
The graph builder scans your codebase and extracts import relationships:
Go:
import "github.com/user/repo/internal/auth" // External
import "mymodule/internal/config" // Internal (uses go.mod)
Python:
from .auth import User # Relative
from project.db import connect # Absolute
JavaScript/TypeScript:
import { API } from './api' // Relative
import { auth } from '@/lib/auth' // Alias
Ruby:
require 'rails/all' # Gem
require_relative '../config' # Relative
Rust:
use crate::models::User; // Internal
use std::collections::HashMap; // Standard
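The import forms above can be picked up with one regular expression per language. The following is a minimal sketch only: the patterns, the `IMPORT_PATTERNS` table, and the `extract_imports` name are illustrative assumptions, not Chuchu's actual parser, and they cover only single-line imports (Go `import ( ... )` blocks and dynamic imports are not handled).

```python
import re

# Hypothetical single-line patterns; each captures the imported path or module.
_JS_TS = re.compile(r"""(?:import\s+(?:[^'"]*?\s+from\s+)?|require\(\s*)['"]([^'"]+)['"]""")
IMPORT_PATTERNS = {
    ".go": re.compile(r'^\s*import\s+(?:\w+\s+)?"([^"]+)"', re.MULTILINE),
    ".py": re.compile(r"^\s*(?:from|import)\s+([\w.]+)", re.MULTILINE),
    ".js": _JS_TS,
    ".jsx": _JS_TS,
    ".ts": _JS_TS,
    ".tsx": _JS_TS,
    ".rb": re.compile(r"""^\s*require(?:_relative)?\s+['"]([^'"]+)['"]""", re.MULTILINE),
    ".rs": re.compile(r"^\s*use\s+([\w:]+)", re.MULTILINE),
}


def extract_imports(source: str, extension: str) -> list[str]:
    """Return the import targets found in `source` for a file extension."""
    pattern = IMPORT_PATTERNS.get(extension)
    if pattern is None:
        return []  # unsupported language: no edges
    return pattern.findall(source)
```

Each returned string still has to be resolved to a file in the project (or discarded as external) before it becomes an edge in the graph.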
2. Graph Construction
Files become nodes, imports become edges:
main.go → auth.go → user.go
→ api.go → user.go
→ config.go
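The diagram above can be represented as two adjacency maps, one per edge direction. This is a sketch under the assumption that imports have already been resolved to file names; `build_graph` and its return shape are hypothetical, not Chuchu's internal data structure.

```python
def build_graph(imports_by_file: dict[str, list[str]]):
    """Return (outgoing, incoming) adjacency maps for a file -> imports mapping."""
    outgoing: dict[str, set[str]] = {}
    incoming: dict[str, set[str]] = {}
    for source, targets in imports_by_file.items():
        outgoing.setdefault(source, set())
        incoming.setdefault(source, set())
        for target in targets:
            # Edge direction: importer -> imported file
            outgoing[source].add(target)
            outgoing.setdefault(target, set())
            incoming.setdefault(target, set()).add(source)
    return outgoing, incoming


# The example from the diagram above
example = {
    "main.go": ["auth.go", "api.go", "config.go"],
    "auth.go": ["user.go"],
    "api.go": ["user.go"],
}
outgoing, incoming = build_graph(example)
print(sorted(incoming["user.go"]))  # ['api.go', 'auth.go']
```

Keeping both directions makes the later neighbor expansion (files that import, or are imported by, a match) a pair of dictionary lookups.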
3. PageRank Scoring
Files are scored by importance using the PageRank algorithm:
- Files imported by many others = high score
- Files importing many others = each import receives a smaller share of their score; entry points that nothing imports score low
- Scores sum to ~1.0 across all files
Example scores:
user.go: 0.187 (imported by auth, api, tests)
auth.go: 0.142 (imported by main, api)
config.go: 0.095 (imported by main)
main.go: 0.063 (entry point, imports many)
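The scoring can be sketched as a small iterative PageRank. The defaults below (damping 0.85, at most 20 iterations, 0.0001 convergence threshold) are the values this document gives under Algorithm Details. The even redistribution of score from files that import nothing is an assumption added here so that scores keep summing to 1.0, and the `pagerank` function itself is illustrative rather than Chuchu's implementation.

```python
def pagerank(outgoing, damping=0.85, max_iterations=20, tolerance=1e-4):
    """Score nodes of a graph given as {file: set of files it imports}."""
    nodes = list(outgoing)
    n = len(nodes)
    scores = {node: 1.0 / n for node in nodes}
    for _ in range(max_iterations):
        new_scores = {node: (1.0 - damping) / n for node in nodes}
        for source, targets in outgoing.items():
            if targets:
                # Split this file's score evenly across the files it imports
                share = damping * scores[source] / len(targets)
                for target in targets:
                    new_scores[target] += share
            else:
                # Assumption: a file with no imports spreads its score evenly
                share = damping * scores[source] / n
                for node in nodes:
                    new_scores[node] += share
        delta = sum(abs(new_scores[node] - scores[node]) for node in nodes)
        scores = new_scores
        if delta < tolerance:
            break
    return scores


example = {
    "main.go": {"auth.go", "api.go", "config.go"},
    "auth.go": {"user.go"},
    "api.go": {"user.go"},
    "config.go": set(),
    "user.go": set(),
}
scores = pagerank(example)
ranked = sorted(scores, key=scores.get, reverse=True)
```

On this toy graph `user.go` ranks first and `main.go` last, matching the ordering of the example scores above; the absolute values differ because the real project has more files.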
4. Context Optimization
When you ask a query like “how does authentication work?”:
- Keyword matching: Find files whose paths contain “auth”, “login”, “user”
- Neighbor expansion: Include files that import/are imported by matches
- PageRank weighting: Sort by importance score
- Top N selection: Select top 5 (configurable) most relevant files
5. Smart Truncation
Selected files are truncated to ~3000 chars each:
- Head (first 30 lines): Imports, package declaration, type definitions
- Tail (last 20 lines): Recent code, likely most relevant
This keeps essential context while reducing token usage.
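A head-and-tail truncation along these lines might look like the sketch below. The 30/20 line counts and the ~3000 character budget come from the text above; the omission marker, the hard character cut, and the `truncate_file` name are assumptions, since the document does not say how the line and character limits interact.

```python
def truncate_file(text: str, head_lines: int = 30, tail_lines: int = 20,
                  max_chars: int = 3000) -> str:
    """Keep the first and last lines of a file, within a character budget."""
    lines = text.splitlines()
    if len(lines) > head_lines + tail_lines:
        omitted = len(lines) - head_lines - tail_lines
        # Hypothetical marker so the reader can see that content was dropped
        marker = f"... [{omitted} lines omitted] ..."
        lines = lines[:head_lines] + [marker] + lines[len(lines) - tail_lines:]
    result = "\n".join(lines)
    # Assumption: enforce the character budget with a hard cut
    return result[:max_chars]
```

For a 100-line file this keeps lines 1-30 and 81-100 with a single marker line in between.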
Configuration
Max Files
Control how many files are added to context:
# View current setting (default: 5)
chu config get defaults.graph_max_files
# Increase for more context (1-20)
chu config set defaults.graph_max_files 10
# Decrease for fewer tokens
chu config set defaults.graph_max_files 3
Recommendations:
- Small projects (<50 files): 3-5 files
- Medium projects (50-500 files): 5-8 files
- Large projects (500+ files): 8-12 files
Environment Variables
Enable debug mode:
export CHUCHU_DEBUG=1
chu chat "your query" # Shows graph stats
Debug output example:
[Graph] Built: 143 nodes, 287 edges (from cache)
[Graph] Selected 5 files for query "authentication":
- internal/auth/handler.go (PR: 0.187)
- internal/auth/middleware.go (PR: 0.142)
- internal/models/user.go (PR: 0.095)
CLI Commands
Build Graph
Force rebuild ignoring cache:
chu graph build
Output:
🏗️ Building dependency graph...
Nodes: 143
Edges: 287
📊 Calculating PageRank...
✅ Done in 234ms
When to use:
- After major refactoring
- After adding/removing many files
- If cache seems stale
Query Graph
Find relevant files for a query:
chu graph query "authentication"
chu graph query "database connection"
chu graph query "api routes"
Output:
🔍 Query: "authentication"
📂 Relevant Context:
- internal/auth/handler.go (PR: 0.187)
- internal/auth/middleware.go (PR: 0.142)
- internal/models/user.go (PR: 0.095)
- cmd/server/main.go (PR: 0.063)
- internal/config/auth.go (PR: 0.051)
Auto-Integration in Chat Mode
The graph automatically enhances chu chat:
chu chat "how does authentication work?"
What happens:
1. Query is analyzed for keywords (“authentication”, “auth”)
2. Graph finds top 5 relevant files
3. Files are truncated to ~3000 chars each
4. Context is appended to your message:
```
[Context from Dependency Graph]
File: internal/auth/handler.go (lines 1-30, 180-200)
[truncated content…]
File: internal/auth/middleware.go (lines 1-30, 95-115)
[truncated content…]
```
5. LLM receives enhanced context for better answers
**Comparison:**
| Without Graph | With Graph |
|--------------|------------|
| All files (100k tokens) | Top 5 files (20k tokens) |
| $0.50/query | $0.10/query |
| Slower response | Faster response |
| Generic answers | Focused answers |
---
## Cache System
### How Caching Works
Building the graph takes noticeable time (234ms for 150 files and 1.2s for 500 files in the benchmarks under Performance), so results are cached:
**Cache key:** MD5 of all file modification times
**Cache location:** `~/.chuchu/cache/graph_<md5>.json`
**Staleness:** 24 hours
### Cache Lifecycle
1. **First query:** Build graph, cache result
2. **Subsequent queries:** Load from cache (instant)
3. **File changes:** Detects via mtime hash, rebuilds
4. **24h expiry:** Rebuilds even if no changes detected
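The lifecycle above can be sketched as follows. The MD5-over-mtimes key, the `~/.chuchu/cache/graph_<md5>.json` location, and the 24-hour expiry come from this section; hashing the relative path alongside each mtime, the extension list, and all function names are assumptions made for illustration.

```python
import hashlib
import time
from pathlib import Path

SOURCE_EXTENSIONS = {".go", ".py", ".js", ".jsx", ".ts", ".tsx", ".rb", ".rs"}
MAX_AGE_SECONDS = 24 * 60 * 60  # 24h staleness window


def graph_cache_key(root) -> str:
    """MD5 over the modification times of all supported source files."""
    root_path = Path(root)
    digest = hashlib.md5()
    for path in sorted(root_path.rglob("*")):
        if path.is_file() and path.suffix in SOURCE_EXTENSIONS:
            mtime = path.stat().st_mtime_ns
            # Assumption: include the relative path so renames change the key
            digest.update(f"{path.relative_to(root_path)}:{mtime}\n".encode())
    return digest.hexdigest()


def cache_file_for(key: str) -> Path:
    return Path.home() / ".chuchu" / "cache" / f"graph_{key}.json"


def is_fresh(cache_file: Path, now=None) -> bool:
    """A cache entry is usable only if it exists and is younger than 24h."""
    if not cache_file.exists():
        return False
    current = time.time() if now is None else now
    return current - cache_file.stat().st_mtime < MAX_AGE_SECONDS
```

Because the key is derived from mtimes, touching any source file produces a different file name, so a stale entry is simply never looked up again.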
### Manual Cache Control
```bash
# Force rebuild (clears cache)
chu graph build
# Clear all caches manually
rm -rf ~/.chuchu/cache/graph_*.json
```
Supported Languages
| Language | Extensions | Import Detection |
|---|---|---|
| Go | .go | import, uses go.mod for module resolution |
| Python | .py | import, from...import |
| JavaScript | .js, .jsx | import, require() |
| TypeScript | .ts, .tsx | import, handles aliases |
| Ruby | .rb | require, require_relative |
| Rust | .rs | use, mod declarations |
Go Module Resolution
For Go projects, the builder reads go.mod to resolve internal imports:
// go.mod
module github.com/user/myproject
// main.go
import "github.com/user/myproject/internal/auth" // Resolved as internal
This ensures internal package imports are correctly linked in the graph.
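The resolution step can be sketched as reading the `module` directive and checking whether an import path sits under it. Both helper names are hypothetical, and the sketch assumes an unquoted module path on a single `module` line.

```python
import re
from typing import Optional


def read_module_path(go_mod_text: str) -> Optional[str]:
    """Extract the module path from the contents of a go.mod file."""
    match = re.search(r"^\s*module\s+(\S+)", go_mod_text, re.MULTILINE)
    return match.group(1) if match else None


def is_internal_import(import_path: str, module_path: str) -> bool:
    """True if the import refers to a package inside this module."""
    # Require a "/" boundary so "myprojectx" is not mistaken for "myproject"
    return import_path == module_path or import_path.startswith(module_path + "/")


go_mod = "module github.com/user/myproject\n\ngo 1.21\n"
module_path = read_module_path(go_mod)
```

Imports classified as internal can then be mapped to directories by stripping the module prefix; everything else is treated as external and gets no edge.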
Performance
Benchmarks
Medium Go project (150 files, 15k LOC):
- Build time: 234ms (first run)
- Cache load: 12ms (subsequent)
- PageRank: 18ms
- Context optimization: 3ms
Large TypeScript project (500 files, 80k LOC):
- Build time: 1.2s (first run)
- Cache load: 45ms (subsequent)
- PageRank: 67ms
- Context optimization: 8ms
Memory Usage
- Graph structure: ~500 bytes per file
- 500 files = ~250KB in memory
- Cache on disk: ~500KB for 500 files
Algorithm Details
PageRank Implementation
Classic PageRank with damping:
PR(A) = (1-d)/N + d * Σ(PR(Ti) / C(Ti))
Where:
- PR(A) = PageRank score of file A
- d = damping factor (0.85)
- N = total number of files
- Ti = files that link to A
- C(Ti) = number of outgoing links from Ti
Iteration:
- Runs for 20 iterations or until convergence
- Convergence threshold: 0.0001 delta
- Typical convergence: 8-12 iterations
Context Optimizer Algorithm
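def optimize_context(graph, query, max_files=5):
    """Illustrative sketch; `graph` exposes .nodes and .neighbors(node)."""
    # 1. Keyword matching: case-insensitive substring match on file paths
    keywords = query.lower().split()
    matches = [f for f in graph.nodes
               if any(k in f.path.lower() for k in keywords)]
    # 2. Neighbor expansion: iterate over the matches only, so the list
    #    being extended is never the list being iterated
    candidates = list(matches)
    for match in matches:
        candidates += graph.neighbors(match)
    # 3. Deduplicate, preserving first-seen order (nodes must be hashable)
    candidates = list(dict.fromkeys(candidates))
    # 4. Sort by PageRank score, highest first
    candidates.sort(key=lambda f: f.score, reverse=True)
    # 5. Select top N
    return candidates[:max_files]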
Keyword matching:
- Splits query into tokens
- Matches against file paths (case-insensitive)
- Supports partial matching (“auth” matches “authentication.go”)
Examples
Example 1: Authentication Query
chu chat "explain the authentication flow"
Graph selects:
- internal/auth/handler.go - Contains “auth”, high PageRank
- internal/auth/middleware.go - Neighbor of handler
- internal/models/user.go - Imported by auth files
- cmd/server/main.go - Entry point, imports auth
- internal/config/auth.go - Contains “auth” keyword
Result: LLM gets focused auth-related files, gives accurate answer
Example 2: Database Query
chu chat "how do we connect to the database?"
Graph selects:
- internal/db/connection.go - Contains “db”, “connection”
- internal/db/migrations.go - Neighbor of connection
- internal/models/base.go - Imports db package
- internal/config/database.go - Contains “database”
- cmd/migrate/main.go - Uses db connection
Result: Comprehensive database context without unrelated files
Example 3: API Routes Query
chu chat "list all api endpoints"
Graph selects:
- internal/api/routes.go - Contains “routes”, “api”
- internal/api/handlers.go - Neighbor of routes
- internal/api/middleware.go - Imported by routes
- cmd/server/main.go - Registers routes
- internal/auth/api.go - Contains “api”
Result: Complete API overview from relevant route files
Troubleshooting
Graph not building
Issue: No graph data generated
Solutions:
# Check if project has supported files (find works without globstar, unlike **)
find . -type f \( -name '*.go' -o -name '*.py' -o -name '*.ts' \) | head
# Enable debug mode
export CHUCHU_DEBUG=1
chu graph build
# Check for errors in output
Incorrect file selection
Issue: Graph selects wrong files
Solutions:
# Test query matching
chu graph query "your search term"
# Adjust max_files if too few/many
chu config set defaults.graph_max_files 8
# Rebuild graph if stale
chu graph build
Cache not updating
Issue: Old files still in context
Solutions:
# Force rebuild
chu graph build
# Clear cache manually
rm ~/.chuchu/cache/graph_*.json
# Check file mtimes
ls -la <file>
Performance issues
Issue: Graph build is slow
Solutions:
- Exclude large directories in .gitignore (already respected)
- Reduce graph_max_files for faster queries
- Use cache (automatic after first build)
- Check for huge files (>100k LOC)
Limitations
Current Limitations
- Static analysis only
  - No runtime dependency tracking
  - Dynamic imports not detected
- Path-based matching
  - Keyword search is simple substring matching
  - No semantic understanding of code
- Truncation trade-off
  - ~3000 chars per file may miss some context
  - Configurable via code, not CLI (yet)
- Language support
  - Only 6 languages supported
  - No Java, C#, PHP, etc.
Future Enhancements
- Semantic code search (embeddings)
- Function-level granularity
- More languages (Java, C#, PHP)
- Configurable truncation
- Graph visualization
- Export to GraphML/DOT
Next Steps
- Explore ML Features for intent classification
- See Commands Reference for full CLI
- Read implementation in internal/graph/
- Check tests in internal/graph/builder_test.go