Intelligent Efficiency: chu do Finds The Optimal Path
Today we’re releasing chu do—an autonomous execution system that doesn’t just recover from failures, it actively optimizes for efficiency. The system evaluates cost, speed, reliability, and availability to find the best route to complete your task.
The Problem
You want to complete a task. The system could use:
- An expensive premium model that works
- A fast but unreliable model
- A free model that might be slow
- A local model that’s private but limited
Traditional approach: Pick one and hope it works.
What if the system could evaluate all options and choose the most efficient path automatically?
Enter: Intelligent Efficiency
$ chu do "create a hello.txt file with Hello World" --verbose
Backend: groq
Editor Model: moonshotai/kimi-k2-instruct-0905
❌ Attempt 1 failed: tool 'read_file' not available
🤔 Evaluating all available options...
💡 Intelligence recommends: openrouter/moonshotai/kimi-k2:free
Overall Score: 0.88
Success Rate: 100% | Speed: 300 TPS | Cost: $0.000/1M | Latency: 20191ms
Reason: Success: 100% (4 tasks), Speed: 300 TPS, Cost: $0.00/1M
🔄 Switching to optimal model...
=== Attempt 2/3 ===
✓ Task completed successfully
No user intervention. No config editing. The system:
- Detected the failure
- Evaluated all available options across backends
- Calculated efficiency scores considering: success rate, speed, cost, latency
- Chose the optimal model (free, fast, 100% success)
- Switched automatically
- Succeeded
How It Works
1. Execution History
Every task execution is recorded to ~/.chuchu/task_execution_history.jsonl:
{
  "timestamp": "2025-11-24T14:30:38Z",
  "task": "create a hello.txt file",
  "backend": "groq",
  "model": "moonshotai/kimi-k2-instruct-0905",
  "success": false,
  "error": "tool 'read_file' not available"
}
This isn’t just logging—it’s a training dataset.
2. Real-Time Learning
The intelligence system calculates success rates per model/backend:
groq/moonshotai/kimi-k2-instruct-0905: 0% (3 failures)
openrouter/moonshotai/kimi-k2:free: 100% (4 successes)
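The per-model rates above fall directly out of the JSONL history. A minimal sketch of the aggregation, assuming the record fields shown earlier (the real `GetRecentModelPerformance()` may differ):

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// record holds the subset of history fields needed for rate calculation.
type record struct {
	Backend string `json:"backend"`
	Model   string `json:"model"`
	Success bool   `json:"success"`
}

// successRates groups JSONL history lines by backend/model and
// returns the fraction of successful runs for each combination.
func successRates(history string) map[string]float64 {
	total := map[string]int{}
	wins := map[string]int{}
	sc := bufio.NewScanner(strings.NewReader(history))
	for sc.Scan() {
		var r record
		if json.Unmarshal(sc.Bytes(), &r) != nil {
			continue // skip malformed lines
		}
		key := r.Backend + "/" + r.Model
		total[key]++
		if r.Success {
			wins[key]++
		}
	}
	rates := map[string]float64{}
	for k, n := range total {
		rates[k] = float64(wins[k]) / float64(n)
	}
	return rates
}

func main() {
	history := `{"backend":"groq","model":"kimi","success":false}
{"backend":"groq","model":"kimi","success":false}
{"backend":"openrouter","model":"kimi:free","success":true}
{"backend":"openrouter","model":"kimi:free","success":true}`
	for k, v := range successRates(history) {
		fmt.Printf("%s: %.0f%%\n", k, v*100)
	}
}
```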
3. Confidence-Based Recommendations
Initial recommendation (no history):
💡 Intelligence recommends: openrouter/moonshotai/kimi-k2:free
Confidence: 50%
Reason: Known to support function calling
After learning (≥3 tasks):
💡 Intelligence recommends: openrouter/moonshotai/kimi-k2:free
Confidence: 100%
Reason: Historical success rate: 100% (3 tasks)
The system improves recommendations over time based on your actual usage patterns.
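The confidence rule shown above reduces to a small function. The three-task threshold and the 50% prior are taken from the examples in this post; the real implementation may weight things differently:

```go
package main

import "fmt"

// confidence returns the recommendation confidence for a model:
// with fewer than three recorded tasks it falls back to a 0.5
// heuristic prior; from three tasks on, it is the observed
// success rate from the execution history.
func confidence(successes, total int) float64 {
	if total < 3 {
		return 0.5 // cold start: not enough history yet
	}
	return float64(successes) / float64(total)
}

func main() {
	fmt.Printf("2 tasks: %.0f%%\n", confidence(2, 2)*100) // 50% (still cold)
	fmt.Printf("3 tasks: %.0f%%\n", confidence(3, 3)*100) // 100%
}
```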
Multi-Criteria Optimization
The system doesn’t just pick “any working model”—it finds the most efficient one.
Scoring Formula
Score = 0.5 * SuccessRate + 0.2 * Speed + 0.2 * Cost + 0.1 * Availability
Weights explained:
- 50% Success Rate: Reliability is most important
- 20% Speed: Fast models = better UX
- 20% Cost: Free models preferred when viable
- 10% Availability: Rate limits matter
Example Calculation
openrouter/kimi:free:
- Success: 100% = 0.50
- Speed: 300 TPS = 0.06 (300/1000 * 0.2)
- Cost: $0/1M = 0.20 (free = max score)
- Availability: 100% = 0.10
- Total: 0.86
groq/llama-70b:
- Success: 0% = 0.00
- Speed: 500 TPS = 0.10
- Cost: $0/1M = 0.20
- Availability: 100% = 0.10
- Total: 0.40
→ System chooses openrouter/kimi:free (higher score)
Not Just Fallback Logic
This isn’t a hardcoded list of “if model X fails, try model Y.”
Traditional fallback:
if err := tryModel("groq/model-a"); err != nil {
    return tryModel("openrouter/model-b") // Hardcoded
}
Intelligence-based:
if err := tryModel(currentModel); err != nil {
    // Query the intelligence system instead of a hardcoded list
    history := getExecutionHistory(100) // last 100 executions
    recommendations := calculateSuccessRates(history, taskType)
    bestModel := recommendations.sortByConfidence()[0]
    return tryModel(bestModel) // Data-driven
}
Key differences:
- Adapts to your setup (not universal defaults)
- Learns from failures (improves over time)
- Cross-backend switching (not limited to one provider)
- Confidence scores (transparency in decision-making)
Backend Switching
The system can automatically switch backends during retry:
Attempt 1: groq/model-x → Failed
Attempt 2: openrouter/model-y → Success ✓
This requires both backends to be configured, but the intelligence system will discover which combination works best for your specific use case.
Real-World Example
Let’s trace a real execution:
First Time (Cold Start)
$ chu do "create config.yaml" --verbose
# Attempt 1 with default model
Backend: groq
Model: moonshotai/kimi-k2-instruct-0905
❌ Failed: tool not available
# Intelligence recommendation (no history yet)
💡 Recommends: openrouter/moonshotai/kimi-k2:free
Confidence: 50%
Reason: Known to support function calling
# Retry succeeds
✓ Task completed
System learned: openrouter/kimi:free works for this task type.
Second Time
$ chu do "create database.yaml" --verbose
# Still tries default first (respects user config)
❌ Failed: tool not available
# Now has 1 success in history
💡 Recommends: openrouter/moonshotai/kimi-k2:free
Confidence: 50% # Still < 3 tasks
✓ Task completed
System learned: Second success with openrouter/kimi:free.
Third Time
$ chu do "create api.yaml" --verbose
❌ Failed: tool not available
# Now has 2 successes in history
💡 Recommends: openrouter/moonshotai/kimi-k2:free
Confidence: 50% # Still < 3 tasks
✓ Task completed
Fourth Time (Confidence Kicks In)
$ chu do "create server.yaml" --verbose
❌ Failed: tool not available
# Now has ≥3 tasks: uses historical success rate
💡 Recommends: openrouter/moonshotai/kimi-k2:free
Confidence: 100% # 3/3 successes!
Reason: Historical success rate: 100% (3 tasks)
✓ Task completed
System is now confident: openrouter/kimi:free is the right choice for this user’s setup.
Why This Matters
1. Zero Configuration After Setup
Once you’ve configured multiple backends, the system figures out the optimal combination for you.
2. Adapts to Your Environment
Different users have different:
- API quotas
- Model availability
- Network conditions
- Cost constraints
The intelligence system learns your specific patterns, not universal defaults.
3. Improves With Usage
The more you use chu do, the smarter it gets. No manual tuning required.
4. Transparent Decision-Making
Every recommendation comes with:
- Confidence score
- Reasoning
- Historical data
You always know why the system chose a particular model.
Command Usage
Basic
chu do "create a file"
With Verbose (Recommended Initially)
chu do "create a file" --verbose
Shows:
- Which models are being tried
- Why alternatives are recommended
- Confidence scores
- Success/failure details
Dry Run (Analysis Only)
chu do "complex refactoring" --dry-run
Analyzes the task without executing.
Control Retries
chu do "task" --max-attempts 5
Default is 3 attempts.
Viewing Your History
# Raw history
cat ~/.chuchu/task_execution_history.jsonl
# Analyze success rates
cat ~/.chuchu/task_execution_history.jsonl | \
  jq -s 'group_by(.backend + "/" + .model) |
    map({
      model: (.[0].backend + "/" + .[0].model),
      success_rate: (map(select(.success)) | length) / length,
      total: length
    })'
Example output:
[
  {
    "model": "groq/moonshotai/kimi-k2-instruct-0905",
    "success_rate": 0,
    "total": 4
  },
  {
    "model": "openrouter/moonshotai/kimi-k2:free",
    "success_rate": 1,
    "total": 4
  }
]
Clear pattern: Switch to OpenRouter for this user’s setup.
Technical Implementation
Intelligence Package
New internal/intelligence/ package with:
history.go
- RecordExecution() - Persist task results
- GetRecentModelPerformance() - Calculate success rates
- JSONL format for easy analysis
recommender.go
- RecommendModelForRetry() - ML-based model selection
- Considers: history, capabilities, backend availability
- Returns sorted recommendations with confidence scores
Auto-Recovery Flow
func runDoExecutionWithRetry(task string, maxAttempts int) error {
    for attempt := 1; attempt <= maxAttempts; attempt++ {
        err := runDoExecution(task, currentModel)

        // Record the result (guard against calling Error() on a nil error)
        errMsg := ""
        if err != nil {
            errMsg = err.Error()
        }
        intelligence.RecordExecution(TaskExecution{
            Task:    task,
            Model:   currentModel,
            Success: err == nil,
            Error:   errMsg,
        })

        if err == nil {
            return nil // Success!
        }
        if !isToolError(err) {
            return err // Not recoverable by switching models
        }

        // Ask the intelligence system for the best alternative
        recs, _ := intelligence.RecommendModelForRetry(
            setup, "editor", currentBackend, currentModel, task,
        )
        if len(recs) == 0 {
            return err // No alternative to try
        }
        currentModel = recs[0].Model
        currentBackend = recs[0].Backend
    }
    return fmt.Errorf("failed after %d attempts", maxAttempts)
}
Guided Mode Extension
Added NewGuidedModeWithCustomModel() to allow model override during retry:
type GuidedMode struct {
    model       string // Query model
    editorModel string // Can be different during retry
}
This enables switching the editor model while keeping the same orchestrator/provider.
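Concretely, the constructor likely has a shape along these lines. This is a sketch built from the struct shown above; only the name `NewGuidedModeWithCustomModel` comes from the post, the signature and body are assumptions:

```go
package main

import "fmt"

// GuidedMode holds the two models used during guided execution.
type GuidedMode struct {
	model       string // query/orchestrator model
	editorModel string // can be different during retry
}

// NewGuidedModeWithCustomModel keeps the orchestrator model fixed while
// overriding the editor model, e.g. when the intelligence system
// recommends a different editor model on retry (sketch; real signature
// may differ).
func NewGuidedModeWithCustomModel(model, editorModel string) *GuidedMode {
	return &GuidedMode{model: model, editorModel: editorModel}
}

func main() {
	g := NewGuidedModeWithCustomModel(
		"moonshotai/kimi-k2-instruct-0905", // unchanged orchestrator
		"moonshotai/kimi-k2:free",          // editor override from retry
	)
	fmt.Println(g.model, "->", g.editorModel)
}
```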
Future Enhancements
Current version uses simple success rate calculation. Planned improvements:
1. Task Feature Extraction
- Complexity estimation
- File count
- Language detection
- Operation type (read vs write)
2. Cost Optimization
- Factor in model pricing
- Prefer cheaper models when confidence is similar
3. Latency Awareness
- Track execution time
- Prefer faster models for simple tasks
4. Advanced ML Models
See Intelligence Layers notebook for the full ML roadmap.
References
Comparison: chu do vs chu guided
| Feature | chu do | chu guided |
|---|---|---|
| User approval | None | Required |
| Auto-recovery | ✓ With learning | ✗ Manual fix |
| Learning | ✓ Improves over time | ✗ Static |
| Speed | Fast (automatic retry) | Slower (human review) |
| Safety | Medium | High |
| Best for | Quick tasks, iteration | High-risk changes |
Getting Started
1. Update Chuchu
cd ~/chuchu
git pull origin main
go build -o bin/chu cmd/chu/*.go
2. Configure Multiple Backends
chu setup
# Add at least 2 backends (e.g., groq + openrouter)
3. Try It Out
chu do "create a test.txt file with Hello" --verbose
4. Watch It Learn
Run a few more tasks and observe confidence scores increasing.
5. Check Your Stats
cat ~/.chuchu/task_execution_history.jsonl | jq
Best Practices
Let It Learn
Don’t intervene manually during retries. The system needs real failure/success data to learn.
Use Verbose Mode Initially
chu do "task" --verbose
Helps you understand:
- Which models work in your setup
- Why certain recommendations are made
- How confidence builds over time
Configure Diverse Backends
More backends = more alternatives:
- Groq: Fast, cheap (some models lack tools)
- OpenRouter: Many free options with tools
- Ollama: Local, private
- OpenAI: Premium, reliable
Don’t Reset History
task_execution_history.jsonl is your trained model. Preserve it across reinstalls.
Known Limitations
Cold Start Problem
First few executions have lower confidence (50%). After ≥3 tasks, confidence becomes data-driven.
Requires Multiple Backends
If you only have one backend configured, the system can’t switch. Configure at least 2.
Tool-Error Specific
Currently only triggers on function calling errors. Other failure modes may not auto-recover.
Community Feedback
We’d love to hear:
- How well does the system learn in your setup?
- Which model combinations work best?
- What confidence threshold feels right for auto-retry?
Open an issue or discussion on GitHub.
Posted on November 26, 2025. Tested on commit b462c9f with real execution data.