Optimal Groq Configurations for Different Budgets
Groq offers blazing-fast inference with its LPU technology. Here are optimized agent-model configurations for different use cases and budgets.
Understanding Agent Roles
Chuchu uses specialized agents for different tasks:
- Router: Fast intent classification (needs speed, not depth)
- Query: Reading and analyzing code (needs comprehension)
- Editor: Writing and modifying code (needs code generation quality)
- Research: Web search and documentation lookup (benefits from tool use)
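The role-to-model mapping above can be sketched as a simple lookup. This is a hypothetical illustration of how an agent-routing layer might resolve a model per role (the function and constant names are assumptions, not Chuchu internals); the model ids mirror the budget configuration below.

```python
# Hypothetical sketch: resolve a model id per agent role, mirroring the
# agent_models / default_model structure in setup.yaml.
AGENT_MODELS = {
    "router": "llama-3.1-8b-instant",
    "query": "gpt-oss-20b-128k",
    "editor": "deepseek-r1-distill-qwen-32b",
    "research": "groq/compound-mini",
}
DEFAULT_MODEL = "llama-3.1-8b-instant"

def model_for(agent: str) -> str:
    """Return the configured model for an agent role, else the default."""
    return AGENT_MODELS.get(agent, DEFAULT_MODEL)
```

Any role without an explicit entry falls back to `default_model`, which is why the configs below always set both.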
Budget-Conscious Configuration ($0.05 - $0.42 per 1M tokens)
Best balance of cost and performance for most developers:
```yaml
backend:
  groq:
    base_url: https://api.groq.com/openai/v1
    default_model: llama-3.1-8b-instant
    agent_models:
      router: llama-3.1-8b-instant
      query: gpt-oss-20b-128k
      editor: deepseek-r1-distill-qwen-32b
      research: groq/compound-mini
```
Monthly estimate (100M tokens): ~$35
Why this works:
- Router: Fastest/cheapest at 840 TPS for intent classification
- Query: GPT-OSS 20B efficient model with solid comprehension
- Editor: DeepSeek-R1-Distill-Qwen-32B excellent for code (83.3% AIME, 94.3% MATH-500)
- Research: Compound Mini with web search at budget price
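To see where an estimate like this comes from, here is a back-of-envelope blended-cost calculation for the budget config, using the prices from the reference table below. The traffic shares per agent and the 80/20 input/output token split are assumptions for illustration, not measurements, so your actual bill will differ.

```python
# Blended cost per 1M tokens for the budget config.
# (input, output) prices in USD per 1M tokens, from the reference table.
PRICES = {
    "llama-3.1-8b-instant": (0.05, 0.08),          # router
    "gpt-oss-20b-128k": (0.075, 0.30),             # query
    "deepseek-r1-distill-qwen-32b": (0.14, 0.42),  # editor
    "groq/compound-mini": (0.11, 0.34),            # research
}
# Assumed fraction of total tokens handled by each agent.
SHARE = {
    "llama-3.1-8b-instant": 0.40,
    "gpt-oss-20b-128k": 0.30,
    "deepseek-r1-distill-qwen-32b": 0.20,
    "groq/compound-mini": 0.10,
}

def blended_cost(in_frac: float = 0.8) -> float:
    """USD per 1M blended tokens, given the input-token fraction."""
    return sum(
        SHARE[m] * (in_frac * p_in + (1 - in_frac) * p_out)
        for m, (p_in, p_out) in PRICES.items()
    )

print(round(blended_cost() * 100, 2))  # USD per 100M tokens → 11.32
```

Under these assumptions the token cost lands well under the ~$35 estimate; the headroom covers Compound tool charges (billed per request, see below) and heavier output ratios.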
Performance-Focused Configuration ($0.11 - $3.00 per 1M tokens)
For projects where code quality is critical and budget is flexible:
```yaml
backend:
  groq:
    base_url: https://api.groq.com/openai/v1
    default_model: gpt-oss-120b-128k
    agent_models:
      router: llama-3.1-8b-instant
      query: gpt-oss-120b-128k
      editor: moonshotai/kimi-k2-instruct-0905
      research: groq/compound
```
Monthly estimate (100M tokens): ~$90
Why this works:
- GPT-OSS 120B excels at code comprehension and reasoning
- Nearly matches or exceeds Llama 3.3 70B on benchmarks at 75% lower cost
- Kimi K2 has 1 trillion parameters and a 256k context window for complex edits
- Full Compound system with GPT-OSS-120B for research
- Still uses fast router for cost efficiency
Research-Heavy Configuration
Optimized for projects with extensive documentation and web research needs:
```yaml
backend:
  groq:
    base_url: https://api.groq.com/openai/v1
    default_model: gpt-oss-20b-128k
    agent_models:
      router: llama-3.1-8b-instant
      query: gpt-oss-20b-128k
      editor: gpt-oss-120b-128k
      research: groq/compound
```
Monthly estimate (100M tokens): ~$37
Why this works:
- GPT-OSS models excel at information synthesis
- Full Compound system with web search and browser automation
- Good balance of comprehension and generation
Speed-Optimized Configuration
When latency matters more than token cost:
```yaml
backend:
  groq:
    base_url: https://api.groq.com/openai/v1
    default_model: llama-3.1-8b-instant
    agent_models:
      router: llama-3.1-8b-instant
      query: llama-3.1-8b-instant
      editor: qwen/qwen3-32b
      research: llama-4-scout-17bx16e-128k
```
Monthly estimate (100M tokens): ~$20
Why this works:
- Prioritizes speed: 840 TPS for router and query
- Qwen3 32B: Efficient coding model with strong performance
- Latency-optimized: All models selected for maximum throughput
- Budget-friendly: Lowest cost configuration
Model Specifications Reference
| Model | Input | Output | Context | Speed (TPS) | Best For |
|---|---|---|---|---|---|
| llama-3.1-8b-instant | $0.05 | $0.08 | 128k | 840 | Router, fast tasks |
| gpt-oss-20b-128k | $0.075 | $0.30 | 128k | 1000 | Query, analysis |
| gpt-oss-120b-128k | $0.15 | $0.60 | 128k | 500 | Query, research, synthesis |
| deepseek-r1-distill-qwen-32b | $0.14 | $0.42 | 128k | 600 | Editor, coding tasks |
| qwen/qwen3-32b | $0.18 | $0.18 | 131k | 650 | Editor, fast coding |
| kimi-k2-instruct-0905 | $1.00 | $3.00 | 256k | 200 | Large context, complex edits |
| groq/compound | $0.15 | $0.60 | 131k | 450 | Research with tools |
| groq/compound-mini | $0.11 | $0.34 | 131k | 500 | Budget research with tools |
Prices per 1M tokens. TPS = tokens per second throughput.
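The table above translates directly into a small per-request cost helper. This is an illustrative sketch (the `SPECS` dict and `request_cost` helper are my own names, and prices are copied from the table, so verify against Groq's current pricing before relying on it):

```python
# Pricing from the reference table, USD per 1M tokens.
SPECS = {
    "llama-3.1-8b-instant": {"in": 0.05, "out": 0.08},
    "gpt-oss-20b-128k": {"in": 0.075, "out": 0.30},
    "gpt-oss-120b-128k": {"in": 0.15, "out": 0.60},
    "deepseek-r1-distill-qwen-32b": {"in": 0.14, "out": 0.42},
    "qwen/qwen3-32b": {"in": 0.18, "out": 0.18},
    "kimi-k2-instruct-0905": {"in": 1.00, "out": 3.00},
    "groq/compound": {"in": 0.15, "out": 0.60},
    "groq/compound-mini": {"in": 0.11, "out": 0.34},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Token cost in USD for a single request."""
    p = SPECS[model]
    return (input_tokens * p["in"] + output_tokens * p["out"]) / 1_000_000

# A 4k-in / 1k-out edit with Kimi K2:
# request_cost("kimi-k2-instruct-0905", 4000, 1000) → 0.007
```

Note Kimi K2's output price dominates: a single 4k-in/1k-out edit costs about as much as forty of the same with `llama-3.1-8b-instant`.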
Groq Compound Systems
Compound models are special - they combine multiple models with tool capabilities:
groq/compound
- Models: GPT-OSS-120B + Llama 4 Scout
- Tools: Web search, code execution, browser automation, Wolfram Alpha
- Pricing: Base model pricing + tool costs
- Basic web search: $5/1000 requests
- Advanced web search: $8/1000 requests
- Visit website: $1/1000 requests
- Code execution: $0.18/hour
- Browser automation: $0.08/hour
groq/compound-mini
- Models: Llama 4 Scout only
- Tools: Same as compound
- Pricing: Lower base model cost + tool costs
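Because tool charges are billed per request and per metered hour on top of token costs, they are worth estimating separately. A rough estimator under the rates listed above (the function and rate-table names are illustrative):

```python
# Compound tool rates from the list above.
TOOL_RATES = {               # USD per request
    "basic_search": 5.00 / 1000,
    "advanced_search": 8.00 / 1000,
    "visit_website": 1.00 / 1000,
}
HOURLY_RATES = {             # USD per hour
    "code_execution": 0.18,
    "browser_automation": 0.08,
}

def tool_cost(requests: dict, hours: dict) -> float:
    """Total tool spend: per-request charges plus metered hours."""
    per_request = sum(TOOL_RATES[t] * n for t, n in requests.items())
    per_hour = sum(HOURLY_RATES[t] * h for t, h in hours.items())
    return per_request + per_hour

# 2000 basic searches + 500 site visits + 3 hours of code execution:
print(tool_cost({"basic_search": 2000, "visit_website": 500},
                {"code_execution": 3}))
```

For research-heavy workloads, per-request search fees can easily exceed the token bill, which is one more reason to route only the research agent through Compound.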
Setting Up
- Update your model catalog: `chu models update`
- Switch to Groq backend and configure agent models in Neovim: `Ctrl+X` (in chat buffer)
- Or edit `~/.chuchu/setup.yaml` directly with your chosen configuration
Tips
- Start with budget-conscious config and upgrade specific agents as needed
- Use `groq/compound-mini` for research if you don't need GPT-OSS-120B
- Router agent is called most frequently - keep it fast and cheap
- Editor agent output quality matters most - invest there first
- Monitor your usage at console.groq.com
Have your own optimized configuration? Share it on GitHub Discussions!