OpenRouter: Access to Premium Models in One Place

OpenRouter provides unified access to the best AI models from multiple providers through a single API.

Why OpenRouter?

OpenRouter aggregates models from:

Anthropic: Claude 4.5 Sonnet - exceptional reasoning and code quality
xAI: Grok 4.1 Fast - best agentic tool calling with 2M context
OpenAI: GPT-OSS 120B, o1/o3 series - reasoning and synthesis
Meta: Llama models - fast and cost-effective
Alibaba: Qwen Coder - specialized code generation (88% HumanEval)
Google: Gemini - multimodal capabilities

You use one backend at a time, but can configure different agents to use different models from OpenRouter’s catalog.

Configuration Guide

Step 1: Get Your OpenRouter Key

Sign up at openrouter.ai
Create an API key
Add it to Chuchu:
```
chu key openrouter sk-or-v1-...
```

Step 2: Configure Agent Models

Edit ~/.chuchu/setup.yaml with this killer configuration:

backend:
  openrouter:
    type: openai
    base_url: https://openrouter.ai/api/v1
    default_model: anthropic/claude-4.5-sonnet
    agent_models:
      router: meta-llama/llama-3.1-8b-instruct
      query: anthropic/claude-4.5-sonnet
      editor: anthropic/claude-4.5-sonnet
      research: x-ai/grok-4.1-fast
      review: anthropic/claude-4.5-sonnet

Why This Configuration Works

Router: `meta-llama/llama-3.1-8b-instruct`

Purpose: Fast intent classification
Why: Cheapest and fastest for simple routing decisions
Cost: ~$0.05/1M tokens input

Query: `anthropic/claude-4.5-sonnet`

Purpose: Reading and analyzing code
Why: Best comprehension and reasoning capabilities
Context: 200k tokens
Cost: ~$3/1M tokens input, $15/1M output

Editor: `anthropic/claude-4.5-sonnet`

Purpose: Writing and modifying code
Why: Superior code generation quality and reliability
Strength: Excellent at following instructions and maintaining code style

Research: `x-ai/grok-4.1-fast`

Purpose: Web search and tool use
Why: Designed specifically for agentic workflows with tool calling
Context: 2M tokens (massive context window!)
Special: Can enable/disable reasoning with reasoning_enabled parameter
Cost: Competitive pricing for agentic use cases

Review: `anthropic/claude-4.5-sonnet`

Purpose: Code review and analysis
Why: Catches subtle bugs and provides thoughtful feedback

Alternative Configurations

All-Free (Zero Cost!) 🎉

agent_models:
  router: google/gemini-2.0-flash-exp:free
  query: x-ai/grok-4.1-fast
  editor: kwaipilot/kat-coder-pro-v1:free
  research: x-ai/grok-4.1-fast
  review: qwen/qwen-3-coder-480b-a35b:free

All FREE models! Perfect for unlimited usage with zero cost:

Gemini 2.0 Flash: Fastest time-to-first-token for instant routing
Grok 4.1 Fast: 2M context window for massive codebases
KAT-Coder-Pro V1: 73.4% on SWE-Bench, specialized for agentic coding
Qwen3 Coder 480B: MoE architecture with deep code understanding

Budget-Conscious (Lower Cost)

agent_models:
  router: meta-llama/llama-3.1-8b-instruct
  query: openai/gpt-oss-120b
  editor: alibaba/qwen-2.5-coder-32b-instruct
  research: x-ai/grok-4.1-fast
  review: anthropic/claude-4.5-sonnet

Use GPT-OSS 120B for query (better reasoning than Llama 3.3 70B at lower cost), Qwen 2.5 Coder for editor (88.4% HumanEval), and premium models for research and review.

All-In Performance (Maximum Quality)

agent_models:
  router: meta-llama/llama-3.1-8b-instruct
  query: anthropic/claude-4.5-sonnet
  editor: anthropic/claude-4.5-sonnet
  research: x-ai/grok-4.1-fast
  review: openai/o1

Add OpenAI’s o1 for code review when you need the absolute best reasoning.

Grok-Heavy (Agentic Focused)

agent_models:
  router: meta-llama/llama-3.1-8b-instruct
  query: x-ai/grok-4.1-fast
  editor: anthropic/claude-4.5-sonnet
  research: x-ai/grok-4.1-fast
  review: x-ai/grok-4.1-fast

Maximize Grok’s 2M context and agentic capabilities for complex multi-step tasks.

Free Models Deep Dive

Grok 4.1 Fast (x-ai/grok-4.1-fast) - FREE!

Grok 4.1 Fast is particularly interesting for AI coding agents:

Agentic Design: Built from the ground up for tool calling and multi-step workflows
Real-World Use Cases: Excels at customer support, deep research, and complex debugging
2M Context Window: Can see your entire codebase at once

Reasoning Control: Enable reasoning for complex tasks, disable for speed:

"reasoning_enabled": true  # for complex multi-step debugging
"reasoning_enabled": false # for quick file searches

Cost: $0/$0 (currently free on OpenRouter)

Gemini 2.0 Flash Experimental (google/gemini-2.0-flash-exp:free) - FREE!

Fastest TTFT: Significantly faster time-to-first-token than Gemini 1.5
Quality: On par with larger models like Gemini Pro 1.5
Context: 1.05M tokens - huge for a free model
Strengths: Multimodal understanding, coding, complex instructions, function calling
Perfect For: Router agent - instant responses for intent classification
Cost: $0/$0 (experimental free tier)

KAT-Coder-Pro V1 (kwaipilot/kat-coder-pro-v1:free) - FREE!

SWE-Bench: 73.4% solve rate on SWE-Bench Verified benchmark
Agentic Coding: Designed specifically for software engineering tasks
Multi-Stage Training: Mid-training, SFT, RFT, and scalable agentic RL
Context: 256K tokens
Strengths: Tool use, multi-turn interaction, instruction following
Perfect For: Editor agent - generates high-quality production code
Cost: $0/$0 (free tier)

Qwen3 Coder 480B A35B (qwen/qwen-3-coder-480b-a35b:free) - FREE!

MoE Architecture: 480B total parameters, 35B active per forward pass
Experts: 8 out of 160 experts active per token
Context: 262K tokens
Strengths: Function calling, tool use, long-context reasoning over repositories
Perfect For: Code review - deep analysis with MoE reasoning
Cost: $0/$0 for requests under 128k tokens
Note: Pricing increases for >128k input tokens (still very cheap)

Cost Comparison (Approximate)

Model	Input ($/1M)	Output ($/1M)	Context	Best For
FREE MODELS
grok-4.1-fast	$0.00	$0.00	2M	Agentic workflows
gemini-2.0-flash-exp	$0.00	$0.00	1.05M	Fast routing
kat-coder-pro-v1	$0.00	$0.00	256K	Code generation
qwen3-coder-480b	$0.00	$0.00	262K	Code review
PAID MODELS
llama-3.1-8b	$0.05	$0.05	128K	Budget router
gpt-oss-120b	$0.15	$0.60	128K	Budget query/research
qwen-2.5-coder-32b	$0.14	$0.14	131K	Budget editor (88% HumanEval)
claude-4.5-sonnet	$3.00	$15.00	200K	Premium quality
o1/o3	$15.00+	$60.00+	200K	Deep reasoning

Prices are approximate and subject to change. Check openrouter.ai/models for current pricing.

Setup

Update your model catalog:
```
chu models update
```
Verify your configuration:
```
chu config show
```
Test with a chat:
```
chu chat
```

The Result

With OpenRouter, you get access to the best models from every major provider without managing multiple API keys and configurations. Your agents automatically use the right model for each task.

All-Free Configuration: $0/month

With the all-free setup, you get:

Gemini 2.0 Flash: Instant routing responses
Grok 4.1 Fast: 2M context for massive codebases (query & research)
KAT-Coder-Pro V1: 73.4% SWE-Bench performance for code generation
Qwen3 Coder 480B: MoE reasoning for thorough code reviews

Zero cost. Unlimited usage. Professional quality.

Premium Configuration: ~$5-10/month

For maximum quality:

Claude 4.5 for high-quality code generation and analysis
Grok 4.1 Fast for agentic research with massive context
Llama 3.1 for cost-effective routing

It’s the most flexible and powerful way to run Chuchu.

Tips

Start FREE: Try the all-free configuration first - it’s surprisingly powerful
Grok 4.1 Fast is free: Use it liberally for its 2M context window
Monitor usage: Check openrouter.ai/dashboard even for free tier
Free tier limitations: Some free models may have rate limits or change pricing
Upgrade selectively: If you need more quality, upgrade just the editor to Claude 4.5
Reasoning control: Enable Grok’s reasoning for complex debugging, disable for speed

Share your optimized OpenRouter configurations in GitHub Discussions!