Lesson 05 — Multi-Model & Failover: Get the Most Out of AI on a Budget

Goal: Configure multiple model providers, implement automatic failover, use affordable models for everyday tasks, and automatically upgrade for complex ones.


Why Multiple Models?

Scenario | Recommended model
Everyday Q&A, translation | MiniMax M2.1 (affordable)
Complex reasoning, code architecture | Claude Opus or MiniMax M2.5 (pricier but more powerful)
Primary model down / rate-limited | Automatic switch to a backup model (no interruption)

Scenario 1: MiniMax Primary + Claude Fallback

Edit ~/.openclaw/openclaw.json:

{
  "gateway": { "mode": "local" },
  "env": {
    "MINIMAX_API_KEY": "${MINIMAX_API_KEY}",
    "ANTHROPIC_API_KEY": "${ANTHROPIC_API_KEY}"
  },
  "agents": {
    "defaults": {
      "model": {
        "primary": "minimax/MiniMax-M2.1",
        "fallbacks": ["anthropic/claude-sonnet-4-6"]
      }
    }
  },
  "models": {
    "mode": "merge",
    "providers": {
      "minimax": {
        "baseUrl": "https://api.minimax.io/anthropic",
        "apiKey": "${MINIMAX_API_KEY}",
        "api": "anthropic-messages",
        "models": [
          {
            "id": "MiniMax-M2.1",
            "name": "MiniMax M2.1",
            "reasoning": false,
            "input": ["text"],
            "cost": { "input": 15, "output": 60, "cacheRead": 2, "cacheWrite": 10 },
            "contextWindow": 200000,
            "maxTokens": 8192
          }
        ]
      }
    }
  }
}

When MiniMax returns an error or times out, OpenClaw automatically switches to Claude Sonnet — completely transparent to the user.


Scenario 2: Use Different Models for Different Tasks

Switch models manually during a conversation:

# Switch to a reasoning model for complex tasks
pnpm openclaw models set minimax/MiniMax-M2.5
 
# Switch back to the cheaper one when done
pnpm openclaw models set minimax/MiniMax-M2.1

Or switch via slash command (if model-switching Skill is enabled):

/model M2.5
Help me design the architecture for a distributed caching system

Scenario 3: Claude Opus Primary + MiniMax Fallback (Economy Mode)

{
  "agents": {
    "defaults": {
      "models": {
        "anthropic/claude-opus-4-6": { "alias": "opus" },
        "minimax/MiniMax-M2.1": { "alias": "minimax" }
      },
      "model": {
        "primary": "anthropic/claude-opus-4-6",
        "fallbacks": ["minimax/MiniMax-M2.1"]
      }
    }
  }
}

Scenario 4: Full Multi-Model Config (Triple Redundancy)

{
  "env": {
    "MINIMAX_API_KEY": "${MINIMAX_API_KEY}",
    "ANTHROPIC_API_KEY": "${ANTHROPIC_API_KEY}",
    "OPENAI_API_KEY": "${OPENAI_API_KEY}"
  },
  "agents": {
    "defaults": {
      "model": {
        "primary": "minimax/MiniMax-M2.1",
        "fallbacks": [
          "anthropic/claude-sonnet-4-6",
          "openai/gpt-4o"
        ]
      }
    }
  }
}

If any one provider goes down, OpenClaw automatically moves to the next in the list — no interruption to your session.


Check Which Model You're Currently Using

In a conversation, send:

/status

Returns something like:

Model: minimax/MiniMax-M2.1
Context: 4,821 / 200,000 tokens
Session: main

View Token Usage and Cost

/usage

Returns the token consumption and estimated cost for the current session, helping you keep costs under control.


Thinking Level

MiniMax M2.5 and Claude Opus support a "deep thinking" mode, which consumes more tokens but typically produces more accurate answers:

/think high
Analyze the time complexity of this code and suggest optimizations:
[paste code]

Thinking levels: off / minimal / low / medium / high / xhigh

Use off for everyday Q&A; use high for complex tasks.


Model Selection Guide

Task type | Recommended model | Thinking level
Everyday Q&A | MiniMax M2.1 | off
Code generation | MiniMax M2.1 Lightning | low
Code review | MiniMax M2.5 | medium
System design | MiniMax M2.5 / Claude Opus | high
Math reasoning | MiniMax M2.5 | xhigh
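The guide above can be sketched as a small lookup helper — for example, in a wrapper script or a Skill that picks a model before dispatching a task. This is purely illustrative (not part of OpenClaw), and the Lightning model ID below is an assumption inferred from the table:

```python
# Hypothetical helper mirroring the model selection guide above.
# Task categories and the "MiniMax-M2.1-Lightning" ID are assumptions,
# not official OpenClaw names.

GUIDE = {
    "qa":            ("minimax/MiniMax-M2.1", "off"),
    "codegen":       ("minimax/MiniMax-M2.1-Lightning", "low"),
    "code-review":   ("minimax/MiniMax-M2.5", "medium"),
    "system-design": ("minimax/MiniMax-M2.5", "high"),
    "math":          ("minimax/MiniMax-M2.5", "xhigh"),
}

def pick(task: str) -> tuple[str, str]:
    """Return (model_id, thinking_level) for a task category."""
    # Unknown tasks fall back to the cheap everyday model.
    return GUIDE.get(task, GUIDE["qa"])

print(pick("math"))          # ('minimax/MiniMax-M2.5', 'xhigh')
print(pick("unknown-task"))  # ('minimax/MiniMax-M2.1', 'off')
```

You could wire such a helper to `pnpm openclaw models set` to automate the switching shown in Scenario 2.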

FAQ

What triggers a failover?

Failover triggers automatically when the primary model returns an HTTP 5xx error, hits rate limiting (429), times out, or is otherwise unavailable. OpenClaw then tries each entry in the fallbacks array in order, completely transparently to the user.
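The policy reads roughly like the sketch below — try the primary, and on a retryable failure walk the fallbacks list in order. This is an illustration of the described behavior, not OpenClaw's actual source; `call` stands in for a real provider client:

```python
# Illustrative failover loop: retry on 5xx, 429, and timeouts;
# give up immediately on non-retryable errors (e.g. a 400 bad request).

class ModelError(Exception):
    def __init__(self, status=None, timeout=False):
        super().__init__(f"status={status} timeout={timeout}")
        self.status, self.timeout = status, timeout

def is_retryable(err: ModelError) -> bool:
    return err.timeout or err.status == 429 or (err.status or 0) >= 500

def complete(prompt, models, call):
    """Try each model in order; `call(model, prompt)` is a placeholder client."""
    last = None
    for model in models:
        try:
            return model, call(model, prompt)
        except ModelError as err:
            last = err
            if not is_retryable(err):
                raise  # the request itself is bad; switching models won't help
    raise last  # every model in the chain failed

# Fake client simulating a rate-limited primary:
def fake_call(model, prompt):
    if model == "minimax/MiniMax-M2.1":
        raise ModelError(status=429)
    return f"{model}: ok"

used, reply = complete("hi", ["minimax/MiniMax-M2.1", "anthropic/claude-sonnet-4-6"], fake_call)
print(used)  # anthropic/claude-sonnet-4-6
```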

How do I know which model is actually being used?

Send /status in a conversation — the response will show the full ID of the currently active model (e.g., minimax/MiniMax-M2.1). If a failover occurred, it will be logged: tail -f /tmp/openclaw/openclaw-$(date +%Y-%m-%d).log | grep fallback.

How is token cost calculated for different models?

Cost is calculated according to each provider's official pricing, configured in the models[].cost field in openclaw.json (unit: cents per million tokens). Run /usage to see token consumption and estimated cost for the current session.
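Since rates are stored in cents per million tokens, the arithmetic is straightforward. A worked example using the MiniMax-M2.1 rates from the Scenario 1 config (input 15, output 60):

```python
# Pricing arithmetic: cost = tokens / 1,000,000 * rate (cents per 1M tokens).
# Rates taken from the MiniMax-M2.1 "cost" field in the Scenario 1 config.

RATES = {"input": 15, "output": 60}  # cents per 1M tokens

def session_cost_cents(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens / 1_000_000 * RATES["input"]
            + output_tokens / 1_000_000 * RATES["output"])

# A session with 120k input and 30k output tokens:
print(f"{session_cost_cents(120_000, 30_000):.2f} cents")  # 3.60 cents
```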

Can I manually force-switch models mid-conversation?

Yes. Use pnpm openclaw models set <model-ID> to switch instantly without restarting the gateway. Or configure a "model switch" Skill (SKILL.md) so the AI understands natural-language commands like "switch to M2.5."

Do I need to configure API Keys for all providers?

Only configure the providers you actually plan to use. If you're using MiniMax as primary with Claude as fallback, you only need those two keys. Unconfigured providers won't be called and won't cause errors.


Next Steps

  • Lesson 01 — Review basic configuration
  • Lesson 03 — Write a "model selection" Skill to let AI decide which model to use

Stay up to date with OpenClaw

Follow @lanmiaoai on X for tips, updates and new tutorials.