Reorganize docs/user-manual/ from flat structure to language subdirectories (zh/, en/, ja/) with shared assets/. Move existing Chinese docs into zh/, fix image paths, add multilingual navigation README, and translate all 23 markdown files (~4500 lines each) to English and Japanese.
6.4 KiB
4.3 Failover
Overview
The failover feature automatically switches to a backup provider when the primary provider's request fails, ensuring uninterrupted service.
Applicable scenarios:
- Unstable provider services
- High availability requirements
- Long-running tasks
Prerequisites
Using the failover feature requires:
- Proxy service started
- App takeover enabled
- Failover queue configured
- Auto failover enabled
Configure the Failover Queue
Open Configuration Page
Settings > Advanced > Failover
Select Application
Three tabs at the top of the page:
- Claude
- Codex
- Gemini
Select the application to configure.
Add Backup Providers
- In the "Failover Queue" area
- Click "Add Provider"
- Select a provider from the dropdown list
- The provider is added to the end of the queue
Adjust Priority
Drag providers to adjust their order:
- Lower numbers mean higher priority
- After the primary provider fails, backup providers are tried in order
Remove Provider
Click the "Remove" button to the right of the provider.
Main Interface Quick Actions
When both proxy and failover are enabled, provider cards display a failover toggle.
Add to Queue
- Find the provider card
- Enable the failover toggle
- The provider is automatically added to the queue
Remove from Queue
- Disable the failover toggle on the provider card
- The provider is removed from the queue
Enable Auto Failover
Steps
- On the failover configuration page
- Enable the "Auto Failover" toggle
Toggle Description
| State | Behavior |
|---|---|
| Off | Only records failures, no automatic switching |
| On | Automatically switches to the next provider on failure |
Failover Flow
graph TD
Start[Request arrives at proxy] --> Send[Send to current provider]
Send --> CheckSuccess{Success?}
CheckSuccess -- Yes --> Return[Return response]
CheckSuccess -- No --> LogFail[Record failure]
LogFail --> CheckCircuit{Check circuit breaker}
CheckCircuit -- Tripped --> Skip[Skip this provider]
CheckCircuit -- Not tripped --> IncFail[Increment failure count]
Skip --> Next{Next in queue?}
IncFail --> Next
Next -- Yes --> Switch[Switch provider]
Switch --> Retry[Retry request]
Retry --> Send
Next -- No --> Error[Return error]
Circuit Breaker Configuration
The circuit breaker prevents frequent retries against failing providers.
Configuration Items
Different apps have independent default configurations. Below are general defaults; Claude has its own relaxed configuration.
| Setting | Description | General Default | Claude Default | Range |
|---|---|---|---|---|
| Failure Threshold | Consecutive failures to trigger circuit breaker | 4 | 8 | 1-20 |
| Recovery Success Threshold | Successes needed in half-open state to close breaker | 2 | 3 | 1-10 |
| Recovery Wait Time | Time before attempting recovery after tripping (seconds) | 60 | 90 | 0-300 |
| Error Rate Threshold | Error rate that opens the circuit breaker | 60% | 70% | 0-100% |
| Minimum Requests | Minimum requests before calculating error rate | 10 | 15 | 5-100 |
Claude has more relaxed default settings due to longer request times, tolerating more failures.
Timeout Configuration
| Setting | Description | General Default | Claude Default | Range |
|---|---|---|---|---|
| Stream First Byte Timeout | Max wait time for first data chunk (seconds) | 60 | 90 | 1-120 |
| Stream Idle Timeout | Max interval between data chunks (seconds) | 120 | 180 | 60-600 (0 to disable) |
| Non-stream Timeout | Total timeout for non-streaming requests (seconds) | 600 | 600 | 60-1200 |
Retry Configuration
| Setting | Description | General Default | Claude Default | Range |
|---|---|---|---|---|
| Max Retries | Number of retries on request failure | 3 | 6 | 0-10 |
Gemini's default max retries is 5.
Circuit Breaker States
| State | Description |
|---|---|
| Closed | Normal state, requests allowed |
| Open | Circuit broken, this provider is skipped |
| Half-Open | Attempting recovery, sending probe requests |
State Transitions
stateDiagram-v2
[*] --> Closed: Initialize
Closed --> Open: Failures >= threshold
Open --> HalfOpen: Recovery wait time expires
HalfOpen --> Closed: Probe successes >= recovery threshold
HalfOpen --> Open: Probe failed
Health Status Indicators
Provider Cards
Cards display health status badges:
| Badge | Status | Description |
|---|---|---|
| Green | Healthy | 0 consecutive failures |
| Yellow | Warning | Has failures but circuit not tripped |
| Red | Circuit Broken | Circuit breaker tripped, temporarily skipped |
Queue List
The failover queue also displays each provider's health status.
Failover Logs
Each failover event records:
| Information | Description |
|---|---|
| Time | When it occurred |
| Original Provider | The provider that failed |
| New Provider | The provider switched to |
| Failure Reason | Error message |
Viewable in the request logs within usage statistics.
Best Practices
Queue Configuration Recommendations
- Primary provider: The most stable and fastest provider
- First backup: Second-best choice
- Second backup: Last resort
Circuit Breaker Configuration Recommendations
| Scenario | Failure Threshold | Recovery Wait |
|---|---|---|
| High availability requirement | 2 | 30 seconds |
| General scenario | 3 | 60 seconds |
| Tolerant of occasional failures | 5 | 120 seconds |
Monitoring Recommendations
Periodically check:
- Health status of each provider
- Failover frequency
- Circuit breaker trigger frequency
FAQ
Failover Not Triggering
Check:
- Is the proxy service running
- Is app takeover enabled
- Is auto failover enabled
- Are there backup providers in the queue
Failover Triggering Too Frequently
Possible causes:
- Unstable primary provider
- Network issues
- Configuration errors
Solutions:
- Check primary provider status
- Adjust circuit breaker parameters
- Consider changing the primary provider
All Providers Circuit-Broken
Wait for the recovery wait time to expire for automatic recovery, or:
- Manually restart the proxy service
- Reset circuit breaker states