cc-switch

mirror of https://github.com/farion1231/cc-switch.git synced 2026-05-23 14:12:14 +08:00

Author	SHA1	Message	Date
lif	444c123ad0	[codex] Stabilize Codex OAuth cache routing (#2218) * Stabilize Codex OAuth cache routing Codex OAuth-backed Claude proxy requests now reuse a client-provided session identity for prompt cache routing and send Codex-like session headers when that identity exists. Generated proxy UUIDs are intentionally excluded so they do not fragment cache locality. The same path exposed two runtime issues during validation: rustls needed an explicit process crypto provider, and Codex OAuth can return Responses SSE even when the original Claude request is non-streaming. Those are handled so cache-routed requests can complete instead of panicking or being parsed as JSON. Constraint: Official Codex uses conversation identity and Responses session headers for prompt cache routing. Rejected: Always use generated proxy session IDs | generated IDs change per request and reduce cache reuse. Confidence: medium Scope-risk: moderate Directive: Do not remove the client-provided-session guard unless generated session IDs become stable per conversation. Tested: cargo test codex_oauth Tested: Local dev app health check on 127.0.0.1:15721 Tested: Local proxy logs showed cache_read_tokens after restart Not-tested: Full cargo test without local cc-switch port conflict Related: #2217 * feat(proxy): aggregate forced Codex OAuth SSE into JSON for non-streaming clients Narrow override on top of #2235's streaming fallback. Codex OAuth always forces upstream openai_responses into SSE, even when the original Claude request is stream:false. #2235 handles this by routing such responses through the streaming transform so the client receives text/event-stream; that avoids the 422 that JSON parsing would produce, and it also protects any other provider that unexpectedly returns SSE (the response.is_sse() guard). But for Claude SDK callers that sent stream:false, returning SSE still violates the Anthropic non-streaming contract.
This commit adds an override on exactly one combination — non-streaming client + codex_oauth + openai_responses — to aggregate the upstream Responses SSE into a synthetic Responses JSON and then run the regular responses_to_anthropic non-streaming transform. All other paths, including the generic response.is_sse() fallback, remain on the streaming path from #2235. The aggregator reuses proxy::sse::take_sse_block / strip_sse_field, which support both \n\n and \r\n\r\n delimiters; a hand-rolled split("\n\n") would silently fail on real HTTPS upstreams. Tests cover the happy path, CRLF delimiters, response.failed errors, and the missing response.completed defensive branch. --------- Co-authored-by: Jason <farion1231@gmail.com>	2026-04-23 11:15:01 +08:00
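The delimiter handling and aggregation described in this commit can be illustrated with a small, dependency-free Rust sketch. `take_sse_block` and `strip_sse_field` reuse the helper names from the commit text, but these bodies are hypothetical re-implementations rather than the cc-switch code, and `aggregate_responses_sse` is an illustrative name. To stay within the standard library, the sketch returns the raw `data:` payload of the `response.completed` event instead of building a synthetic Responses JSON object, and it assumes the body's final event is terminated by a blank line.

```rust
/// Remove and return the next complete SSE block from `buffer`, or `None` if
/// no terminated block has arrived yet. Both LF (`\n\n`) and CRLF (`\r\n\r\n`)
/// terminators are recognised; whichever appears first in the buffer wins.
fn take_sse_block(buffer: &mut String) -> Option<String> {
    let lf = buffer.find("\n\n").map(|pos| (pos, 2));
    let crlf = buffer.find("\r\n\r\n").map(|pos| (pos, 4));
    let (pos, delim_len) = match (lf, crlf) {
        (Some(a), Some(b)) => {
            if a.0 <= b.0 { a } else { b }
        }
        (Some(a), None) => a,
        (None, Some(b)) => b,
        (None, None) => return None,
    };
    let block = buffer[..pos].to_string();
    // Drop the block together with its terminator.
    buffer.drain(..pos + delim_len);
    Some(block)
}

/// Return the value of `field: value` (one optional leading space removed),
/// tolerating a stray trailing '\r' on the line.
fn strip_sse_field<'a>(line: &'a str, field: &str) -> Option<&'a str> {
    let line = line.trim_end_matches('\r');
    let rest = line.strip_prefix(field)?.strip_prefix(':')?;
    Some(rest.strip_prefix(' ').unwrap_or(rest))
}

/// Scan a complete Responses SSE body and return the `data:` payload of the
/// `response.completed` event. A `response.failed` event, or a body that ends
/// without `response.completed`, is reported as an error.
fn aggregate_responses_sse(raw: &str) -> Result<String, String> {
    let mut buffer = raw.to_string();
    while let Some(block) = take_sse_block(&mut buffer) {
        let mut event: Option<&str> = None;
        let mut data = String::new();
        for line in block.lines() {
            if let Some(value) = strip_sse_field(line, "event") {
                event = Some(value);
            } else if let Some(value) = strip_sse_field(line, "data") {
                // Multiple data lines within one event are joined with '\n'.
                if !data.is_empty() {
                    data.push('\n');
                }
                data.push_str(value);
            }
        }
        match event {
            Some("response.completed") => return Ok(data),
            Some("response.failed") => return Err(format!("upstream failed: {data}")),
            _ => {}
        }
    }
    Err("stream ended without response.completed".to_string())
}
```

Taking whichever terminator occurs first lets one buffer serve both LF and CRLF upstreams; a plain `split("\n\n")` finds no boundary at all in a CRLF-delimited body, so the whole stream would be treated as a single event.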
Dex Miller	03a0f9661b	feat(proxy): Gemini Native API proxy integration (#1918) * refactor(proxy): extract take_sse_block helper with CRLF delimiter support Replace inline `buffer.find("\n\n")` SSE splitting logic across streaming, streaming_responses, response_handler, and response_processor with a shared `take_sse_block` function that handles both `\n\n` and `\r\n\r\n` delimiters. * feat(proxy): add Gemini Native URL builder and full-URL resolver Introduce gemini_url module that normalizes legacy Gemini/OpenAI-compatible base URLs into canonical models/:generateContent endpoints. Supports both structured Gemini URLs (auto-normalized) and opaque relay URLs (pass-through with query params only). feat(proxy): add Gemini Native schema, shadow store, transform, and streaming - gemini_schema: Gemini generateContent request/response type definitions - gemini_shadow: session-scoped shadow store for thinking signature and tool-call state replay across streaming chunks - transform_gemini: bidirectional Anthropic Messages ↔ Gemini Native request/response conversion with thinking block and tool-use support - streaming_gemini: Gemini SSE → Anthropic SSE streaming adapter with incremental thinking/text/tool_use delta emission * feat(proxy): wire Gemini Native format into proxy core and Claude adapter Integrate gemini_native api_format throughout the proxy pipeline: - ClaudeAdapter: detect Gemini provider type, Google/GoogleOAuth auth strategies, and suppress Anthropic-specific headers for Gemini targets - Forwarder: Gemini URL resolution, shadow store threading, endpoint rewriting to models/:generateContent with stream/non-stream variants - Handlers: route Gemini streaming through streaming_gemini adapter and non-streaming through transform_gemini converter - Server/State: add GeminiShadowStore to shared ProxyState - StreamCheck: support gemini_native health check with proper auth headers feat(ui): add Gemini Native provider preset and api format option - Add gemini_native to
ClaudeApiFormat type and ProviderMeta.apiFormat - Add "Gemini Native" provider preset with default Google AI endpoints - Show Gemini-specific endpoint hints and full-URL mode guidance - Add gemini_native option to API format selector in ClaudeFormFields - Add i18n strings for zh/en/ja * feat(proxy): add Gemini Native tool argument rectification * feat(proxy): update Gemini streaming and transformation logic * fix(proxy): align shadow turns to tail on client history truncation * fix: revert unrelated cache_key change in claude proxy transform Restore .unwrap_or(&provider.id) fallback for cache_key to match main branch behavior. Only gemini_native related changes should be in this branch. * Prevent Gemini review regressions in streaming and tool rectification PR #1918 review feedback exposed two correctness issues in the Gemini Native adapter path. Gemini SSE buffering was still using lossy UTF-8 decoding, which could corrupt split multibyte payloads and drop streamed output. Tool arg rectification also removed top-level parameters eagerly, which broke tools that legitimately define a parameters field. This change moves Gemini SSE buffering onto the existing append_utf8_safe path and makes parameters flattening conditional on the schema actually expecting nested extraction. The old Skill rectification path stays intact, and new regression tests cover both the preserved parameters case and UTF-8-split JSON payloads. 
Constraint: Existing PR #1918 review feedback must be fixed without staging unrelated local docs and artifact files Rejected: Keep String::from_utf8_lossy in Gemini SSE buffering | corrupts split multibyte payloads and can drop JSON chunks Rejected: Always preserve the parameters wrapper | regresses the existing nested-parameters rectification path for Skill-style tools Confidence: high Scope-risk: narrow Reversibility: clean Directive: Keep Gemini SSE buffering on the UTF-8-safe accumulator path and only unwrap parameters when the target schema does not declare it as a legitimate field Tested: cargo fmt --manifest-path src-tauri/Cargo.toml --all; cargo test --manifest-path src-tauri/Cargo.toml preserves_utf8_boundaries_when_json_payload_spans_chunks; cargo test --manifest-path src-tauri/Cargo.toml gemini_to_anthropic_rectifies_tool_args_from_schema_hints; cargo test --manifest-path src-tauri/Cargo.toml rectifies_streamed_skill_args_from_nested_parameters; cargo test --manifest-path src-tauri/Cargo.toml gemini_to_anthropic_preserves_legitimate_parameters_arg Not-tested: Full src-tauri test suite; live end-to-end Gemini relay traffic against upstream services * Keep Gemini tool replay stable across Claude request boundaries Claude Code follow-up requests were still falling back to locally reconstructed functionCall parts, which dropped Gemini thought signatures and triggered INVALID_ARGUMENT errors from the official Gemini API. The replay path needed to survive real Claude request boundaries, not just idealized in-process test flows. This change makes Claude requests reuse X-Claude-Code-Session-Id as the shadow session key, records streamed Gemini tool turns before tool_use events are fully drained, and matches assistant tool_use turns to shadow state by tool_use id and normalized tool name before positional fallback. Together these fixes keep thoughtSignature-bearing Gemini tool calls available for the next request in the loop.
Constraint: Claude Code sends a stable X-Claude-Code-Session-Id header while metadata.session_id may be absent on follow-up requests Rejected: Rely on metadata-only Claude session extraction | generated fresh session ids and broke cross-request shadow replay Rejected: Record Gemini shadow only after streaming completes | loses the race when the client sends the next request immediately after tool_use Confidence: high Scope-risk: narrow Reversibility: clean Directive: Preserve Gemini shadow continuity across requests by keying Claude sessions from the header first and persisting tool-call shadow before yielding tool_use events downstream Tested: cargo fmt --manifest-path src-tauri/Cargo.toml --all; cargo test --manifest-path src-tauri/Cargo.toml test_extract_session_from_claude_header; cargo test --manifest-path src-tauri/Cargo.toml test_extract_session_from_claude_header_precedes_metadata; cargo test --manifest-path src-tauri/Cargo.toml stores_tool_shadow_before_tool_use_events_are_fully_drained; cargo test --manifest-path src-tauri/Cargo.toml shadow_replay_matches_tool_use_turn_by_id_when_position_drifts; cargo test --manifest-path src-tauri/Cargo.toml shadow_replay_aligns_to_latest_turns_after_client_truncation Not-tested: Full src-tauri test suite without test filters; live end-to-end Gemini relay after this exact commit hash * style: apply cargo fmt to pass Backend Checks CI Wrap prompt_cache_key chained call across lines per rustfmt default formatting. Pure formatting change, no behavior difference. * fix(proxy/gemini): synthesize unique ids for no-id tool calls + enforce object params schema P1: Parallel tool calls without Gemini-assigned ids no longer collapse. Gemini 2.x native parallel `functionCall` entries may omit the `id` field. The previous `merge_tool_call_snapshots` fell back to matching by `name`, which silently merged two parallel calls to the same function into one entry, dropping the first call's args.
The non-streaming path and shadow store further bottlenecked on empty-string ids: multiple `tool_use` blocks shared the same id, and `tool_name_by_id.get("")` could only return one mapping, causing later `tool_result` round-trips to fail with `Unable to resolve Gemini functionResponse.name` or bind to the wrong tool. Fix: introduce `synthesize_tool_call_id()` producing `gemini_synth_<uuid>`. Both streaming and non-streaming response paths now guarantee every Anthropic-visible tool_use carries a unique id. `merge_tool_call_snapshots` matches by id first, falling back to the `parts` array position (for the cumulative-streaming case) while preserving the synthesized id across chunks. `convert_message_content_to_parts` detects the synthetic prefix and strips the id from outbound `functionCall`/`functionResponse` so the internal identifier never leaks upstream. `shadow_parts` performs the same strip when replaying a recorded assistant turn. P2 — Vertex AI rejects empty `parameters` schemas. When an Anthropic tool arrives with missing or empty `input_schema`, the proxy used to emit `"parameters": {}` (no `type`), which fails Vertex AI validation with `functionDeclaration parameters schema should be of type OBJECT`. Contrary to the automated-review suggestion, the fix is not to omit `parameters` (that too is rejected) but to normalize to the canonical empty-object form `{type: "object", properties: {}}`. Refs: google-gemini/generative-ai-python#423, BerriAI/litellm#5055. Fix: new `ensure_object_schema` helper in `gemini_schema` promotes missing `type` to `"object"` and adds empty `properties` when absent, while leaving atomic (non-object) schemas untouched. Tests: seven new regressions covering parallel no-id calls, cumulative chunk id reuse, synthetic-id round-trip both directions, shadow replay id stripping, and the three Vertex-AI schema shapes. 
The two existing wrapper functions (`gemini_to_anthropic` and `gemini_to_anthropic_with_shadow`) gain `#[allow(dead_code)]` to clear a pre-existing clippy -D warnings failure — they are part of the public transform API surface and intentionally kept for future callers. Addresses Codex review P1/P2 on #1918. * fix(proxy/gemini): narrow URL normalization + guard empty OAuth access_token P2a — Preserve opaque relay URLs that contain `/v1/models/` prefixes. `should_normalize_gemini_full_url` previously flagged any full URL whose path merely contained `/v1beta/models/` or `/v1/models/` as a structured Gemini endpoint, forcing rewrite to `.../v1beta/models/{model}:method`. This silently dropped legitimate relay route segments (e.g. `https://relay.example/v1/models/invoke` → `.../v1beta/models/...:generateContent`, losing `/invoke`) and sent traffic to the wrong upstream path. Replace the bare `contains(...)` checks with `matches_structured_gemini_models_path`, which requires the `/models/` segment to be followed by a canonical Gemini method call (`:generateContent` or `:streamGenerateContent`). The `matches_bare_gemini_models_path` helper is generalized (and renamed) to handle both `/v1beta/models/` and `/v1/models/` alongside the original bare `/models/` shape. P2b — Reject empty Gemini OAuth access_tokens before they reach the bearer header. `GeminiAdapter::parse_oauth_credentials` accepts refresh-token-only JSON (and surfaces `{"access_token": "", ...}` for expired credentials) with `access_token` defaulting to `""`. The Claude adapter's GeminiCli branch then called `AuthInfo::with_access_token(key, creds.access_token)` unconditionally, so the bearer-header builder at `AuthStrategy::GoogleOAuth` resolved to `Authorization: Bearer ` — a deterministic 401 from upstream. CC Switch does not currently exchange the refresh_token for a fresh access_token (`OAuthCredentials::needs_refresh` / `can_refresh` are annotated `#[allow(dead_code)]`). 
Until that exists, only attach `access_token` when it is non-empty; fall back to the plain GoogleOAuth strategy with the raw key and log a warning pointing users at `~/.gemini/oauth_creds.json` so the failure mode is observable. Tests: - gemini_url.rs: three new regressions (opaque `/v1/models/invoke`, opaque `/v1beta/models/route`, and the positive counter-case where a structured `/v1/models/...:generateContent` path still normalizes). - claude.rs: three new `test_extract_auth_gemini_cli_` tests covering refresh-only JSON, empty-string access_token JSON, and the valid-JSON pass-through. All 839 lib tests pass; cargo fmt + clippy -D warnings clean. Addresses Codex review P2 findings on #1918. fix(proxy/gemini): treat empty-string functionCall id as missing in streaming path Follow-up to the earlier P1 fix: some Gemini relays serialize an absent functionCall id as `"id": ""` instead of omitting the field. The non-streaming `extract_tool_call_meta` already filters these via `.filter(|s| !s.is_empty())`, but the streaming counterpart `extract_tool_calls` passed the empty string straight through `function_call.get("id").and_then(|v| v.as_str())` into `GeminiToolCallMeta::new`, producing a `Some("")` id. Downstream, `merge_tool_call_snapshots` would then match two parallel no-id calls against each other on their shared empty-string id, collapsing them into a single snapshot (silent data loss for the first call) and emitting an Anthropic `tool_use.id: ""` that breaks tool_result correlation on the Claude Code client. Fix: - `extract_tool_calls`: apply the same `filter(|s| !s.is_empty())` guard used in the non-streaming path so empty strings become `None` before reaching the shadow meta. - `merge_tool_call_snapshots`: defensively collapse any incoming `Some("")` to `None` up front, which keeps the "missing vs present" invariant local to the merge step for future callers that might build `GeminiToolCallMeta` by hand.
Tests (2 new, both in streaming_gemini): - `parallel_empty_string_id_calls_are_treated_as_missing_and_preserved` covers two parallel calls with explicit `"id": ""` — asserts both surface, no empty tool_use id leaks, and each gets a unique `gemini_synth_` id. - `single_empty_string_id_tool_call_gets_synthesized_id` covers the non-parallel degraded-relay case. All 841 lib tests pass; cargo fmt + clippy -D warnings clean. Addresses Codex follow-up P1 on #1918. * fix(proxy/gemini): gate generic REST path suffixes behind Google host whitelist `should_normalize_gemini_full_url` previously treated any full URL whose path ends with `/v1`, `/v1/models`, `/models`, `/v1/openai`, or `/openai` as a structured Gemini endpoint and rewrote it to `/v1beta/models/{model}:generateContent`. These are ubiquitous REST conventions — opaque relays such as `https://relay.example/custom/v1` legitimately use them for fixed endpoints — so the rewrite silently routed traffic to the wrong upstream path. Split the predicate into two layers: - Unconditional: `matches_structured_gemini_models_path` (i.e. a `/models/...:generateContent` method call anywhere in the path), the Google-specific `/v1beta` family, and the deep OpenAI-compat paths (`/v1beta/openai/chat/completions`, `/openai/chat/completions`, and their `responses` siblings). These remain host-agnostic because the path grammar itself is Gemini-specific. - Google-host gated: `/v1`, `/v1/models`, `/models`, `/v1/openai`, `/openai`. Only normalized when the host is one of `generativelanguage.googleapis.com`, `aiplatform.googleapis.com`, or a real `-aiplatform.googleapis.com` Vertex regional endpoint. The match is exact/suffix (not `contains`), so lookalike hosts like `aiplatform.example.com` are correctly treated as opaque relays. Tests (8 new in `gemini_url::tests`): - Four opaque-relay cases: `/custom/v1`, `/custom/models`, `/custom/v1/models`, `/custom/openai` — all preserved as-is. 
- Three Google-host counter-cases: `/v1`, `/models`, and `us-central1-aiplatform.googleapis.com/v1` still normalize. - One lookalike safety case: `aiplatform.example.com/v1` is NOT treated as Google. All 849 lib tests pass; cargo fmt + clippy -D warnings clean. Addresses Codex review P2 on #1918. * fix(proxy/gemini): align shadow id with client-visible id in non-streaming path When Gemini returns a `functionCall` without an id (common in 2.x parallel calls), `gemini_to_anthropic_with_shadow_and_hints` previously generated TWO independent synthesized UUIDs: 1. Line 186-197 — synthesized id `A` used for the Anthropic-visible `content[tool_use].id` returned to the client. 2. Line 850-881 — `extract_tool_call_meta` independently synthesized id `B ≠ A`, which populated `shadow_turn.tool_calls[i].id`. `shadow_content` (line 225-228, cloned from `rectified_parts`) retained the original missing/empty id. Result: the client sees id `A`, the shadow store holds id `B`. On the next turn, `convert_messages_to_contents` builds `tool_name_by_id` from `build_tool_name_map_from_shadow_turns`, which uses `tool_calls[i].id` — so the map contains `B → name` but not `A → name`. When the client sends back `tool_result(tool_use_id=A)`, resolution fails with: Unable to resolve Gemini functionResponse.name for tool_use_id `A` This affects both truncated histories (client sends only the tool_result) and full histories (shadow-replay branch at line 342-354 skips `convert_message_content_to_parts`, so the assistant tool_use block never registers id `A` itself). Fix: make `rectified_parts` the single source of truth. After `rectify_tool_call_parts`, run a pre-pass that writes `synthesize_tool_call_id()` back into any `functionCall` that lacks a non-empty id. All three readers — the content builder (186-197), the shadow_content clone (225-228), and `extract_tool_call_meta` — then observe the same id. 
`shadow_parts()` already strips synthesized ids on replay (lines 616-628), so the internal identifier never leaks to Gemini upstream. This mirrors the streaming path, which already has single-source-of-truth semantics via `tool_call_snapshots` in `streaming_gemini.rs`; no change is needed there. Tests (5 new in `transform_gemini::tests`): - `non_stream_shadow_id_matches_client_visible_id`: asserts `response.content[0].id == shadow.tool_calls[0].id == shadow.assistant_content.parts[0].functionCall.id`. - `non_stream_missing_id_scenario_a_truncated_history_resolves`: turn 2 sends only `[tool_result(id=A)]`; resolution must succeed. - `non_stream_missing_id_scenario_b_full_history_replay_resolves`: turn 2 sends `[assistant(tool_use=A), tool_result(A)]`; the shadow-replay branch strips the synth id from the outgoing `functionCall` while still resolving the subsequent `tool_result`. - `non_stream_preserves_original_gemini_id_when_present`: regression check that genuine Gemini ids flow through unchanged. - `non_stream_synthesized_id_not_leaked_to_gemini_via_shadow_replay`: defensive check that the shadow-replay path strips synth ids from both `functionCall.id` and `functionResponse.id`. All 854 lib tests pass; cargo fmt + clippy -D warnings clean. Addresses Codex follow-up P1 on #1918. * refactor(proxy/gemini): share build_anthropic_usage between stream and non-stream paths `streaming_gemini::anthropic_usage_from_gemini` and `transform_gemini::build_anthropic_usage` were byte-for-byte identical (32 lines each), both converting Gemini `usageMetadata` into the Anthropic `usage` shape, including the `cache_read_input_tokens` mapping. Promote the non-streaming version to `pub(crate)` and reuse it from the streaming SSE converter. This removes ~30 lines of duplication and guarantees the two paths cannot drift apart. No behavioral change; all 854 lib tests pass; cargo fmt + clippy -D warnings clean.
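The single-source-of-truth fix can be sketched in a few lines of Rust: synthesize an id once, write it back into the part, and strip it again on the way upstream. `FunctionCall`, `assign_missing_ids`, and `outbound_id` are hypothetical names; the `gemini_synth_` prefix comes from the commit text, but where the commit describes a UUID suffix, this sketch uses a process-wide counter to avoid external crates.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Prefix taken from the commit text; the counter suffix stands in for a UUID.
const SYNTH_PREFIX: &str = "gemini_synth_";
static NEXT_SYNTH: AtomicU64 = AtomicU64::new(1);

fn synthesize_tool_call_id() -> String {
    format!("{}{}", SYNTH_PREFIX, NEXT_SYNTH.fetch_add(1, Ordering::Relaxed))
}

/// Simplified stand-in for a Gemini `functionCall` part (hypothetical type).
#[derive(Debug, Clone)]
struct FunctionCall {
    id: Option<String>,
    name: String,
}

/// Pre-pass: give every call that lacks a non-empty id a synthesized one and
/// write it back in place, so every later reader observes the same value.
fn assign_missing_ids(calls: &mut [FunctionCall]) {
    for call in calls.iter_mut() {
        let missing = call.id.as_deref().map_or(true, str::is_empty);
        if missing {
            call.id = Some(synthesize_tool_call_id());
        }
    }
}

/// Id to send upstream: synthesized or empty ids are dropped so the internal
/// identifier never reaches Gemini.
fn outbound_id(id: Option<&str>) -> Option<&str> {
    id.filter(|s| !s.is_empty() && !s.starts_with(SYNTH_PREFIX))
}
```

Because the pre-pass mutates the parts in place, every later reader (response content, shadow copy, tool-call metadata) sees the same id without any of them synthesizing its own.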
* fix(proxy/gemini): gate /v1beta behind Google host + normalize models/ model id prefix Two related P2 corrections to the Gemini Native URL surface, both folding into the existing Google-host-whitelist architecture. ## P2a — `/v1beta` suffix should not unconditionally trigger rewrite `should_normalize_gemini_full_url` placed `/v1beta` and `/v1beta/models` in the unconditional layer on the reasoning that `/v1beta` is Google-specific. In practice an opaque relay fronting a non-Gemini service at `https://relay.example/custom/v1beta` would still be silently rewritten to `/v1beta/models/{model}:generateContent`, breaking the deployment. Move `/v1beta`, `/v1beta/models`, and `/v1beta/openai` into the Google-host gated layer alongside `/v1`, `/models`, and friends. The unconditional layer now only accepts paths whose grammar is intrinsically Gemini — `/models/...:generateContent` method calls and the deep OpenAI-compat endpoints like `/openai/chat/completions` and `/openai/responses`. Pasted AI-Studio URLs such as `https://generativelanguage.googleapis.com/v1beta` still normalize because the host matches the whitelist. ## P2b — `model: "models/gemini-2.5-pro"` produced doubled path prefix Gemini SDKs (and the official `list_models` response) commonly surface model ids in resource-name form `models/gemini-2.5-pro`. Raw interpolation into `format!("/v1beta/models/{model}:...")` produced `/v1beta/models/models/gemini-2.5-pro:streamGenerateContent` which upstream rejects — yielding false-negative health checks for otherwise valid provider configs. Introduce `normalize_gemini_model_id(&str) -> &str` in `gemini_url` as the single source of truth: strips an optional leading `/` then an optional `models/` prefix, leaving bare ids untouched. 
Apply it at the three call sites that build a Gemini method URL: - `services/stream_check.rs::resolve_claude_stream_url` (unified path) - `services/stream_check.rs::check_gemini_stream` (Gemini-only path) - `proxy/forwarder.rs::rewrite_claude_transform_endpoint` (production) Tests (9 new): - `gemini_url`: 3 regressions for opaque vs Google-host `/v1beta` handling + 5 unit tests pinning `normalize_gemini_model_id` behavior (strip prefix, leave bare id, preserve nested slashes past the one stripped prefix, tolerate leading slash, pass through empty input). - `stream_check`: one end-to-end regression confirming `models/gemini-2.5-pro` collapses to the expected single-prefix URL. - `forwarder`: one end-to-end regression on the production rewrite path. All 864 lib tests pass; cargo fmt + clippy -D warnings clean. Addresses Codex P2 feedback on #1918. fix(proxy/gemini): trim API key before provider-type detection and OAuth parsing Leading whitespace on a copied oauth_creds.json (e.g. a stray newline picked up when the user copies the file content as-is) would slip past the `starts_with("ya29.") || starts_with('{')` prefix check in `ClaudeAdapter::provider_type`, causing the provider to be misclassified as raw-API-key Gemini and fall back to `x-goog-api-key` with the raw JSON as the key, which upstream rejects with 401. The frontend's `handleApiKeyChange` already trims on keystrokes, but deep-link imports, the JSON editor, and live-config backfill all bypass that path. Trim at every backend extraction point so the coverage is uniform: - `ClaudeAdapter::extract_key` (5 env / fallback branches) gets `.map(str::trim)` before `.filter(|s| !s.is_empty())` so that whitespace-only values are also treated as missing. - `GeminiAdapter::extract_key_raw` gets the same chain (including the `.filter` it was missing before). - `GeminiAdapter::parse_oauth_credentials` gets a defensive `let key = key.trim();` at the entry as a belt-and-suspenders guard.
Adds two regression tests covering JSON and bare `ya29.` keys with leading newline/space. * fix(proxy/gemini): gate generic REST suffix stripping behind Google host in non-full-URL mode `build_gemini_native_url` unconditionally stripped `/v1`, `/v1beta`, `/models`, and `/openai` suffixes from the base path regardless of host. This worked for Google's own endpoints but silently rewrote third-party relay URLs like `https://relay.example/custom/v1` to `.../custom/v1beta/models/...`, breaking any relay that mounts its Gemini-compatible namespace under a versioned prefix. The result was also asymmetric with the previously-fixed full-URL branch: toggling the "full URL" switch changed the outbound URL for the same base_url, which is exactly the kind of invisible behavior that makes debugging proxy deployments painful. Align `normalize_gemini_base_path` with `should_normalize_gemini_full_url`'s layered model: - Unconditional: `/models/...:method` structured paths and deep OpenAI-compat endpoints (`/openai/chat/completions`, `/openai/responses` and their versioned variants) — these are unambiguous Gemini-specific grammar on any host. - Google-host gated: generic `/v1`, `/v1beta`, `/models`, `/openai` suffixes only get stripped on `generativelanguage.googleapis.com`, `aiplatform.googleapis.com`, or `-aiplatform.googleapis.com`. Other hosts preserve the prefix verbatim so relays keep their intended routing. Adds seven regression tests for the non-full-URL flow: opaque relay preservation (v1 / v1beta / models / openai suffix variants), Google host normalization (counter-case), and boundary cases (structured method path and deep OpenAI-compat endpoint stripped regardless of host). Test count: 864 -> 873. Revert "fix(proxy/gemini): gate generic REST suffix stripping behind Google host in non-full-URL mode" This reverts commit `d19ff09cb7`. 
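The model-id normalization and Google-host whitelist described earlier in this squash commit can be sketched as follows. The name `normalize_gemini_model_id`, its strip-leading-slash-then-`models/` rule, and the three host patterns come from the commit text; the function bodies, the hypothetical helpers `is_google_gemini_host` and `gemini_method_path`, and the case-insensitive host comparison are assumptions for illustration.

```rust
/// Strip an optional leading '/' and then one optional "models/" prefix, so
/// "models/gemini-2.5-pro" and "gemini-2.5-pro" yield the same bare id.
fn normalize_gemini_model_id(model: &str) -> &str {
    let model = model.strip_prefix('/').unwrap_or(model);
    model.strip_prefix("models/").unwrap_or(model)
}

/// Host whitelist guarding generic REST suffixes such as `/v1`.
/// Matching is exact or suffix-based, never `contains`, so lookalike hosts
/// such as `aiplatform.example.com` are rejected.
fn is_google_gemini_host(host: &str) -> bool {
    let host = host.to_ascii_lowercase();
    host == "generativelanguage.googleapis.com"
        || host == "aiplatform.googleapis.com"
        || host.ends_with("-aiplatform.googleapis.com")
}

/// Build the method path appended to a normalized base URL (hypothetical helper).
fn gemini_method_path(model: &str, stream: bool) -> String {
    let method = if stream { "streamGenerateContent" } else { "generateContent" };
    format!("/v1beta/models/{}:{}", normalize_gemini_model_id(model), method)
}
```

In the layered model the commits describe, a host check like this only gates the generic suffixes (`/v1`, `/v1beta`, `/models`, `/openai`) in full-URL mode; a path that already contains a `/models/...:generateContent` method call is treated as Gemini grammar on any host.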
* test(proxy/gemini): pin non-full-URL versioned relay base stripping Adds two regression tests that lock in the intentional asymmetry between full-URL and non-full-URL modes: - Full-URL mode: opaque base path (e.g. `https://relay.example/custom/v1beta`) is preserved verbatim. Already covered by `preserves_opaque_full_url_with_bare_v1beta_suffix`. - Non-full-URL mode: base path MUST strip `/v1`, `/v1beta`, etc. so the standard `/v1beta/models/{model}:method` endpoint can be appended without producing a doubled `/v1beta/v1beta/models/...` path. The non-full-URL contract is "base URL + cc-switch appends the canonical Gemini endpoint". A user who needs a relay's custom namespace (e.g. `/v1/models/...`) must use full-URL mode and paste the complete method path. This commit adds regression coverage so a future attempt to mirror full-URL's host-whitelist gating into `normalize_gemini_base_path` will fail the test suite immediately. * chore(lint): address clippy 1.95 findings in existing modules CI upgraded to Rust 1.95 and flagged ten pre-existing warnings that older toolchains did not enforce. None relate to the Gemini proxy integration PR itself, but they block CI on the feature branch, so clean them up here as a separate commit for easy review: collapsible_match: - proxy/providers/gemini_schema.rs: `"items" if value.is_object()` match guard instead of nested if. - proxy/providers/transform_responses.rs: fold `map_responses_stop_reason`'s `"completed"` / `"incomplete"` arms into match guards, relying on the existing `_ => "end_turn"` fall-through for non-matching guard conditions (semantics preserved). - services/session_usage_codex.rs: fold `"session_meta" if state.session_id.is_none()` guard, relying on the existing `_ => {}` fall-through. unnecessary_sort_by: - services/provider/endpoints.rs: `sort_by_key(|ep| Reverse(ep.added_at))`. - services/skill.rs (backup list): same Reverse idiom on `created_at`.
- services/skill.rs (skill listings x2): `sort_by_key(|s| s.name.to_lowercase())`. useless_conversion: - services/skill.rs: drop the explicit `.into_iter()` on `zip`'s argument. while_let_loop: - services/webdav_auto_sync.rs: `while let Some(wait_for) = ...` instead of `loop { let Some(...) = ... else { break }; ... }`. All changes are mechanical and preserve behavior. `cargo test --lib` remains green (868 passed). * fix(proxy/gemini): reconcile synthesized tool-call ids with later real ids + preserve thoughtSignature Three related findings on `streaming_gemini.rs` for Gemini's cumulative `streamGenerateContent` stream, all centered on `merge_tool_call_snapshots`: 1. (P1) Match upgraded tool-call IDs by position. When Gemini delivers a `functionCall` without an id on chunk 1 (cc-switch synthesizes `gemini_synth_`) and then upgrades it to a real id on chunk 2, the `Some(incoming_id)` branch only matched by id and missed the existing synthesized snapshot. A second entry would be pushed, yielding duplicate `tool_use` content blocks at stream end (one with the synthesized id, one with the real id), which could trigger duplicate tool execution and break tool_result correlation. Add a positional fallback: when no id match exists but the same-position slot holds a synthesized id, merge into it. `or(preserved_id)` already lets the real id win the merge. 2. (P2) Preserve prior thoughtSignature when merging snapshots. `tool_call_snapshots[index] = tool_call` overwrote the slot entirely, dropping any `thoughtSignature` captured on an earlier chunk if the current cumulative snapshot omitted it. Since `build_shadow_assistant_parts` writes `thoughtSignature` into the shadow turn from `tool_call.thought_signature`, a dropped signature would cause later replay requests to Gemini to be rejected with invalid-signature errors. Preserve the existing signature when the incoming chunk does not carry one. 3. (P2) Document the part-order streaming trade-off.
All `tool_use` content blocks are emitted after the final text `content_block_stop`, so interleaved [text, functionCall, text, functionCall] parts arrive at the Anthropic client as [text(concat), tool_use, tool_use]; this differs from the non-streaming transformer, which preserves part order. This is intentional given the cumulative snapshot model and the consumers we target (claude-code-like clients don't depend on strict interleaving for tool execution correctness). Add a block comment at the flush site describing the trade-off and what a strict-order fix would entail, so this isn't rediscovered as a bug later. Regression tests: - upgraded_real_id_merges_into_existing_synthesized_snapshot - thought_signature_preserved_when_later_chunk_omits_it Test count: 868 -> 870. clippy 1.95 clean. fmt clean. fix(proxy/gemini): prefer exact tool-call id over normalized-name fallback The shadow-turn matcher used a three-branch `||` chain (id / full name / normalized name). When two tools share a suffix (e.g. `server_a:search` and `server_b:search`), the normalized-name clause could short-circuit on an earlier turn whose id is actually wrong for the incoming tool_use, mis-routing replay state (functionCall id / thoughtSignature) for later tool_result resolution. Split matching into two layers: when the incoming message carries any tool_use ids, run id-based lookup first and return on the earliest hit. Only fall back to full-name / normalized-name matching when the incoming ids are absent or none of them resolve. Add two regressions: - shadow_replay_prefers_exact_id_match_over_normalized_name_collision Two shadow turns with colliding normalized names and two assistant messages whose ids cross the positional order; asserts each message replays the id-correct shadow turn (including thoughtSignature). - shadow_replay_falls_back_to_name_when_ids_absent Shadow turn with no id and incoming tool_use with an empty id; asserts the name fallback still populates the replayed part.
--------- Co-authored-by: Jason <farion1231@gmail.com>	2026-04-16 22:42:49 +08:00
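A minimal sketch of the snapshot-merge rule described in the Gemini fix above. It assumes a simplified, hypothetical `ToolCallSnapshot` type and a single-snapshot `merge_tool_call_snapshot` helper; the real struct and the `merge_tool_call_snapshots` signature in `streaming_gemini.rs` may differ. The order is: match by id, else fall back to the same-position slot only if it still holds a `gemini_synth_` id, and keep an earlier `thought_signature` when the incoming chunk omits it.

```rust
/// Hypothetical, simplified snapshot of one Gemini functionCall.
#[derive(Debug, Clone, PartialEq)]
pub struct ToolCallSnapshot {
    pub id: String,
    pub name: String,
    pub args: String,
    pub thought_signature: Option<String>,
}

/// Prefix for ids synthesized when Gemini omits one (named in the commit message).
const SYNTH_PREFIX: &str = "gemini_synth_";

/// Merge one incoming cumulative snapshot seen at `position` into `snapshots`.
pub fn merge_tool_call_snapshot(
    snapshots: &mut Vec<ToolCallSnapshot>,
    position: usize,
    incoming: ToolCallSnapshot,
) {
    // Exact id match first; otherwise fall back to the same-position slot,
    // but only if that slot still carries a synthesized id.
    let slot = snapshots
        .iter()
        .position(|s| s.id == incoming.id)
        .or_else(|| {
            snapshots
                .get(position)
                .filter(|s| s.id.starts_with(SYNTH_PREFIX))
                .map(|_| position)
        });

    match slot {
        Some(index) => {
            // Keep a signature captured on an earlier chunk if this one omits it.
            let thought_signature = incoming
                .thought_signature
                .or_else(|| snapshots[index].thought_signature.clone());
            // The incoming id (the real one, once upgraded) replaces the synthesized id.
            snapshots[index] = ToolCallSnapshot { thought_signature, ..incoming };
        }
        None => snapshots.push(incoming),
    }
}
```

The positional fallback is deliberately narrow: a slot that already holds a real id is never overwritten on position alone, so two distinct calls cannot collapse into one.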
Jason	2513687184	feat: add Copilot optimizer to reduce premium interaction consumption Implement request classification, tool result merging, compact detection, deterministic request IDs, and warmup downgrade for Copilot proxy. The root cause was x-initiator being hardcoded to "user", making Copilot count every API request (including tool callbacks and agent continuations) as a separate premium interaction. The optimizer dynamically classifies requests as "user" or "agent" based on message content analysis. Closes #1813	2026-04-05 08:34:10 +08:00
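The Copilot commit states only that requests are classified as "user" or "agent" from message content, so the following is a sketch under an assumed heuristic, not the optimizer's actual rules. A request whose latest message is a user turn containing nothing but `tool_result` blocks is treated as a tool callback (`agent`), any other user turn counts as `user`, and a history that ends on an assistant turn is treated as an agent continuation. The `Role`, `Block`, `Message` types and `classify_initiator` are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    User,
    Assistant,
}

/// Hypothetical, reduced content-block model.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Block {
    Text(String),
    ToolResult { tool_use_id: String },
}

#[derive(Debug, Clone)]
pub struct Message {
    pub role: Role,
    pub content: Vec<Block>,
}

/// Value for the x-initiator header under the assumed heuristic.
pub fn classify_initiator(messages: &[Message]) -> &'static str {
    match messages.last() {
        Some(last) if last.role == Role::User => {
            // A user turn made only of tool results is a tool callback, not a human prompt.
            let only_tool_results = !last.content.is_empty()
                && last.content.iter().all(|b| matches!(b, Block::ToolResult { .. }));
            if only_tool_results { "agent" } else { "user" }
        }
        // History ending on an assistant turn: assumed to be an agent continuation.
        Some(_) => "agent",
        // Empty history: keep the previous hardcoded default.
        None => "user",
    }
}
```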
Keith Yu	8217bfff50	feat: add Bedrock request optimizer (PRE-SEND thinking + cache injection) (#1301) * feat: add Bedrock request optimizer (PRE-SEND thinking + cache injection) Add a PRE-SEND request optimizer that enhances Bedrock API requests before forwarding, complementing the existing POST-ERROR rectifier system. New modules: - thinking_optimizer: 3-path model detection (adaptive/legacy/skip) - Opus 4.6/Sonnet 4.6: adaptive thinking + effort max + 1M context beta - Legacy models: inject extended thinking with max budget - Haiku: skip (no modification) - cache_injector: auto-inject cache_control breakpoints (max 4) - Injects at tools/system/assistant message positions - TTL upgrade for existing breakpoints (5m → 1h) Gate: only activates for Bedrock providers (CLAUDE_CODE_USE_BEDROCK=1). Config: stored in SQLite settings table, default OFF, user opt-in. UI: new Optimizer section in RectifierConfigPanel with 3 toggles + TTL. 18 unit tests covering all paths. Verified against live Bedrock API. * chore: remove docs/plans directory * fix: address code review findings for Bedrock request optimizer P0 fixes: - Replace hardcoded Chinese with i18n t() calls in optimizer panel, add translation keys to zh/en/ja locale files - Fix u64 underflow: max_tokens - 1 → max_tokens.saturating_sub(1) - Move optimizer from before retry loop to per-provider with body cloning, preventing Bedrock fields leaking to non-Bedrock providers P1 fixes: - Replace .map() side-effect pattern with idiomatic if-let (clippy) - Fix module alphabetical ordering in mod.rs - Add cache_ttl whitelist validation in set_optimizer_config - Remove #[allow(unused_assignments)] and dead budget decrement --------- Co-authored-by: Keith (via OpenClaw) <keithyt06@users.noreply.github.com> Co-authored-by: Jason <farion1231@gmail.com>	2026-03-07 18:57:21 +08:00
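A sketch of the 3-path detection and the two numeric guards mentioned in the Bedrock entry. The substring checks on the model id (`haiku`, `opus-4-6`, `sonnet-4-6`) and all names here are assumptions for illustration; the real `thinking_optimizer` may match Bedrock model ids differently. Using `max_tokens - 1` as the "max budget" follows the `saturating_sub(1)` fix, since Anthropic's API expects `budget_tokens` to be smaller than `max_tokens`.

```rust
/// Which thinking strategy the optimizer applies (names are illustrative).
#[derive(Debug, PartialEq, Eq)]
pub enum ThinkingPath {
    /// Newer models: adaptive thinking, no fixed budget.
    Adaptive,
    /// Older models: extended thinking with the largest budget allowed.
    Legacy { budget_tokens: u64 },
    /// Haiku: request left untouched.
    Skip,
}

/// Upper bound on cache_control breakpoints per request (from the commit message).
pub const MAX_CACHE_BREAKPOINTS: usize = 4;

/// Assumed substring matching on the model id; the real detector may differ.
pub fn choose_thinking_path(model_id: &str, max_tokens: u64) -> ThinkingPath {
    let id = model_id.to_ascii_lowercase();
    if id.contains("haiku") {
        ThinkingPath::Skip
    } else if id.contains("opus-4-6") || id.contains("sonnet-4-6") {
        ThinkingPath::Adaptive
    } else {
        // Plain `max_tokens - 1` panics in debug builds (and wraps in release) when max_tokens == 0.
        ThinkingPath::Legacy { budget_tokens: max_tokens.saturating_sub(1) }
    }
}

/// How many new breakpoints may be injected, given those already present.
pub fn breakpoints_to_inject(existing: usize, candidates: usize) -> usize {
    candidates.min(MAX_CACHE_BREAKPOINTS.saturating_sub(existing))
}
```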
Dex Miller	f3343992f2	feat(proxy): add thinking signature rectifier for Claude API (#595) * feat(proxy): add thinking signature rectifier for Claude API Add automatic request rectification when Anthropic API returns signature validation errors. This improves compatibility when switching between different Claude providers or when historical messages contain incompatible thinking block signatures. - Add thinking_rectifier.rs module with trigger detection and rectification - Integrate rectifier into forwarder error handling flow - Remove thinking/redacted_thinking blocks and signature fields on retry - Delete top-level thinking field when assistant message lacks thinking prefix * fix(proxy): complete rectifier retry path with failover switch and chain continuation - Add failover switch trigger on rectifier retry success when provider differs from start - Replace direct error return with error categorization on rectifier retry failure - Continue failover chain for retryable errors instead of terminating early * feat(proxy): add rectifier config with master switch - Add RectifierConfig struct with enabled and requestThinkingSignature fields - Update should_rectify_thinking_signature to check master switch first - Add tests for master switch functionality * feat(db): add rectifier config storage in settings table Store rectifier config as JSON in single key for extensibility * feat(commands): add get/set rectifier config commands * feat(ui): add rectifier config panel in advanced settings - Add RectifierConfigPanel component with master switch and thinking signature toggle - Add API wrapper for rectifier config - Add i18n translations for zh/en/ja * feat(proxy): integrate rectifier config into request forwarding - Load rectifier config from database in RequestContext - Pass config to RequestForwarder for runtime checking - Use should_rectify_thinking_signature with config parameter * test(proxy): add nested JSON error detection test for thinking rectifier * fix(proxy): 
resolve HalfOpen permit leak and RectifierConfig default values - Fix RectifierConfig::default() to return enabled=true (was false due to derive) - Add release_permit_neutral() for releasing permits without affecting health stats - Fix 3 permit leak points in rectifier retry branches - Add unit tests for default values and permit release * style(ui): format ProviderCard style attribute * fix(rectifier): add detection for signature field required error Add support for detecting "signature: Field required" error pattern in the thinking signature rectifier. This enables automatic request rectification when upstream API returns this specific validation error.	2026-01-14 00:12:13 +08:00
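Why the `#[derive(Default)]` bug mattered: a derived `Default` sets every `bool` field to `false`, so a config with no stored settings silently disabled the rectifier. A sketch of the fixed shape, assuming a two-field struct (`request_thinking_signature` mirrors the `requestThinkingSignature` field named above). The parameters of this `should_rectify_thinking_signature` and its plain substring trigger are assumptions; the real detection also handles nested JSON error bodies.

```rust
/// Rectifier switches. A derived Default would set both to false.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RectifierConfig {
    pub enabled: bool,
    pub request_thinking_signature: bool,
}

impl Default for RectifierConfig {
    fn default() -> Self {
        // `enabled = true` is stated in the commit; defaulting the
        // per-trigger flag to true as well is an assumption.
        Self { enabled: true, request_thinking_signature: true }
    }
}

/// Hypothetical trigger check: master switch first, then the per-trigger
/// flag, then a substring match on the upstream error text.
pub fn should_rectify_thinking_signature(error_body: &str, config: &RectifierConfig) -> bool {
    if !config.enabled || !config.request_thinking_signature {
        return false;
    }
    let body = error_body.to_ascii_lowercase();
    body.contains("signature") && (body.contains("invalid") || body.contains("field required"))
}
```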
Dex Miller	6dd809701b	Refactor/simplify proxy logs (#585) * refactor(proxy): simplify logging for better readability - Delete 17 verbose debug logs from handlers, streaming, and response_processor - Convert excessive INFO logs to DEBUG level for internal processing details - Add 2 critical INFO logs in forwarder.rs for failover scenarios: - Log when switching to next provider after failure - Log when all providers have been exhausted - Fix clippy uninlined_format_args warning This reduces log noise while maintaining visibility into key user-facing decisions. * fix: replace unsafe unwrap() calls with proper error handling - database/dao/mcp.rs: Use map_err for serde_json serialization - database/dao/providers.rs: Use map_err for settings_config and meta serialization - commands/misc.rs: Use expect() for compile-time regex pattern - services/prompt.rs: Use unwrap_or_default() for SystemTime - deeplink/provider.rs: Replace unwrap() with is_none_or pattern for Option checks Reduces potential panic points from 26 to 1 (static regex init, safe). * refactor(proxy): simplify verbose logging output - Remove response JSON full output logging in response_processor - Remove per-request INFO logs in provider_router (failover status, provider selection) - Change model mapping log from INFO to DEBUG - Change usage logging failure from INFO to WARN - Remove redundant debug logs for circuit breaker operations Reduces log noise significantly while preserving important warnings and errors. * feat(proxy): add structured log codes for i18n support Add error code system to proxy module logs for multi-language support: - CB-001~006: Circuit breaker state transitions and triggers - SRV-001~004: Proxy server lifecycle events - FWD-001~002: Request forwarding and failover - FO-001~005: Failover switch operations - USG-001~002: Usage logging errors Log format: [CODE] Chinese message. Frontend/log tools can map codes to any language. 
New file: src/proxy/log_codes.rs - centralized code definitions * chore: bump version to 3.9.1 * style: format code with prettier and rustfmt * fix(ui): allow number inputs to be fully cleared before saving - Convert numeric state to string type for controlled inputs - Use isNaN() check instead of || fallback to allow 0 values - Apply fix to ProxyPanel, CircuitBreakerConfigPanel, AutoFailoverConfigPanel, and ModelTestConfigPanel * feat(pricing): support @ separator in model name matching - Refactor model name cleaning into chained method calls - Replace @ with - (e.g., gpt-5.2-codex@low → gpt-5.2-codex-low) - Add test case for @ separator matching * fix(proxy): improve validation and error handling in proxy config panels - Add StopTimeout/StopFailed error types for proper stop() error reporting - Replace silent clamp with validation-and-block in config panels - Add listenAddress format validation in ProxyPanel - Use log_codes constants instead of hardcoded strings - Use once_cell::Lazy for regex precompilation * fix(proxy): harden error handling and input validation - Handle RwLock poisoning in settings.rs with unwrap_or_else - Add fallback for dirs::home_dir() in config modules - Normalize localhost to 127.0.0.1 in ProxyPanel - Format IPv6 addresses with brackets for valid URLs - Strict port validation with pure digit regex - Treat NaN as validation failure in config panels - Log warning on cost_multiplier parse failure - Align timeoutSeconds range to [0, 300] across all panels	2026-01-11 20:50:54 +08:00
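A sketch of the chained model-name cleanup, combining the `@` replacement from this entry with the vendor-prefix and colon-suffix stripping described in the #508 entry below. The function name and the order of the three steps are assumptions; stripping up to the *last* `/` is also an assumption, since the commit only says "before /".

```rust
/// Normalize a model id before an exact lookup in the pricing table.
/// Assumed order: vendor prefix, then colon suffix, then '@' mapping.
pub fn clean_model_name(raw: &str) -> String {
    // Keep only what follows the last '/', e.g. "openai/gpt-5.2" -> "gpt-5.2".
    let without_vendor = raw.rsplit('/').next().unwrap_or(raw);
    // Drop anything after the first ':', e.g. "gpt-5.2:free" -> "gpt-5.2".
    let without_suffix = without_vendor.split(':').next().unwrap_or(without_vendor);
    // Map the '@' separator to '-', e.g. "gpt-5.2-codex@low" -> "gpt-5.2-codex-low".
    without_suffix.replace('@', "-")
}
```

Because matching is exact after cleanup, a model id that is not in the pricing table simply has no price rather than being matched to a near neighbor.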
Dex Miller	5376ea042b	Feat/usage improvements (#508) * i18n: update cache terminology across all languages - Change 'Cache Read' to 'Cache Hit' in all languages - Change 'Cache Write' to 'Cache Creation' in all languages - Update zh: 缓存读取 → 缓存命中, 缓存写入 → 缓存创建 - Update en: Cache Read → Cache Hit, Cache Write → Cache Creation - Update ja: キャッシュ読取 → キャッシュヒット, キャッシュ書込 → キャッシュ作成 Affected keys: cacheReadTokens, cacheCreationTokens, cacheReadCost, cacheWriteCost, cacheRead, cacheWrite * feat(usage): add cache metrics to trend chart - Add cache creation tokens visualization (orange line) - Add cache hit tokens visualization (purple line) - Add gradient definitions for new cache metrics - Include cache data in hourly aggregation - Display cache metrics alongside input/output tokens This provides better visibility into cache usage patterns over time. * fix(usage): fix timezone handling in datetime picker - Add timestampToLocalDatetime() to convert Unix timestamp to local datetime - Add localDatetimeToTimestamp() with validation for incomplete input - Fix issue where typing hours/minutes would jump to previous day - Validate datetime format completeness before conversion - Use local timezone instead of UTC for datetime-local input This resolves the issue where users couldn't fine-tune time selection and the input would jump unexpectedly when editing hours or minutes. * feat(usage): add auto-refresh for usage statistics - Add 30-second auto-refresh interval for all usage queries - Disable background refresh to save resources - Apply to: summary, trends, provider stats, model stats, request logs - Queries automatically update when tab is active - Pause refresh when user switches to another tab This keeps usage data fresh without manual refresh. 
* fix(proxy): improve usage logging and cache token parsing - Log requests even when usage parsing fails (with default values) - Add detailed debug logging for usage metrics - Support cache_read_input_tokens field in Codex responses - Fallback to input_tokens_details.cached_tokens if needed - Add test case for cached_tokens in input_tokens_details - Ensure all requests are tracked in database for analytics This fixes missing request logs when API responses lack usage data and improves cache token detection across different response formats. * style(rust): use inline format args in format! macros - Replace format!("...", var) with format!("...{var}") - Update universal provider ID formatting - Update error message formatting - Update config.toml generation in Codex provider Fixes clippy::uninlined_format_args warnings. * feat(proxy): enhance provider router logging - Add debug logs for failover queue provider count - Log circuit breaker state for each provider check - Add logs for missing current provider scenarios - Log when no current provider is configured - Use inline format args for better readability This improves debugging of provider selection and failover behavior. * feat(database): update model pricing data - Update Claude models to full version format (e.g. claude-opus-4-5-20251101) - Add GPT-5.2 series model pricing (10 models) - Add GPT-5.1 series model pricing (10 models) - Add GPT-5 series model pricing (12 models) - Add Gemini 3 series model pricing (2 models) - Update Gemini 2.5 series model ID format (use dot separator) - Unify display names by removing thinking level suffixes * fix(usage): correct Gemini output token calculation Fix Gemini API output token parsing to use totalTokenCount - promptTokenCount instead of candidatesTokenCount alone. This ensures thoughtsTokenCount is included in output statistics. 
- Update from_gemini_response to calculate output from total - input - Update from_gemini_stream_chunks with same logic for consistency - Fix from_codex_stream_events to use adjusted token calculation - Add test case for responses with thoughtsTokenCount - Update existing tests to match new calculation logic * fix(usage): correct cache token billing and add Codex format auto-detection - Avoid double-billing cache tokens by subtracting from input before calculation - Add smart Codex parser that auto-detects OpenAI vs Codex API format - Extract model name from Codex responses for accurate tracking * fix(proxy): improve takeover detection with live config check - Add live config takeover detection for hot-switch decision - Rebuild takeover when backup is missing or placeholder remains - Make detect_takeover_in_live_config_for_app public - Fix is_takeover_active to use actual takeover status * refactor(usage): simplify model pricing lookup by removing suffix fallback Replace complex suffix-stripping fallback with direct prefix/suffix cleanup. Model IDs are now cleaned by removing vendor prefix (before /) and colon suffix (after :), then matched exactly against pricing table. * feat(database): add Chinese AI model pricing data Add pricing for domestic AI models (CNY/1M tokens): - Doubao-Seed-Code (ByteDance) - DeepSeek V3/V3.1/V3.2 - Kimi K2/K2-Thinking/K2-Turbo (Moonshot) - MiniMax M2/M2.1/M2.1-Lightning - GLM-4.6/4.7 (Zhipu) - Mimo V2 Flash (Xiaomi) Also fix test case to use correct model ID and remove invalid currency column. * refactor(proxy): improve header forwarding with blacklist approach Change from whitelist to blacklist mode for request header forwarding. Only skip headers that will be overridden (auth, host, content-length). This preserves client's original headers and improves compatibility. 
* fix(proxy): bypass timeout and retry configs when failover is disabled When auto_failover_enabled is false, timeout and retry configurations should not affect normal request flow. This change ensures: - create_forwarder: passes 0 for all timeout/retry params when failover is disabled, effectively bypassing these checks - streaming_timeout_config: returns 0 for both first_byte_timeout and idle_timeout when failover is disabled This prevents unnecessary timeout errors and retry attempts when users have explicitly disabled the failover feature. * fix(proxy): handle zero value input in failover config fields * refactor(proxy): remove retry logic and add enabled check for failover * refactor(proxy): distinguish circuit-open from no-provider errors * Align usage stats to sliding windows * feat(proxy): add body and header filtering for upstream requests * feat(proxy): enable transparent passthrough for headers - Passthrough anthropic-beta header as-is from client - Passthrough anthropic-version header from client - Passthrough client IP headers (x-forwarded-for, x-real-ip) by default - Filter private params (underscore-prefixed fields) from request body - No database changes required * feat(proxy): extract session ID from client requests for logging - Add SessionIdExtractor to parse session ID from Claude/Codex requests - Support extraction from metadata.user_id, headers, previous_response_id - Pass session_id through RequestContext to usage logger - Enable request correlation by session in proxy_request_logs	2025-12-31 22:57:00 +08:00
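A sketch of the two corrected usage formulas from the #508 entry. The function names and the price struct are hypothetical. The subtraction assumes a usage format in which `input_tokens` already includes the cached tokens (the double-billing case the commit describes); for Gemini, output is derived from the total so that `thoughtsTokenCount` is counted, e.g. `totalTokenCount = 150` and `promptTokenCount = 100` give 50 output tokens even if `candidatesTokenCount` alone is 30.

```rust
/// Per-million-token prices, in whatever currency the pricing row uses.
pub struct PricePerMillion {
    pub input: f64,
    pub output: f64,
    pub cache_read: f64,
}

/// Gemini: output = total - prompt, which includes thinking tokens.
pub fn gemini_output_tokens(total_token_count: u64, prompt_token_count: u64) -> u64 {
    total_token_count.saturating_sub(prompt_token_count)
}

/// Cost for a response whose `input_tokens` already includes cached tokens
/// (assumed here): cached tokens are billed once at the cache-read price
/// instead of also at the full input price.
pub fn request_cost(
    input_tokens: u64,
    cache_read_tokens: u64,
    output_tokens: u64,
    price: &PricePerMillion,
) -> f64 {
    let fresh_input = input_tokens.saturating_sub(cache_read_tokens);
    (fresh_input as f64 * price.input
        + cache_read_tokens as f64 * price.cache_read
        + output_tokens as f64 * price.output)
        / 1_000_000.0
}
```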
YoVinchen	e6f18ba801	Feat/usage model extraction (#455) * feat(proxy): extract model name from API response for accurate usage tracking - Add model field extraction in TokenUsage parsing for Claude, OpenAI, and Codex - Prioritize response model over request model in usage logging - Update model extractors to use parsed usage.model first - Add tests for model extraction in stream and non-stream responses * feat(proxy): implement streaming timeout control with validation - Add first byte timeout (0 or 1-180s) for streaming requests - Add idle timeout (0 or 60-600s) for streaming data gaps - Add non-streaming timeout (0 or 60-1800s) for total request - Implement timeout logic in response processor - Add 1800s global timeout fallback when disabled - Add database schema migration for timeout fields - Add i18n translations for timeout settings * feat(proxy): add model mapping module for provider-based model substitution - Add model_mapper.rs with ModelMapping struct to extract model configs from Provider - Support ANTHROPIC_MODEL, ANTHROPIC_REASONING_MODEL, and default models for haiku/sonnet/opus - Implement thinking mode detection for reasoning model priority - Include comprehensive unit tests for all mapping scenarios * fix(proxy): bypass circuit breaker for single provider scenario When failover is disabled (single provider), circuit breaker open state would block all requests causing poor UX. Now bypasses circuit breaker check in this scenario. Also integrates model mapping into request flow. * feat(ui): add reasoning model field to Claude provider form Add ANTHROPIC_REASONING_MODEL configuration field for Claude providers, allowing users to specify a dedicated model for thinking/reasoning tasks. * feat(proxy): add openrouter_compat_mode for optional format conversion Add configurable OpenRouter compatibility mode that enables Anthropic to OpenAI format conversion. When enabled, rewrites endpoint to /v1/chat/completions and transforms request/response formats. 
Defaults to enabled for OpenRouter. * feat(ui): add OpenRouter compatibility mode toggle Add UI toggle for OpenRouter providers to enable/disable compatibility mode which uses OpenAI Chat Completions format with SSE conversion. * feat(stream-check): use provider-configured model for health checks Extract model from provider's settings_config (ANTHROPIC_MODEL, GEMINI_MODEL, or Codex config.toml) instead of always using default test models. * refactor(ui): remove timeout settings from AutoFailoverConfigPanel Remove streaming/non-streaming timeout configuration from failover panel as these settings have been moved to a dedicated location. * refactor(database): migrate proxy_config to per-app three-row structure Replace singleton proxy_config table with app_type primary key structure, allowing independent proxy settings for Claude, Codex, and Gemini. Add GlobalProxyConfig queries and per-app config management in DAO layer. * feat(proxy): add GlobalProxyConfig and AppProxyConfig types Add new type definitions for the refactored proxy configuration: - GlobalProxyConfig: shared settings (enabled, address, port, logging) - AppProxyConfig: per-app settings (failover, timeouts, circuit breaker) * refactor(proxy): update service layer for per-app config structure Adapt proxy service, handler context, and provider router to use the new per-app configuration model. Read enabled/timeout settings from proxy_config table instead of settings table. * feat(commands): add global and per-app proxy config commands Add new Tauri commands for the refactored proxy configuration: - get_global_proxy_config / update_global_proxy_config - get_proxy_config_for_app / update_proxy_config_for_app Update startup restore logic to read from proxy_config table. 
* feat(api): add frontend API and Query hooks for proxy config Add TypeScript wrappers and TanStack Query hooks for: - Global proxy config (address, port, logging) - Per-app proxy config (failover, timeouts, circuit breaker) - Proxy takeover status management * refactor(ui): redesign proxy panel with inline config controls Replace ProxySettingsDialog with inline controls in ProxyPanel. Add per-app takeover switches and global address/port settings. Simplify AutoFailoverConfigPanel by removing timeout settings. * feat(i18n): add proxy takeover translations and update types Add i18n strings for proxy takeover status in zh/en/ja. Update TypeScript types for GlobalProxyConfig and AppProxyConfig. * refactor(proxy): load circuit breaker config per-app instead of globally Extract app_type from router key and read circuit breaker settings from the corresponding proxy_config row for each application.	2025-12-25 10:40:11 +08:00
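The timeout ranges in the #455 entry (first byte 0 or 1-180 s, idle 0 or 60-600 s, non-streaming 0 or 60-1800 s, plus an 1800 s fallback) translate into a small validator. This is a sketch: the function names and error text are hypothetical, `non_streaming_timeout` is an assumed field name, and applying the 1800 s fallback when the non-streaming value is 0 is an interpretation of "when disabled".

```rust
use std::time::Duration;

/// Accept 0 (disabled) or a value inside `min..=max` seconds.
pub fn validate_timeout(field: &str, secs: u64, min: u64, max: u64) -> Result<u64, String> {
    if secs == 0 || (min..=max).contains(&secs) {
        Ok(secs)
    } else {
        Err(format!("{field}: expected 0 or {min}-{max}s, got {secs}s"))
    }
}

/// Validate the three timeouts with the ranges listed in the commit.
pub fn validate_all(first_byte: u64, idle: u64, non_streaming: u64) -> Result<(), String> {
    validate_timeout("first_byte_timeout", first_byte, 1, 180)?;
    validate_timeout("idle_timeout", idle, 60, 600)?;
    validate_timeout("non_streaming_timeout", non_streaming, 60, 1800)?;
    Ok(())
}

/// A disabled (0) non-streaming timeout still falls back to a 1800 s global cap.
pub fn effective_non_streaming_timeout(secs: u64) -> Duration {
    Duration::from_secs(if secs == 0 { 1800 } else { secs })
}
```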
Jason	0ef8a4153f	fix(proxy): sync UI when active provider differs from current setting Previously, UI sync was triggered only when failover happened (retry count > 1). This missed cases where the first provider in the failover queue succeeded but was different from the user's selected provider in settings. Now we capture the current provider ID at request start and compare it with the actually used provider. This ensures UI/tray always reflects the real provider handling requests.	2025-12-17 10:22:02 +08:00
Jason	9196d07925	feat(proxy): sync UI when failover succeeds Add FailoverSwitchManager to handle provider switching after successful failover. This ensures the UI reflects the actual provider in use: - Create failover_switch.rs with deduplication and async switching logic - Pass AppHandle through ProxyService -> ProxyServer -> RequestForwarder - Update is_current in database when failover succeeds - Emit provider-switched event for frontend refresh - Update tray menu and live backup synchronously The switching runs asynchronously via tokio::spawn to avoid blocking API responses while still providing immediate UI feedback.	2025-12-15 17:12:36 +08:00
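The FailoverSwitchManager commit names deduplication but not its key or data structure. A std-only sketch, assuming dedup is keyed by (app, provider id) and guarded by a `Mutex<HashSet<_>>`; `SwitchDeduper` is a hypothetical name. In the described design the switch itself (DB update, event emit, tray refresh) runs inside `tokio::spawn`, which this sketch leaves out.

```rust
use std::collections::HashSet;
use std::sync::Mutex;

/// Tracks switches already scheduled so that a burst of successful failover
/// requests triggers at most one switch per (app, provider) at a time.
#[derive(Default)]
pub struct SwitchDeduper {
    pending: Mutex<HashSet<(String, String)>>,
}

impl SwitchDeduper {
    /// Returns true if the caller should perform the switch.
    pub fn try_begin(&self, app: &str, provider_id: &str) -> bool {
        // Recover the guard even if another thread panicked while holding the lock.
        let mut pending = self.pending.lock().unwrap_or_else(|e| e.into_inner());
        pending.insert((app.to_string(), provider_id.to_string()))
    }

    /// Clears the entry once the switch has completed.
    pub fn finish(&self, app: &str, provider_id: &str) {
        let mut pending = self.pending.lock().unwrap_or_else(|e| e.into_inner());
        pending.remove(&(app.to_string(), provider_id.to_string()));
    }
}
```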
Jason	5a5ca2a989	fix(proxy): resolve circuit breaker state persistence and HalfOpen deadlock This commit addresses several critical issues in the failover system: Circuit breaker state persistence (previous fix) - Promote ProviderRouter to ProxyState for cross-request state sharing - Remove redundant router.rs module - Fix 429 errors to be retryable (rate limiting should try other providers) Hot-update circuit breaker config - Add update_circuit_breaker_configs() to ProxyServer and ProxyService - Connect update_circuit_breaker_config command to running circuit breakers - Add reset_provider_circuit_breaker() for manual breaker reset Fix HalfOpen deadlock bug - Change half_open_requests from cumulative count to in-flight count - Release quota in record_success()/record_failure() when in HalfOpen state - Prevents permanent deadlock when success_threshold > 1 Fix duplicate select_providers() call - Store providers list in RequestContext, pass to forward_with_retry() - Avoid consuming HalfOpen quota twice per request - Single call to select_providers() per request lifecycle Add per-provider retry with exponential backoff - Implement forward_with_provider_retry() with configurable max_retries - Backoff delays: 100ms, 200ms, 400ms, etc.	2025-12-13 22:47:49 +08:00
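The HalfOpen deadlock fix hinges on counting in-flight probes rather than a cumulative total. With a cumulative counter, once the HalfOpen quota is smaller than `success_threshold`, the quota runs out before enough successes accumulate and the breaker never closes. A simplified sketch of the corrected bookkeeping and of the backoff schedule; the struct and field names are hypothetical and do not reflect the real `ProviderRouter` internals.

```rust
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BreakerState {
    Closed,
    Open,
    HalfOpen,
}

/// Only the HalfOpen bookkeeping is modeled; failure counting in Closed and
/// the Open -> HalfOpen timer are omitted.
pub struct Breaker {
    pub state: BreakerState,
    pub half_open_in_flight: u32,
    pub half_open_max_requests: u32,
    pub half_open_successes: u32,
    pub success_threshold: u32,
}

impl Breaker {
    pub fn try_acquire(&mut self) -> bool {
        match self.state {
            BreakerState::Closed => true,
            BreakerState::Open => false,
            BreakerState::HalfOpen => {
                if self.half_open_in_flight < self.half_open_max_requests {
                    self.half_open_in_flight += 1;
                    true
                } else {
                    false
                }
            }
        }
    }

    pub fn record_success(&mut self) {
        if self.state == BreakerState::HalfOpen {
            // Release the probe slot so the next probe can run.
            self.half_open_in_flight = self.half_open_in_flight.saturating_sub(1);
            self.half_open_successes += 1;
            if self.half_open_successes >= self.success_threshold {
                self.state = BreakerState::Closed;
                self.half_open_successes = 0;
            }
        }
    }

    pub fn record_failure(&mut self) {
        if self.state == BreakerState::HalfOpen {
            self.half_open_in_flight = self.half_open_in_flight.saturating_sub(1);
            self.half_open_successes = 0;
            self.state = BreakerState::Open;
        }
    }
}

/// Exponential backoff: 100 ms, 200 ms, 400 ms, ... (exponent capped to avoid overflow).
pub fn backoff_delay(attempt: u32) -> Duration {
    Duration::from_millis(100u64.saturating_mul(1u64 << attempt.min(16)))
}
```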
Jason	ebe2a665ae	refactor(proxy): modularize handlers.rs to reduce code duplication Extract common request handling logic into dedicated modules: - handler_config.rs: Usage parser configurations for each API type - handler_context.rs: Request lifecycle context management - response_processor.rs: Unified streaming/non-streaming response handling Reduces handlers.rs from ~1130 lines to ~418 lines (-63%), eliminating repeated initialization and response processing patterns across the four API handlers (Claude, Codex Chat, Codex Responses, Gemini).	2025-12-11 23:22:05 +08:00

12 Commits