
fix(usage): correct inflated input_tokens and Zhipu quota routing #264

Open
thedavidweng wants to merge 3 commits into SaladDay:main from thedavidweng:fix/259-usage-input-tokens-and-zhipu-quota

Conversation

@thedavidweng

Contributor

Summary

1. Inflated input_tokens in Claude stream parsing

Some Anthropic-compatible SSE providers (e.g. Qwen, MiniMax) report the full context (fresh + cached) as input_tokens in message_start, double-counting the cached portion. The message_delta handler only adopted input_tokens when the start value was exactly zero, so the correct (smaller) delta value was silently discarded.

Fix: When message_delta provides a positive input_tokens smaller than the current value, prefer it and sync cache counts from the same delta block.
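
The rule above can be sketched roughly as follows. This is a minimal illustration, not the actual code in parser.rs: the `Usage` and `DeltaUsage` types, their field names, and the `apply_delta` helper are hypothetical, and syncing the cache counts in the zero-start case as well is an assumption.

```rust
/// Hypothetical running totals built up while parsing the stream.
#[derive(Debug, Default, Clone, PartialEq)]
struct Usage {
    input_tokens: u64,
    cache_read_tokens: u64,
    cache_creation_tokens: u64,
}

/// Hypothetical view of the usage block carried by a `message_delta` event.
#[derive(Debug, Default, Clone)]
struct DeltaUsage {
    input_tokens: Option<u64>,
    cache_read_input_tokens: Option<u64>,
    cache_creation_input_tokens: Option<u64>,
}

/// Adopt the delta's input_tokens when the start value was zero or when the
/// delta reports a positive, smaller value. A larger delta is ignored.
fn apply_delta(usage: &mut Usage, delta: &DeltaUsage) {
    let Some(delta_input) = delta.input_tokens else {
        return; // the delta carries no input_tokens at all
    };
    if delta_input == 0 {
        return; // only positive values are considered
    }
    if usage.input_tokens == 0 || delta_input < usage.input_tokens {
        usage.input_tokens = delta_input;
        // Take cache counts from the same delta block so they stay
        // consistent with the adopted input_tokens.
        if let Some(v) = delta.cache_read_input_tokens {
            usage.cache_read_tokens = v;
        }
        if let Some(v) = delta.cache_creation_input_tokens {
            usage.cache_creation_tokens = v;
        }
    }
}
```

For example, if message_start reported 1000 input tokens and message_delta later reports 200 input tokens with 800 cache-read tokens, the sketch ends with 200 and 800 respectively; a delta reporting more than the start value leaves the totals unchanged.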

2. Zhipu quota query hardcoded to international endpoint

query_zhipu() always hit https://api.z.ai, making quota queries fail for users on the mainland China preset (open.bigmodel.cn).

Fix: Route by the configured base_url: open.bigmodel.cn users query open.bigmodel.cn, others query api.z.ai.

3. Zhipu tier sorting

When the 5-hour bucket is at 0% utilization, the API omits nextResetTime. Without sorting, the weekly bucket could incorrectly claim the five-hour slot.

Fix: Sort tiers with missing nextResetTime first (just-reset 5-hour bucket), then by reset time ascending.
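
The ordering described above could look like the sketch below. The `Tier` struct, its `label` field, and the integer timestamp type are assumptions made for illustration; only the idea of sorting missing reset times first and the rest ascending comes from this PR.

```rust
/// Hypothetical quota tier; `resets_at` is None when the API omits nextResetTime.
#[derive(Debug, Clone, PartialEq)]
struct Tier {
    label: &'static str,
    resets_at: Option<i64>, // assumed to be an epoch timestamp
}

/// Sort tiers so that a missing reset time comes first, then by reset time
/// ascending. `Option<T>` orders `None` before any `Some(_)`, so sorting on
/// the Option itself gives exactly that order. The sort is stable.
fn sort_tiers(tiers: &mut [Tier]) {
    tiers.sort_by_key(|t| t.resets_at);
}
```

With this ordering, a just-reset 5-hour bucket that arrives without a reset time still lands in the first slot, ahead of the weekly bucket.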

Changes

  • src-tauri/src/proxy/usage/parser.rs — update from_claude_stream_events delta handling
  • src-tauri/src/services/coding_plan.rs — add zhipu_quota_base helper, update query_zhipu signature, add tier sorting

Closes #259

thedavidweng marked this pull request as ready for review June 10, 2026 03:15
@SaladDay

Owner

@thedavidweng Thanks for the focused fix. I found one small issue before this is ready: zhipu_quota_base() should use the same lowercasing behavior as detect_provider(). Right now an uppercase/custom-cased OPEN.BIGMODEL.CN URL is detected as Zhipu CN, but the quota request would still route to api.z.ai.

Could you also add small tests for the delta-input override case in the usage parser and the Zhipu quota base selection/sorting behavior? After that, this should be in good shape.
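
A case-insensitive base selection along the lines requested here might look like this sketch. The signature, the constants, the substring match, and the `https://open.bigmodel.cn` scheme-plus-host form are assumptions; the real helper in coding_plan.rs may differ.

```rust
// Assumed base URLs for the two Zhipu endpoints named in this PR.
const ZHIPU_CN_BASE: &str = "https://open.bigmodel.cn";
const ZHIPU_INTL_BASE: &str = "https://api.z.ai";

/// Pick the quota endpoint that matches the user's configured base_url.
/// The URL is lowercased first so that OPEN.BIGMODEL.CN and
/// open.bigmodel.cn are treated the same way.
fn zhipu_quota_base(base_url: &str) -> &'static str {
    let normalized = base_url.to_lowercase();
    if normalized.contains("open.bigmodel.cn") {
        ZHIPU_CN_BASE
    } else {
        ZHIPU_INTL_BASE
    }
}
```

Lowercasing before the match keeps the routing decision consistent with a provider detector that also lowercases its input, which is the mismatch described in this comment.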

- Prefer message_delta input_tokens over message_start when the delta
  value is smaller, fixing double-counting on providers (Qwen, MiniMax)
  that report fresh+cached as input_tokens in message_start
- Sync cache counts from the same delta block when adopting its
  input_tokens to avoid stale cache_read/cache_creation values
- Route Zhipu quota queries to the endpoint matching the user's
  configured base_url instead of always hitting api.z.ai
- Sort Zhipu tiers by nextResetTime with missing values first, so
  the 5-hour bucket (just reset, 0% utilization) displays correctly

Closes SaladDay#259

- Apply .to_lowercase() in zhipu_quota_base() to match detect_provider()
  behavior, so uppercase URLs like OPEN.BIGMODEL.CN route correctly
- Add tests for delta-input override in Claude stream parsing
  (inflated start overridden by smaller delta, larger delta ignored,
  zero start always adopts delta)
- Add tests for zhipu_quota_base case-insensitivity and consistency
  with detect_provider
- Add tests for Zhipu tier sorting (None resets_at first, ascending)
thedavidweng force-pushed the fix/259-usage-input-tokens-and-zhipu-quota branch from d456368 to 77db2f1 June 10, 2026 21:17