# Pricing

Usertoken costs per model, estimation functions, and the fallback rate.
All costs in the usertrust SDK are denominated in usertokens (UT).
1 UT = $0.0001 (one basis point of a dollar, i.e. one hundredth of a cent).
A budget of 100_000 UT equals $10.00. A budget of 1_000_000 UT equals $100.00.
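To make the denomination concrete, here is a minimal conversion sketch. The constant and helper names are illustrative, not part of the SDK:

```typescript
// 1 UT = $0.0001, so 10,000 UT = $1.00 (names are hypothetical).
const UT_PER_DOLLAR = 10_000;

function utToDollars(ut: number): number {
  return ut / UT_PER_DOLLAR;
}

function dollarsToUt(dollars: number): number {
  return Math.round(dollars * UT_PER_DOLLAR);
}

utToDollars(100_000); // → 10 ($10.00)
dollarsToUt(100);     // → 1_000_000 UT ($100.00)
```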
## Pricing Table
The SDK includes a 20-model pricing table in ledger/pricing.ts. Rates are in usertokens per 1,000 LLM tokens.
### Anthropic
| Model | Input (UT per 1K tokens) | Output (UT per 1K tokens) |
|---|---|---|
| claude-sonnet-4-6 | 30 | 150 |
| claude-haiku-4-5 | 10 | 50 |
| claude-opus-4-6 | 50 | 250 |
### OpenAI
| Model | Input (UT per 1K tokens) | Output (UT per 1K tokens) |
|---|---|---|
| gpt-4o | 25 | 100 |
| gpt-4o-mini | 1.5 | 6 |
| gpt-5.4 | 25 | 150 |
| o3 | 20 | 80 |
| o4-mini | 5.5 | 22 |
### Google Gemini
| Model | Input (UT per 1K tokens) | Output (UT per 1K tokens) |
|---|---|---|
| gemini-2.5-flash | 3 | 25 |
| gemini-2.5-pro | 12.5 | 100 |
| gemini-3.1-pro | 20 | 120 |
### Other Providers
| Model | Provider | Input (UT per 1K tokens) | Output (UT per 1K tokens) |
|---|---|---|---|
| mistral-large | Mistral | 5 | 15 |
| deepseek-chat | DeepSeek | 2.8 | 4.2 |
| deepseek-reasoner | DeepSeek | 2.8 | 4.2 |
| grok-3 | xAI | 30 | 150 |
| llama-4-maverick | Meta (Bedrock) | 2.4 | 9.7 |
| command-a | Cohere | 25 | 100 |
| sonar-pro | Perplexity | 30 | 150 |
| qwen-72b | Alibaba | 2.9 | 3.9 |
| nova-pro | Amazon | 8 | 32 |
Rates reflect the SDK's built-in table. Check packages/core/src/ledger/pricing.ts for the canonical source.
## Fallback Rate
Models not in the pricing table use sonnet-class pricing as a conservative fallback:
| Direction | Rate (per 1K tokens) |
|---|---|
| Input | 30 UT |
| Output | 150 UT |
This ensures unknown models are never free. The fallback is intentionally high to avoid under-billing.
## Model Matching
getModelRates() first attempts an exact match against the pricing table. If no exact match is found, it tries prefix matching (longest key first) to handle versioned model names. If neither matches, the fallback rate is used.
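That matching order can be sketched as follows. The table contents here are a tiny subset of the real 20-model table in `pricing.ts`, and the implementation details (sorting keys, `startsWith`) are assumptions for illustration:

```typescript
interface ModelRates {
  inputPer1k: number;  // usertokens per 1,000 input tokens
  outputPer1k: number; // usertokens per 1,000 output tokens
}

// Tiny illustrative subset of the pricing table.
const PRICING: Record<string, ModelRates> = {
  "claude-sonnet-4-6": { inputPer1k: 30, outputPer1k: 150 },
  "claude-haiku-4-5": { inputPer1k: 10, outputPer1k: 50 },
};

// Sonnet-class fallback for unknown models.
const FALLBACK: ModelRates = { inputPer1k: 30, outputPer1k: 150 };

function getModelRates(model: string): ModelRates {
  // 1. Exact match against the table.
  if (PRICING[model]) return PRICING[model];

  // 2. Prefix match, longest key first, to catch versioned names
  //    like "claude-sonnet-4-6-20260301".
  const prefix = Object.keys(PRICING)
    .sort((a, b) => b.length - a.length)
    .find((key) => model.startsWith(key));
  if (prefix) return PRICING[prefix];

  // 3. Conservative fallback so unknown models are never free.
  return FALLBACK;
}
```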
```typescript
getModelRates("claude-sonnet-4-6")          // exact match → { 30, 150 }
getModelRates("claude-sonnet-4-6-20260301") // prefix match → { 30, 150 }
getModelRates("unknown-model")              // fallback → { 30, 150 }
```

## Functions
### getModelRates()
```typescript
function getModelRates(model: string): ModelRates
```

Look up the input and output rates for a model. Falls back to prefix matching, then to the fallback rate.
### estimateCost()
```typescript
function estimateCost(model: string, inputTokens: number, outputTokens: number): number
```

Calculate the cost in usertokens for a model call given actual token counts. The result is always at least 1, a floor that prevents zero-amount transfers in TigerBeetle.
```typescript
// 1,000 input tokens + 500 output tokens on claude-sonnet-4-6
estimateCost("claude-sonnet-4-6", 1000, 500)
// → Math.max(1, Math.ceil((1000/1000 * 30) + (500/1000 * 150)))
// → Math.max(1, Math.ceil(30 + 75))
// → 105 UT
```

### estimateInputTokens()
```typescript
function estimateInputTokens(messages: unknown[]): number
```

Estimate the input token count from a messages array before the LLM call. Uses a heuristic of approximately 4 characters per token with a 1.5x safety margin, so the PENDING hold exceeds the actual cost in the majority of cases.
This function handles:
- String content and multi-part content blocks
- Nested arrays (tool result payloads)
- Per-message overhead (role, structure)
- Tool-call overhead
Returns at least 1.
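A minimal sketch of the heuristic, assuming JSON-serialized length as the character count (the real implementation walks content blocks and adds per-message and tool-call overhead, which this sketch folds into the serialized structure):

```typescript
// Illustrative only: ~4 chars per token, 1.5x safety margin, floor of 1.
const CHARS_PER_TOKEN = 4;
const SAFETY_MARGIN = 1.5;

function estimateInputTokens(messages: unknown[]): number {
  // Serializing the whole array approximates what the model sees,
  // including roles and structural overhead.
  const chars = JSON.stringify(messages).length;
  const tokens = Math.ceil((chars / CHARS_PER_TOKEN) * SAFETY_MARGIN);
  return Math.max(1, tokens); // never return 0
}

estimateInputTokens([{ role: "user", content: "hello" }]);
```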
### ModelRates
```typescript
interface ModelRates {
  inputPer1k: number;  // usertokens per 1,000 input tokens
  outputPer1k: number; // usertokens per 1,000 output tokens
}
```

## Cost Estimation Flow
During the two-phase spend lifecycle, costs are estimated twice:

1. **Before the call (PENDING):** `estimateInputTokens()` produces a conservative input estimate. Combined with `max_tokens` for the output estimate, `estimateCost()` calculates the hold amount. The 1.5x safety margin means the hold usually exceeds the actual cost.
2. **After the call (POST):** Actual token counts from the provider response (or accumulated from stream chunks) produce the real cost. The difference between the hold and the actual cost is released back to the available budget.
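The two phases can be sketched end to end. The stand-in functions below use the documented sonnet rates and the 4-chars-per-token heuristic; the ledger calls that actually place and settle the hold are omitted, and all names besides `estimateCost`/`estimateInputTokens` are hypothetical:

```typescript
// Simplified stand-ins for the SDK functions (sonnet-class rates only).
function estimateInputTokens(messages: unknown[]): number {
  return Math.max(1, Math.ceil((JSON.stringify(messages).length / 4) * 1.5));
}

function estimateCost(model: string, inputTokens: number, outputTokens: number): number {
  // claude-sonnet-4-6 rates: 30 UT in, 150 UT out, per 1K tokens.
  return Math.max(1, Math.ceil((inputTokens / 1000) * 30 + (outputTokens / 1000) * 150));
}

// Phase 1 (PENDING): conservative hold before the call.
const messages = [{ role: "user", content: "Summarize this document." }];
const maxTokens = 1024; // request's max_tokens caps the output estimate
const hold = estimateCost("claude-sonnet-4-6", estimateInputTokens(messages), maxTokens);

// Phase 2 (POST): settle against actual usage from the provider response.
const actual = estimateCost("claude-sonnet-4-6", 12, 350);
const released = hold - actual; // returned to the available budget
```

Because the hold prices the full `max_tokens` as output, `released` is positive whenever the model stops early, which is the common case.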