
Cost Control

Every LLM call in Opbox writes one ledger row. Every workspace can set a hard monthly cap. Every dollar of spend is attributed to the credential tier that paid for it. The system is built on three primitives: the AiCostLedger, the AiBudgetSnapshot, and the checkBudget() gate.

The Ledger

AiCostLedger is an append-only table with one row per LLM call.

| Field | Purpose |
| --- | --- |
| workspaceId | Spend bucket. Every row has a workspace. |
| operationType | chat / agent / extraction / embedding / other |
| model | Specific model that ran (e.g. claude-sonnet-4-5-20250929) |
| inputTokens / outputTokens | Raw token counts from the provider response |
| actualCostUsd | Computed from per-model rates in AI_COST_RATES |
| keySource | Which credential tier paid: USER_KEY / WORKSPACE_KEY / ORG_KEY / SERVER_KEY |
| userId | The user whose action triggered the call (for chat); null for autonomous flows |
| agentTaskId | Set when the call was made by the agent worker |
| createdAt | Timestamp |

Writes are transactional: the ledger entry and the monthly AiBudgetSnapshot upsert run in one atomic database transaction. Either both succeed or neither does.
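The "both or neither" behaviour can be illustrated with a minimal in-memory sketch. It is not the Opbox implementation: `LedgerStore`, `recordCall`, and `monthStartOf` are hypothetical names, the UTC month boundary is an assumption, and a real deployment would rely on a database transaction rather than in-process ordering.

```typescript
// In-memory stand-ins for the two tables; all names here are illustrative.
type KeySource = "USER_KEY" | "WORKSPACE_KEY" | "ORG_KEY" | "SERVER_KEY";
type OperationType = "chat" | "agent" | "extraction" | "embedding" | "other";

interface LedgerRow {
  workspaceId: string;
  operationType: OperationType;
  model: string;
  inputTokens: number;
  outputTokens: number;
  actualCostUsd: number;
  keySource: KeySource;
  userId: string | null;
  agentTaskId: string | null;
  createdAt: Date;
}

interface BudgetSnapshot {
  totalInputTokens: number;
  totalOutputTokens: number;
  totalCostUsd: number;
}

interface LedgerStore {
  ledger: LedgerRow[];
  snapshots: Record<string, BudgetSnapshot>; // key: "<workspaceId>|<monthStart>"
}

// Month bucket in the YYYY-MM-01 form (UTC is an assumption).
function monthStartOf(d: Date): string {
  const month = d.getUTCMonth() + 1;
  return d.getUTCFullYear() + "-" + (month < 10 ? "0" : "") + month + "-01";
}

// Validate and compute the new snapshot first, then apply both writes together.
// If anything throws before the commit step, neither structure has changed.
function recordCall(store: LedgerStore, row: LedgerRow): void {
  if (!(row.actualCostUsd >= 0) || row.inputTokens < 0 || row.outputTokens < 0) {
    throw new Error("invalid ledger row");
  }
  const key = row.workspaceId + "|" + monthStartOf(row.createdAt);
  const prev = store.snapshots[key] || { totalInputTokens: 0, totalOutputTokens: 0, totalCostUsd: 0 };
  const next: BudgetSnapshot = {
    totalInputTokens: prev.totalInputTokens + row.inputTokens,
    totalOutputTokens: prev.totalOutputTokens + row.outputTokens,
    totalCostUsd: prev.totalCostUsd + row.actualCostUsd,
  };
  store.ledger.push(row);      // append-only ledger write
  store.snapshots[key] = next; // snapshot upsert
}
```

The point of the ordering is that every step that can fail happens before either structure is touched, which is the same guarantee a database transaction gives by rolling back.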

The Snapshot

AiBudgetSnapshot is a denormalised running total per (workspaceId, month). Fields:

  • totalInputTokens, totalOutputTokens, totalCostUsd
  • monthStart (YYYY-MM-01)

The snapshot exists so that checkBudget() can answer "are we over the cap?" with one row read instead of an aggregate over the ledger. The transactional write keeps the snapshot in sync with the ledger automatically.

Drift Reconciliation

The snapshot can drift from the ledger if:

  • A failed transaction left a partial state (rare; the transactional write makes this near-impossible).
  • A historical migration adjusted ledger rows.
  • An admin manually edited rows.

reconcileBudgetSnapshot() walks the ledger for the period and rewrites the snapshot from the rows it finds. Run it nightly via cron, or on demand from Settings > AI > Cost Control.
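As a rough illustration of what such a reconciliation pass does, the sketch below recomputes one (workspaceId, month) snapshot purely from ledger rows. `rebuildSnapshot` and its signature are hypothetical, and the UTC month boundary and micro-dollar accumulation are assumptions rather than documented behaviour.

```typescript
interface LedgerRow {
  workspaceId: string;
  inputTokens: number;
  outputTokens: number;
  actualCostUsd: number;
  createdAt: Date;
}

interface BudgetSnapshot {
  workspaceId: string;
  monthStart: string; // YYYY-MM-01
  totalInputTokens: number;
  totalOutputTokens: number;
  totalCostUsd: number;
}

// Recompute one snapshot from the ledger alone, ignoring whatever the
// stored snapshot currently says.
function rebuildSnapshot(ledger: LedgerRow[], workspaceId: string, monthStart: string): BudgetSnapshot {
  const start = new Date(monthStart + "T00:00:00Z");
  if (isNaN(start.getTime())) throw new Error("monthStart must look like YYYY-MM-01");
  // First instant of the following month; Date.UTC rolls December over into January.
  const endMs = Date.UTC(start.getUTCFullYear(), start.getUTCMonth() + 1, 1);
  const startMs = start.getTime();

  let inputTokens = 0;
  let outputTokens = 0;
  let costMicros = 0; // integer micro-dollars, so the sum does not accumulate float error
  for (const row of ledger) {
    const t = row.createdAt.getTime();
    if (row.workspaceId !== workspaceId || t < startMs || t >= endMs) continue;
    inputTokens += row.inputTokens;
    outputTokens += row.outputTokens;
    costMicros += Math.round(row.actualCostUsd * 1e6);
  }
  return {
    workspaceId: workspaceId,
    monthStart: monthStart,
    totalInputTokens: inputTokens,
    totalOutputTokens: outputTokens,
    totalCostUsd: costMicros / 1e6,
  };
}
```

Comparing the returned totals with the stored snapshot before overwriting it would also give a cheap drift metric.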

Budget Gate

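checkBudget(workspaceId, estimatedCostUsd) runs before every LLM call that counts against the workspace cap (calls paid by a USER_KEY bypass it by default; see Key Source Attribution). It reads the current month's snapshot, adds the estimate, and rejects the call if the total would exceed the workspace's hard cap.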

| Check | Behaviour |
| --- | --- |
| Hard cap | Fail-closed - call is rejected with BUDGET_EXCEEDED. |
| Soft limit (configurable %, default 80%) | Logs but doesn't block - emits a warning event. |
| Cost recording itself | Fail-open - if writing the ledger row errors, the call still proceeds. Availability over consistency for recording; consistency for enforcement. |

The asymmetry is deliberate: a transient DB hiccup shouldn't block a customer's chat session, but a clear over-cap state should.
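The fail-closed gate and fail-open recording can be sketched side by side. This is an illustration under assumptions, not the real checkBudget(): the function names and types are hypothetical, the snapshot total is passed in directly as `spentUsd`, and the soft limit is assumed to be a percentage of the hard cap.

```typescript
interface BudgetConfig {
  hardCapUsd: number | null; // null = unset (no cap)
  softLimitPct: number;      // assumed to be a percentage of the hard cap
}

type GateResult =
  | { allowed: true; warning: boolean }
  | { allowed: false; code: "BUDGET_EXCEEDED" };

// Fail-closed: a projected total above the hard cap rejects the call.
function checkBudgetSketch(spentUsd: number, estimatedCostUsd: number, cfg: BudgetConfig): GateResult {
  if (cfg.hardCapUsd === null) return { allowed: true, warning: false };
  const projected = spentUsd + estimatedCostUsd;
  if (projected > cfg.hardCapUsd) return { allowed: false, code: "BUDGET_EXCEEDED" };
  const softThresholdUsd = (cfg.hardCapUsd * cfg.softLimitPct) / 100;
  // Soft limit only flags a warning; the call still goes ahead.
  return { allowed: true, warning: projected >= softThresholdUsd };
}

// Fail-open: a recording error is reported but never blocks the caller.
function recordFailOpen(write: () => void, onError: (err: unknown) => void): boolean {
  try {
    write();
    return true;
  } catch (err) {
    onError(err);
    return false;
  }
}
```

With a cap of $100 and the default 80% soft limit, a projected total of $75 passes silently, $81 passes with a warning, and $101 is rejected.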

Key Source Attribution

The keySource column is the heart of attribution. The BYOK resolver returns it; the ledger writes it; the breakdown reads it.

| Tier | KeySource | What it means for billing |
| --- | --- | --- |
| User (Personal BYOK) | USER_KEY | The user's own provider key paid. Bypasses budget gate by default - personal spend isn't on the org's cap. |
| Workspace | WORKSPACE_KEY | The workspace's override key paid. Counts against the workspace cap. |
| Org | ORG_KEY | The org's primary key paid. Counts against the workspace cap. |
| Server | SERVER_KEY | A server-level env-var key paid. Self-hosted / dev. |

The "USER_KEY bypasses the gate" rule means: a user with their own personal key can keep running calls even if the workspace is over its cap. The org's cap is for spend the org pays for. If a user is paying their own way, the org's cap doesn't apply.

This rule is also why the allowPersonalKeys toggle matters. An org that needs hard guarantees on the cap will turn personal keys off so every call funnels through tiers that do hit the cap.
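The interaction between the resolver, allowPersonalKeys, and the gate can be sketched as follows. Everything here is illustrative: `resolveKeySource` and `gateApplies` are hypothetical names, the resolution order (user, workspace, org, server) is assumed from the order of the tiers above, and treating SERVER_KEY as gated is an assumption since the source does not say.

```typescript
type KeySource = "USER_KEY" | "WORKSPACE_KEY" | "ORG_KEY" | "SERVER_KEY";

interface OrgPolicy {
  allowPersonalKeys: boolean;
}

// Assumed resolution order: most specific tier first.
const TIER_ORDER: KeySource[] = ["USER_KEY", "WORKSPACE_KEY", "ORG_KEY", "SERVER_KEY"];

// Pick the first configured tier, skipping personal keys when the org disallows them.
function resolveKeySource(configured: KeySource[], policy: OrgPolicy): KeySource | null {
  for (const tier of TIER_ORDER) {
    if (tier === "USER_KEY" && !policy.allowPersonalKeys) continue;
    if (configured.indexOf(tier) !== -1) return tier;
  }
  return null;
}

// Only personal-key spend skips the budget gate (SERVER_KEY being gated is an assumption).
function gateApplies(keySource: KeySource): boolean {
  return keySource !== "USER_KEY";
}
```

With personal keys allowed, a user who has their own key resolves to USER_KEY and skips the gate; with the toggle off, the same user falls through to the workspace or org key and is subject to the cap.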

Spend Breakdown

The spend breakdown endpoint pivots the ledger two ways for a given period - by operation (chat, agent, extraction, embedding, other) and by key source (USER_KEY, WORKSPACE_KEY, ORG_KEY, SERVER_KEY):

{
  "byOperation": {
    "chat":       { "tokens": 42000, "costUsd": 0.45 },
    "agent":      { "tokens": 12000, "costUsd": 0.18 },
    "extraction": { "tokens":  3200, "costUsd": 0.04 },
    "embedding":  { "tokens":   800, "costUsd": 0.01 },
    "other":      { "tokens":     0, "costUsd": 0.00 }
  },
  "byKeySource": {
    "USER_KEY":      { "tokens":  4000, "costUsd": 0.04 },
    "WORKSPACE_KEY": { "tokens": 36000, "costUsd": 0.42 },
    "ORG_KEY":       { "tokens": 18000, "costUsd": 0.22 },
    "SERVER_KEY":    { "tokens":     0, "costUsd": 0.00 }
  }
}

Surfaced via:

  • GET /api/settings/ai-usage?view=summary
  • Settings > AI > Cost Control page - rendered as two breakdown rows ("By feature" + "Attributed to").
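A pivot of this shape can be computed in one pass over the ledger rows. The sketch below is an illustration only: `spendBreakdown` is a hypothetical name, and it assumes the "tokens" figure is input plus output tokens, which the source does not state.

```typescript
const OPERATIONS = ["chat", "agent", "extraction", "embedding", "other"] as const;
const KEY_SOURCES = ["USER_KEY", "WORKSPACE_KEY", "ORG_KEY", "SERVER_KEY"] as const;
type Operation = (typeof OPERATIONS)[number];
type KeySource = (typeof KEY_SOURCES)[number];

interface LedgerRow {
  operationType: Operation;
  keySource: KeySource;
  inputTokens: number;
  outputTokens: number;
  actualCostUsd: number;
}

interface Bucket {
  tokens: number;
  costUsd: number;
}

interface Breakdown {
  byOperation: Record<Operation, Bucket>;
  byKeySource: Record<KeySource, Bucket>;
}

// Start every category at zero so unused ones still appear in the response.
function emptyBuckets<K extends string>(keys: readonly K[]): Record<K, Bucket> {
  const out = {} as Record<K, Bucket>;
  for (const k of keys) out[k] = { tokens: 0, costUsd: 0 };
  return out;
}

function addToBucket(bucket: Bucket, row: LedgerRow): void {
  bucket.tokens += row.inputTokens + row.outputTokens; // assumption: tokens = input + output
  // Re-round to micro-dollars on each step to keep float noise out of the totals.
  bucket.costUsd = Math.round((bucket.costUsd + row.actualCostUsd) * 1e6) / 1e6;
}

function spendBreakdown(rows: LedgerRow[]): Breakdown {
  const result: Breakdown = {
    byOperation: emptyBuckets(OPERATIONS),
    byKeySource: emptyBuckets(KEY_SOURCES),
  };
  for (const row of rows) {
    addToBucket(result.byOperation[row.operationType], row);
    addToBucket(result.byKeySource[row.keySource], row);
  }
  return result;
}
```

Because every row lands in exactly one bucket of each pivot, the two pivots always sum to the same totals, as they do in the sample response above (58,000 tokens and $0.68 on both sides).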

Daily Usage

A daily-usage endpoint returns a per-day token + cost series for charting, and a top-users-by-spend report lists the heaviest spenders for admin UX.
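Both reports are simple group-by aggregations over ledger rows. The following sketch shows one way they could be computed; `dailyUsage` and `topUsersBySpend` are hypothetical names, day buckets are assumed to be UTC calendar days, and "tokens" is assumed to mean input plus output.

```typescript
interface LedgerRow {
  userId: string | null;
  inputTokens: number;
  outputTokens: number;
  actualCostUsd: number;
  createdAt: Date;
}

interface DayPoint {
  day: string; // YYYY-MM-DD (UTC, by assumption)
  tokens: number;
  costUsd: number;
}

// One point per calendar day, sorted chronologically for charting.
function dailyUsage(rows: LedgerRow[]): DayPoint[] {
  const byDay: Record<string, DayPoint> = {};
  for (const row of rows) {
    const day = row.createdAt.toISOString().slice(0, 10);
    const point = byDay[day] || (byDay[day] = { day: day, tokens: 0, costUsd: 0 });
    point.tokens += row.inputTokens + row.outputTokens;
    point.costUsd = Math.round((point.costUsd + row.actualCostUsd) * 1e6) / 1e6;
  }
  return Object.keys(byDay).sort().map((day) => byDay[day]);
}

// Heaviest spenders first; rows without a user (autonomous flows) are skipped.
function topUsersBySpend(rows: LedgerRow[], limit: number): { userId: string; costUsd: number }[] {
  const totals: Record<string, number> = {};
  for (const row of rows) {
    if (row.userId === null) continue;
    totals[row.userId] = (totals[row.userId] || 0) + row.actualCostUsd;
  }
  return Object.keys(totals)
    .map((userId) => ({ userId: userId, costUsd: Math.round(totals[userId] * 1e6) / 1e6 }))
    .sort((a, b) => b.costUsd - a.costUsd)
    .slice(0, limit);
}
```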

Per-Model Rates

A per-model rate table maps each supported model to its per-token pricing. For example, claude-sonnet-4-5-20250929 is billed at $3.00 per million input tokens and $15.00 per million output tokens. Cost-per-call is computed from these rates plus the input/output token counts the provider returns.

A regression test guards against silent under-billing: the resolver's default models must always be present in the rate table. Without this guard, a workspace with no model override would fall back to a default whose cost was $0 - a silent free ride.
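The cost formula and the guard can be sketched as below. `RATES_SKETCH` stands in for the real rate table and contains only the one model and price quoted above; `callCostUsd` and `missingRates` are hypothetical names, and throwing on an unknown model (rather than returning $0) is a design assumption in the spirit of the regression test.

```typescript
interface ModelRate {
  inputUsdPerMillion: number;
  outputUsdPerMillion: number;
}

// Illustrative stand-in for the per-model rate table.
const RATES_SKETCH: Record<string, ModelRate> = {
  "claude-sonnet-4-5-20250929": { inputUsdPerMillion: 3.0, outputUsdPerMillion: 15.0 },
};

function hasRate(rates: Record<string, ModelRate>, model: string): boolean {
  return Object.prototype.hasOwnProperty.call(rates, model);
}

// Cost of one call from the provider's token counts.
// An unknown model throws instead of silently costing $0.
function callCostUsd(
  model: string,
  inputTokens: number,
  outputTokens: number,
  rates: Record<string, ModelRate> = RATES_SKETCH
): number {
  if (!hasRate(rates, model)) throw new Error("no rate configured for model " + model);
  const rate = rates[model];
  return (inputTokens * rate.inputUsdPerMillion + outputTokens * rate.outputUsdPerMillion) / 1e6;
}

// The guard: every default model must have a rate; a non-empty result should fail the test.
function missingRates(defaultModels: string[], rates: Record<string, ModelRate> = RATES_SKETCH): string[] {
  return defaultModels.filter((model) => !hasRate(rates, model));
}
```

For example, a call with 2,000 input tokens and 500 output tokens on claude-sonnet-4-5-20250929 costs (2,000 x 3.00 + 500 x 15.00) / 1,000,000 = $0.0135.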

Configuration

Hard cap and soft limit are configured per workspace at Settings > AI > Cost Control:

| Setting | Purpose |
| --- | --- |
| Hard cap (USD/month) | Fail-closed budget gate. Default: unset (no cap). |
| Soft limit % | Warning threshold (default 80%). Logs but doesn't block. |
| Reconcile drift | Manual button to run reconcileBudgetSnapshot() for the current month. |

The informational monthlyTokenCap on the BYOK config tiers is not the same thing - it's purely cosmetic and not enforced. The hard cap above is what actually gates calls.

Anthropic Prompt Caching

Opbox sends the AI tool catalog and the stable portion of the system prompt with cache_control: { type: 'ephemeral' }, which tells Anthropic to cache that prefix for 5 minutes. Subsequent requests within the window pay 0.10x the normal input-token rate for the cached portion.

In practice, ~95% of input tokens are cache reads on a workspace with active chat use. The savings show up automatically in actualCostUsd because the per-call price already accounts for cache reads vs. cache creation.

The ledger captures the breakdown in metadata:

| Metadata field | Meaning |
| --- | --- |
| cacheCreationInputTokens | Tokens billed at the cache-creation rate (1.25x normal input). Paid once per cache write. |
| cacheReadInputTokens | Tokens billed at the cache-read rate (0.10x normal input). Paid on every cache hit. |
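To see how the two multipliers feed into the input side of actualCostUsd, here is a small worked sketch. `inputCostUsd` and `InputUsage` are hypothetical names, and it assumes `inputTokens` counts only the uncached portion of the prompt, separate from the two cache counters.

```typescript
const CACHE_WRITE_MULTIPLIER = 1.25; // cache-creation rate relative to normal input
const CACHE_READ_MULTIPLIER = 0.1;   // cache-read rate relative to normal input

interface InputUsage {
  inputTokens: number;              // uncached input tokens (assumed to exclude cached ones)
  cacheCreationInputTokens: number; // tokens written to the cache on this call
  cacheReadInputTokens: number;     // tokens served from the cache on this call
}

// Input-side cost of one call, weighting each token class by its multiplier.
function inputCostUsd(usage: InputUsage, inputUsdPerMillion: number): number {
  const weightedTokens =
    usage.inputTokens +
    usage.cacheCreationInputTokens * CACHE_WRITE_MULTIPLIER +
    usage.cacheReadInputTokens * CACHE_READ_MULTIPLIER;
  return (weightedTokens * inputUsdPerMillion) / 1e6;
}
```

Take a 100,000-token prompt at $3.00 per million input tokens, of which 95,000 tokens are the cacheable prefix. Uncached, it costs $0.30. The first call that writes the cache costs about $0.371 (5,000 x 1.0 + 95,000 x 1.25 = 123,750 weighted tokens). Each later call that hits the cache costs about $0.0435 (5,000 + 9,500 = 14,500 weighted tokens), so the write premium is recovered on the first hit.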

Cache hit rate (target: >60%) can be computed over any window:

SELECT
  SUM((metadata->>'cacheReadInputTokens')::bigint) AS cache_reads,
  SUM((metadata->>'cacheCreationInputTokens')::bigint) AS cache_creations,
  SUM((metadata->>'cacheReadInputTokens')::bigint)::float
    / NULLIF(SUM((metadata->>'cacheReadInputTokens')::bigint)
             + SUM((metadata->>'cacheCreationInputTokens')::bigint), 0) AS hit_rate
FROM ai_cost_ledger
WHERE created_at >= NOW() - INTERVAL '7 days'
  AND metadata->>'cacheReadInputTokens' IS NOT NULL;

The volatile portion of the system prompt (per-request page context, current matter, current document) is intentionally NOT cached so it doesn't pollute the prefix.
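The split between the stable and volatile parts of the system prompt might look like the sketch below. It only builds the `system` array as plain data and calls no SDK; `buildSystemBlocks` and `SystemTextBlock` are hypothetical names, and the block shape follows Anthropic's text-block format with an optional `cache_control` marker. As Anthropic describes the cached prefix order (tools, then system, then messages), a marker on the stable block would also cover the tool catalog sent ahead of it, though whether Opbox uses a single marker is an assumption.

```typescript
interface SystemTextBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

// Stable text goes first and carries the cache marker; per-request context
// follows it unmarked, so changes there never invalidate the cached prefix.
function buildSystemBlocks(stablePrompt: string, volatileContext: string): SystemTextBlock[] {
  const blocks: SystemTextBlock[] = [
    { type: "text", text: stablePrompt, cache_control: { type: "ephemeral" } },
  ];
  if (volatileContext.length > 0) {
    blocks.push({ type: "text", text: volatileContext });
  }
  return blocks;
}
```

Two requests with the same stable prompt but different page context produce an identical first block, which is what lets the second request read the prefix from the cache.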
