
AI Security & Limits

Opbox treats AI as an authenticated user that must obey the same rules as a human, plus stricter ones for autonomous operation. Seven independent layers enforce safety.

Layer 1: Injection Scanning

Tool results are attacker-controllable territory. A row in a database, a comment field, an OCR'd document - any of these can carry a prompt injection that reshapes the model's behaviour on the next turn.

scanForInjection() runs over every tool result before it re-enters model context. Detected injections are sanitised in place. The original is kept for audit; the sanitised version is what the model sees.

| Trigger | What happens |
| --- | --- |
| Pattern detected (e.g. [INST], ChatML role markers, "ignore previous", role-shift tokens) | Sanitise + log to SecurityEvent (AI_INJECTION_ATTEMPT) |
| No pattern | Pass through unchanged |

Coverage: chat route, agent worker, task triggers, MCP bridge. All call sites use the same scanner.
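The sanitise-and-keep-original behaviour described above can be sketched roughly as follows. This is an illustration only, not the actual scanForInjection() implementation: the function name scanForInjectionSketch, the three patterns, and the "[removed]" placeholder are all assumptions.

```typescript
// Hypothetical pattern list; the real scanner's rules are not shown on this page.
const INJECTION_PATTERNS: RegExp[] = [
  /\[\/?INST\]/gi,                          // instruction markers such as [INST]
  /<\|im_(start|end)\|>/gi,                 // ChatML role markers
  /ignore (all )?previous instructions/gi,  // "ignore previous" phrasing
];

interface ScanResult {
  detected: boolean;
  original: string;  // kept for audit
  sanitised: string; // what the model would see
}

function scanForInjectionSketch(toolResult: string): ScanResult {
  let sanitised = toolResult;
  let detected = false;
  for (const pattern of INJECTION_PATTERNS) {
    // Replace every match with a neutral placeholder (assumed behaviour).
    const replaced = sanitised.replace(pattern, "[removed]");
    if (replaced !== sanitised) detected = true;
    sanitised = replaced;
  }
  return { detected, original: toolResult, sanitised };
}
```

A caller would pass result.sanitised back into model context and, when result.detected is true, log result.original alongside the security event.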

Admins can review injection attempts at Settings > Security > Events (filter by AI_INJECTION_ATTEMPT). The endpoint is /api/admin/security-events?type=AI_INJECTION_ATTEMPT (OWNER role required).

Layer 2: Rate Limits

Two budgets per user:

| Budget | Scope | When it resets |
| --- | --- | --- |
| Session write limit | One chat thread | New thread |
| Daily write limit | All threads, one user | Calendar day |

Both apply to mutating tool calls. Read-only tools (search, list_*, get_*) are unmetered. Plan Mode skips the daily cap because it can't write anyway.

When a budget is exhausted, the SSE stream emits a rate_limit event and a SecurityEvent row is logged. The chat continues, but write tools are unavailable until the budget resets.
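A minimal sketch of how the two budgets might be checked before a tool call. The type names, the checkWriteBudget function, and the limit values used in the example are hypothetical; this page does not state the real limits.

```typescript
interface WriteBudget {
  sessionLimit: number; // hypothetical value; real limits are not stated here
  dailyLimit: number;
}

interface WriteUsage {
  sessionWrites: number; // mutating calls in the current chat thread
  dailyWrites: number;   // mutating calls across all threads today
}

interface BudgetDecision {
  allowed: boolean;
  exceeded?: "session" | "daily";
}

function checkWriteBudget(
  usage: WriteUsage,
  budget: WriteBudget,
  isMutating: boolean,
): BudgetDecision {
  // Read-only tools (search, list_*, get_*) are unmetered.
  if (!isMutating) return { allowed: true };
  if (usage.sessionWrites >= budget.sessionLimit) {
    return { allowed: false, exceeded: "session" };
  }
  if (usage.dailyWrites >= budget.dailyLimit) {
    return { allowed: false, exceeded: "daily" };
  }
  return { allowed: true };
}
```

With an assumed budget of 20 writes per session and 100 per day, a user at 5 session writes and 100 daily writes would be refused on the daily budget, while a read-only call would still go through.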

Layer 3: Plan Mode

A read-only mode for the assistant. When the user toggles the Plan pill below the chat input box (or the API caller sends planMode: true):

  • Write tools are removed from the registry handed to the model.
  • The model literally cannot call them - not even after a jailbreak, because the function is absent from its toolset.
  • The daily write budget is skipped.

Use case: "explore my data, plan a fix, but don't change anything yet."
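The key property of Plan Mode is that write tools are filtered out before the model ever sees the registry. A minimal sketch, assuming a hypothetical ToolDefinition shape with a mutates flag (the real registry format is not shown here):

```typescript
interface ToolDefinition {
  name: string;
  mutates: boolean; // hypothetical flag marking write tools
}

// Returns the toolset to hand to the model for this request.
function toolsForRequest(
  registry: ToolDefinition[],
  planMode: boolean,
): ToolDefinition[] {
  // In Plan Mode, write tools are absent rather than merely refused at call time.
  return planMode ? registry.filter((tool) => !tool.mutates) : registry;
}
```

Because the filtering happens before the request is built, a jailbroken model has no write function to name in a tool call.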

Layer 4: Confirm-Then-Act

For every write operation outside Plan Mode, the AI calls ask_confirmation first. The chat UI renders an Approve / Deny dialog. The agentic loop pauses until the user responds.

This replaces an older apply: prefix convention. The new pattern is:

  1. AI plans the write.
  2. AI calls ask_confirmation({ message, actionDescription }).
  3. UI shows the dialog. User clicks Approve or Deny.
  4. AI receives the response. On Approve, it executes the write. On Deny, it acknowledges and stops.
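The four steps above can be sketched as a small function that runs the deferred write only after an explicit approval. The PendingWrite shape, the resolveConfirmation name, and the response values are illustrative assumptions; only the message and actionDescription fields come from the ask_confirmation call shown above.

```typescript
interface PendingWrite {
  message: string;           // shown in the Approve / Deny dialog
  actionDescription: string; // what the write would do
  run: () => string;         // the deferred write (hypothetical signature)
}

type UserResponse = "approve" | "deny";

interface Outcome {
  executed: boolean;
  detail: string;
}

// Called once the user has answered the dialog; the write never runs before this.
function resolveConfirmation(pending: PendingWrite, response: UserResponse): Outcome {
  if (response === "approve") {
    return { executed: true, detail: pending.run() };
  }
  // On Deny the write is dropped and the assistant only acknowledges.
  return { executed: false, detail: `Denied: ${pending.actionDescription}` };
}
```

The design point is that the write is held as a deferred action: nothing mutates until the user's response arrives.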

ask_question is the read-only sibling: the AI requests a clarifying answer, the UI renders an inline input with optional quick-pick options.

Archive operations are not exposed to the AI at all - archive_matter, archive_table, etc. are human-only actions in the UI.

Layer 5: Autonomy Levels (MCP)

When the AI runs through the MCP bridge - including the Agent Bridge spawning Claude Code - the API key carries an autonomyLevel (0-3) set at creation. The MCP router enforces it before dispatching any tool call.

| Level | Meaning | Examples |
| --- | --- | --- |
| L0 | Read-only | search, list_members, get_notifications, agent_queue_list |
| L1 | Read + internal writes | create_invite, mark_notifications_read, agent_queue_claim, doc-gen reads |
| L2 | Read + mutations with external side effects | call_api_endpoint (GET only), execute_sql (read-only), run_saved_query |
| L3 | Full | All registered tools, subject to per-tool gate flags |

Tool registration declares the required level; the MCP router denies below-threshold calls and logs to the audit log.

Default for new keys is L0. Admins must explicitly raise the level when creating the key, and only OWNER / ADMIN can mint keys (POST /api/agent/api-keys).

Full breakdown in Autonomy Levels.
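As an illustration of the threshold check, the sketch below compares a key's autonomyLevel with the level a tool declares at registration. The type names, the authoriseToolCall function, and the audit note format are assumptions, not the MCP router's actual code.

```typescript
type AutonomyLevel = 0 | 1 | 2 | 3;

interface RegisteredTool {
  name: string;
  requiredLevel: AutonomyLevel; // declared at tool registration
}

interface ApiKey {
  id: string;
  autonomyLevel: AutonomyLevel; // set when the key is created; L0 by default
}

interface DispatchDecision {
  allowed: boolean;
  auditNote?: string; // hypothetical audit log text for denied calls
}

function authoriseToolCall(key: ApiKey, tool: RegisteredTool): DispatchDecision {
  if (key.autonomyLevel >= tool.requiredLevel) return { allowed: true };
  return {
    allowed: false,
    auditNote: `key ${key.id} (L${key.autonomyLevel}) denied ${tool.name} (requires L${tool.requiredLevel})`,
  };
}
```

For example, a default L0 key calling create_invite (an L1 tool in the table above) would be denied and audited, while the same call with an L1 key would be dispatched.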

Layer 6: Cloudflare Access (OpenClaw provider)

When the OpenClaw provider is configured for an organisation, opbox can attach a Cloudflare Access service-token pair to every gateway call. The OpenClaw runtime stays loopback-only; opbox is the only client allowed to reach it, gated at Cloudflare's edge before traffic ever touches the host.

| Header | Effect |
| --- | --- |
| CF-Access-Client-Id | Identifies opbox as an authorised client. |
| CF-Access-Client-Secret | Authenticates that client. |

Both fields must be set together: this is validated at the API write boundary, and setting only one returns 400 CF_ACCESS_HALF_CONFIGURED. Both are encrypted at rest via the same AES-256-GCM pipeline as the bearer token.

For local dev where the OpenClaw gateway is reachable directly (typically http://localhost:18789), both CF Access fields are left blank.
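The both-or-neither rule can be sketched as below. For brevity this sketch merges validation and header construction into one function, whereas the text above places validation at the API write boundary; the CfAccessConfig shape and the function name are assumptions.

```typescript
interface CfAccessConfig {
  clientId?: string;
  clientSecret?: string;
}

type CfAccessResult =
  | { ok: true; headers: Record<string, string> }
  | { ok: false; status: 400; code: "CF_ACCESS_HALF_CONFIGURED" };

function buildCfAccessHeaders(config: CfAccessConfig): CfAccessResult {
  const { clientId, clientSecret } = config;
  // Exactly one field set is a configuration error.
  if (Boolean(clientId) !== Boolean(clientSecret)) {
    return { ok: false, status: 400, code: "CF_ACCESS_HALF_CONFIGURED" };
  }
  // Both blank: local dev, where the gateway is reached directly.
  if (!clientId || !clientSecret) return { ok: true, headers: {} };
  return {
    ok: true,
    headers: {
      "CF-Access-Client-Id": clientId,
      "CF-Access-Client-Secret": clientSecret,
    },
  };
}
```

An empty config yields no extra headers, which matches the local dev case described above.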

Layer 7: Server-Side Hardening

| Control | Description |
| --- | --- |
| Cell locking | Locked cells (__lock__-prefixed columns) are stripped from any update payload server-side. AI patch_row / bulk_update_rows and direct API PATCH all enforce this. Direct Prisma writes and CSV imports bypass locks. |
| API endpoint access | The call_api_endpoint AI tool is GET-only with 14 blocked path prefixes (auth, admin, billing, etc.) and a route-catalogue allowlist check. |
| SQL queries | execute_sql is read-only, sandboxed, workspace-scoped, with query timeouts and row limits. Requires ADMIN role. |
| Workflow code steps | Three-layer sandbox: regex blocklist on the source, null-prototype globals, and global shadowing of dangerous APIs. |
| Markdown rendering | sanitizeUrl() enforces a protocol allowlist. Document embeds run through DOMPurify. |
| Content truncation | Document content and tool results are truncated to safe lengths before being placed in model context. |
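To make the API endpoint access control concrete, here is a rough sketch of a GET-only gate with a prefix blocklist and an allowlist check. The path shapes are assumptions: the source names auth, admin, and billing among 14 blocked prefixes but does not give the exact strings, and the route catalogue is modelled here as a plain array.

```typescript
// Hypothetical prefixes; only three of the 14 are named in the source.
const BLOCKED_PREFIXES: string[] = ["/api/auth", "/api/admin", "/api/billing"];

function isEndpointCallAllowed(
  method: string,
  path: string,
  routeCatalogue: string[],
): boolean {
  // 1. GET only: POST/PUT/DELETE are never forwarded.
  if (method.toUpperCase() !== "GET") return false;
  // 2. Reject anything under a blocked prefix.
  if (BLOCKED_PREFIXES.some((prefix) => path.indexOf(prefix) === 0)) return false;
  // 3. The path must also appear in the route catalogue (allowlist).
  return routeCatalogue.indexOf(path) !== -1;
}
```

The checks are ordered from cheapest to most specific, and a request must pass all three to be dispatched.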

Cost Gate

checkBudget() runs before every LLM call. If the workspace is over its hard cap, the call is rejected with a BUDGET_EXCEEDED error in the SSE stream.

| Setting | Purpose |
| --- | --- |
| monthlyTokenCap (per-tier) | Informational soft ceiling shown in Settings > AI. |
| Hard cap (per-workspace) | Enforced by checkBudget(). Configured in Cost Control. |
| Soft limit % | Optional warning threshold (default 80%). Logs but doesn't block. |
| Fail-open on errors | Cost recording errors never block calls - availability over consistency. |

See Cost Control for the full ledger model.
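A minimal sketch of how the hard cap, soft limit, and fail-open rule might interact. The names, the unit-less used and hardCap fields, and the choice to treat reaching the cap as being over it are assumptions; this is not the actual checkBudget() implementation.

```typescript
interface BudgetState {
  used: number;             // usage so far, in whatever unit the ledger tracks (assumed)
  hardCap: number;          // per-workspace hard cap
  softLimitPercent: number; // warning threshold, e.g. 80
}

interface BudgetResult {
  allowed: boolean;
  warning: boolean;          // soft limit crossed: log, but don't block
  error?: "BUDGET_EXCEEDED";
}

function checkBudgetSketch(state: BudgetState): BudgetResult {
  if (state.used >= state.hardCap) {
    return { allowed: false, warning: false, error: "BUDGET_EXCEEDED" };
  }
  // Integer comparison avoids floating-point rounding at the threshold.
  const warning = state.used * 100 >= state.hardCap * state.softLimitPercent;
  return { allowed: true, warning };
}

// Fail-open: a failure while recording cost is swallowed so it never blocks the call.
function recordCostFailOpen(record: () => void): boolean {
  try {
    record();
    return true;
  } catch (_err) {
    return false;
  }
}
```

With a hard cap of 1000 and the default 80% soft limit, usage of 850 is allowed with a warning, and usage of 1200 is rejected with BUDGET_EXCEEDED.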

What the AI Cannot Do

A short list, by design:

  • Archive resources (archive_matter, archive_table, ...) - human-only in the UI.
  • Send POST/PUT/DELETE through call_api_endpoint - GET only.
  • Mutate locked cells - stripped server-side.
  • Run arbitrary SQL without ADMIN role; even with it, only read-only.
  • Read another workspace's data unless the calling key passes the cross-tenant authority check.
  • Bypass autonomy thresholds - tool registration controls visibility per level.
  • Skip the cost ledger - every model call writes one row.
  • Skip injection scanning - every tool result re-enters model context through the scanner.

Security Events

SecurityEvent rows are created by automatic monitoring across the stack. They are surfaced at Settings > Security > Events.

| Event Type | Trigger |
| --- | --- |
| AI_INJECTION_ATTEMPT | scanForInjection() flagged a tool result |
| AI_RATE_LIMIT | Session or daily write budget exceeded |
| CSRF_VIOLATION | Mutation request without a valid CSRF token |
| LOGIN_FAILURE | Auth attempt rejected (subject to lockout policies) |
| RATE_LIMIT | Generic per-endpoint rate limit hit |

Results can be filtered and paginated via GET /api/admin/security-events?type=...&cursor=...&limit=... (OWNER only).
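As a small illustration, a request URL for this endpoint could be assembled as below. The three parameter names come from the endpoint shown above; the helper name and the SecurityEventQuery type are hypothetical, and the response shape is not documented here, so the sketch stops at building the URL.

```typescript
interface SecurityEventQuery {
  type?: string;   // e.g. "AI_INJECTION_ATTEMPT"
  cursor?: string; // opaque pagination cursor from a previous page
  limit?: number;  // page size
}

function buildSecurityEventsUrl(query: SecurityEventQuery): string {
  const params: string[] = [];
  if (query.type !== undefined) params.push("type=" + encodeURIComponent(query.type));
  if (query.cursor !== undefined) params.push("cursor=" + encodeURIComponent(query.cursor));
  if (query.limit !== undefined) params.push("limit=" + String(query.limit));
  const base = "/api/admin/security-events";
  return params.length > 0 ? base + "?" + params.join("&") : base;
}
```

Omitted parameters are simply left out of the query string, so the same helper covers both the first page and follow-up pages that carry a cursor.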
