AI Security & Limits

Opbox treats AI as an authenticated user that must obey the same rules as a human, plus stricter ones for autonomous operation. Seven independent layers enforce safety, backed by a cost gate.

Layer 1: Injection Scanning

Tool results are attacker-controllable territory. A row in a database, a comment field, an OCR'd document - any of these can carry a prompt injection that reshapes the model's behaviour on the next turn.

scanForInjection() runs over every tool result before it re-enters model context. Detected injections are sanitised in place. The original is kept for audit; the sanitised version is what the model sees.

Trigger	What happens
Pattern detected (e.g. `[INST]`, ChatML role markers, "ignore previous", role-shift tokens)	Sanitise + log to `SecurityEvent` (`AI_INJECTION_ATTEMPT`)
No pattern	Pass through unchanged

Coverage: chat route, agent worker, task triggers, MCP bridge. All call sites use the same scanner.
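The sanitise-and-keep-the-original behaviour can be sketched roughly as follows. This is a minimal illustration, not the actual scanForInjection() implementation: the pattern list, the "[removed]" placeholder, and the names scanToolResult and ScanOutcome are assumptions made for the example.

```typescript
// Hypothetical pattern list; the real scanner's rules are not documented here.
const INJECTION_PATTERNS: RegExp[] = [
  /\[\/?INST\]/gi, // instruction markers such as [INST] and [/INST]
  /<\|im_(?:start|end)\|>/gi, // ChatML role markers
  /ignore (?:all )?previous instructions/gi, // common override phrasing
];

interface ScanOutcome {
  original: string; // kept for audit
  sanitised: string; // the version the model would see
  flagged: boolean; // true when a SecurityEvent should be logged
}

// Replace every match with a neutral placeholder and report whether anything matched.
function scanToolResult(text: string): ScanOutcome {
  let sanitised = text;
  let flagged = false;
  for (const pattern of INJECTION_PATTERNS) {
    const replaced = sanitised.replace(pattern, "[removed]");
    if (replaced !== sanitised) {
      flagged = true;
      sanitised = replaced;
    }
  }
  return { original: text, sanitised, flagged };
}
```

A caller would pass `sanitised` to the model and, when `flagged` is true, store `original` alongside an AI_INJECTION_ATTEMPT event.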

Admins can review injection attempts at Settings > Security > Events (filter by AI_INJECTION_ATTEMPT). The endpoint is /api/admin/security-events?type=AI_INJECTION_ATTEMPT (OWNER role required).

Layer 2: Rate Limits

Two budgets per user:

Budget	Scope	When it resets
Session write limit	One chat thread	New thread
Daily write limit	All threads, one user	Calendar day

Both apply to mutating tool calls. Read-only tools (search, list_*, get_*) are unmetered. Plan Mode skips the daily cap because it can't write anyway.

When a budget is exhausted, a rate_limit event is emitted in the SSE stream and a SecurityEvent row (AI_RATE_LIMIT) is written. The chat continues, but write tools are unavailable until the budget resets.
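One way to picture the two budgets is a pair of counters, one keyed by thread and one keyed by user and calendar day. The sketch below is illustrative only: the limit values, the in-memory maps, the UTC day boundary, and the name createWriteBudget are all assumptions rather than Opbox internals.

```typescript
interface WriteLimits {
  perSession: number; // hypothetical cap per chat thread
  perDay: number; // hypothetical cap per user per calendar day
}

// Returns a function that records a write if both budgets still have room.
function createWriteBudget(limits: WriteLimits) {
  const sessionCounts = new Map<string, number>(); // key: threadId
  const dailyCounts = new Map<string, number>(); // key: userId|YYYY-MM-DD

  return function tryConsume(userId: string, threadId: string, isWrite: boolean, now: Date): boolean {
    if (!isWrite) return true; // read-only tools are unmetered
    const dayKey = `${userId}|${now.toISOString().slice(0, 10)}`; // UTC day is an assumption
    const usedInSession = sessionCounts.get(threadId) ?? 0;
    const usedToday = dailyCounts.get(dayKey) ?? 0;
    if (usedInSession >= limits.perSession || usedToday >= limits.perDay) return false;
    sessionCounts.set(threadId, usedInSession + 1);
    dailyCounts.set(dayKey, usedToday + 1);
    return true;
  };
}
```

Starting a new thread gives a fresh session counter, while the daily counter only clears when the day key changes.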

Layer 3: Plan Mode

A read-only mode for the assistant. When the user toggles the Plan pill below the chat input box (or the API caller sends planMode: true):

Write tools are removed from the registry handed to the model.
The model literally cannot call them - not even after a jailbreak, because the function is absent from its toolset.
The daily write budget is skipped.

Use case: "explore my data, plan a fix, but don't change anything yet."
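The key property of Plan Mode is that filtering happens before the model ever sees the toolset. A minimal sketch, assuming each registered tool carries a hypothetical `mutates` flag (the ToolEntry shape and toolsForRequest name are illustrative, not the real registry API):

```typescript
interface ToolEntry {
  toolName: string;
  mutates: boolean; // hypothetical flag marking write tools
}

// In Plan Mode only non-mutating tools are handed to the model.
function toolsForRequest(registry: ToolEntry[], planMode: boolean): ToolEntry[] {
  return planMode ? registry.filter((tool) => !tool.mutates) : registry;
}
```

Because the write tools are never included in the request, there is nothing for a jailbroken prompt to invoke.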

Layer 4: Confirm-Then-Act

For every write operation outside Plan Mode, the AI calls ask_confirmation first. The chat UI renders an Approve / Deny dialog. The agentic loop pauses until the user responds.

This replaces an older apply: prefix convention. The new pattern is:

1. The AI plans the write.
2. The AI calls ask_confirmation({ message, actionDescription }).
3. The UI shows the dialog. The user clicks Approve or Deny.
4. The AI receives the response. On Approve, it executes the write. On Deny, it acknowledges and stops.
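The steps above can be sketched as a small gate around the write. The function and type names here (confirmThenAct, ConfirmationRequest, UserDecision) are hypothetical, and the callbacks stand in for the real UI dialog and tool execution:

```typescript
interface ConfirmationRequest {
  message: string;
  actionDescription: string;
}

type UserDecision = "approve" | "deny";

type GateOutcome<T> = { executed: true; result: T } | { executed: false };

// Pause until the user decides; run the write only on an explicit approval.
async function confirmThenAct<T>(
  request: ConfirmationRequest,
  askConfirmation: (req: ConfirmationRequest) => Promise<UserDecision>,
  executeWrite: () => Promise<T>,
): Promise<GateOutcome<T>> {
  const decision = await askConfirmation(request);
  if (decision !== "approve") return { executed: false };
  return { executed: true, result: await executeWrite() };
}
```

Treating anything other than an explicit approval as a denial keeps the default safe if the dialog is dismissed or the response is malformed.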

ask_question is the read-only sibling: the AI requests a clarifying answer, the UI renders an inline input with optional quick-pick options.

Archive operations are not exposed to the AI at all - archive_matter, archive_table, etc. are human-only actions in the UI.

Layer 5: Autonomy Levels (MCP)

When the AI runs through the MCP bridge - including the Agent Bridge spawning Claude Code - the API key carries an autonomyLevel (0-3) set at creation. The MCP router enforces it before dispatching any tool call.

Level	Meaning	Examples
L0	Read-only	`search`, `list_members`, `get_notifications`, `agent_queue_list`
L1	Read + internal writes	`create_invite`, `mark_notifications_read`, `agent_queue_claim`, doc-gen reads
L2	Read + mutations with external side effects	`call_api_endpoint` (GET only), `execute_sql` (read-only), `run_saved_query`
L3	Full	All registered tools, subject to per-tool gate flags

Each tool declares its required level at registration; the MCP router denies calls from keys below that level and records the denial in the audit log.

Default for new keys is L0. Admins must explicitly raise the level when creating the key, and only OWNER / ADMIN can mint keys (POST /api/agent/api-keys).
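The router check amounts to comparing the key's level against the level the tool declared. A rough sketch, using a few tools from the table above; the lookup map, the authoriseToolCall name, and the deny-unknown-tools behaviour are assumptions for illustration:

```typescript
type AutonomyLevel = 0 | 1 | 2 | 3;

// Illustrative stand-in for tool registration; levels follow the table above.
const REQUIRED_LEVEL = new Map<string, AutonomyLevel>([
  ["search", 0],
  ["create_invite", 1],
  ["run_saved_query", 2],
]);

interface AuthDecision {
  allowed: boolean;
  reason?: string; // populated on denial, suitable for an audit entry
}

// Deny unknown tools and any call from a key below the tool's threshold.
function authoriseToolCall(toolName: string, keyLevel: AutonomyLevel): AuthDecision {
  const required = REQUIRED_LEVEL.get(toolName);
  if (required === undefined) return { allowed: false, reason: `unknown tool: ${toolName}` };
  if (keyLevel < required) {
    return { allowed: false, reason: `${toolName} requires L${required}, key is L${keyLevel}` };
  }
  return { allowed: true };
}
```

With the default L0 key, only `search` in this sample map would be dispatched.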

Full breakdown in Autonomy Levels.

Layer 6: Cloudflare Access (OpenClaw provider)

When the OpenClaw provider is configured for an organisation, opbox can attach a Cloudflare Access service-token pair to every gateway call. The OpenClaw runtime stays loopback-only; opbox is the only client allowed to reach it, gated at Cloudflare's edge before traffic ever touches the host.

Header	Effect
`CF-Access-Client-Id`	Identifies opbox as an authorised client.
`CF-Access-Client-Secret`	Authenticates that client.

Both fields must be set together; this is validated at the API write boundary, and setting only one returns 400 CF_ACCESS_HALF_CONFIGURED. Both are encrypted at rest via the same AES-256-GCM pipeline as the bearer token.

For local dev where the OpenClaw gateway is reachable directly (typically http://localhost:18789), both CF Access fields are left blank.
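The both-or-neither rule can be illustrated with a small helper that builds the outgoing headers. The CfAccessConfig shape and cfAccessHeaders name are hypothetical, and throwing an Error here merely stands in for the 400 response described above:

```typescript
interface CfAccessConfig {
  clientId?: string;
  clientSecret?: string;
}

// Returns the Cloudflare Access headers, or none when both fields are blank.
function cfAccessHeaders(config: CfAccessConfig): Record<string, string> {
  const { clientId, clientSecret } = config;
  if (!clientId && !clientSecret) return {}; // local dev: gateway reached directly
  if (!clientId || !clientSecret) throw new Error("CF_ACCESS_HALF_CONFIGURED");
  return {
    "CF-Access-Client-Id": clientId,
    "CF-Access-Client-Secret": clientSecret,
  };
}
```

Rejecting a half-configured pair early avoids sending requests that Cloudflare's edge would refuse anyway.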

Layer 7: Server-Side Hardening

Control	Description
Cell locking	Locked cells (`__lock__`-prefixed columns) are stripped from any update payload server-side. AI `patch_row` / `bulk_update_rows` and direct API PATCH all enforce this. Direct Prisma writes and CSV imports bypass locks.
API endpoint access	The `call_api_endpoint` AI tool is GET-only with 14 blocked path prefixes (auth, admin, billing, etc.) and a route-catalogue allowlist check.
SQL queries	`execute_sql` is read-only, sandboxed, workspace-scoped, with query timeouts and row limits. Requires ADMIN role.
Workflow code steps	Three-layer sandbox: regex blocklist on the source, null-prototype globals, and global shadowing of dangerous APIs.
Markdown rendering	`sanitizeUrl()` enforces a protocol allowlist. Document embeds run through DOMPurify.
Content truncation	Document content and tool results are truncated to safe lengths before being placed in model context.
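As an illustration of the cell-locking control, the sketch below drops locked keys from an update payload before it is applied. It assumes a locked cell is identified purely by the `__lock__` key prefix mentioned in the table; the stripLockedCells name and payload shape are hypothetical.

```typescript
const LOCK_PREFIX = "__lock__";

// Copy the payload, leaving out any key that carries the lock prefix.
function stripLockedCells(payload: Record<string, unknown>): Record<string, unknown> {
  const cleaned: Record<string, unknown> = {};
  for (const key of Object.keys(payload)) {
    if (!key.startsWith(LOCK_PREFIX)) cleaned[key] = payload[key];
  }
  return cleaned;
}
```

Running this on the server, rather than trusting the caller, is what makes the control apply equally to AI tools and direct API PATCH requests.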

Cost Gate

checkBudget() runs before every LLM call. If the workspace is over its hard cap, the call is rejected with a BUDGET_EXCEEDED error in the SSE stream.

Setting	Purpose
`monthlyTokenCap` (per-tier)	Informational soft ceiling shown in Settings > AI.
Hard cap (per-workspace)	Enforced by `checkBudget()`. Configured in Cost Control.
Soft limit %	Optional warning threshold (default 80%). Logs but doesn't block.
Fail-open on errors	Cost recording errors never block calls - availability over consistency.
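The interaction between the hard cap, the soft limit, and fail-open behaviour might look roughly like this. It is a sketch under stated assumptions, not the real checkBudget(): the BudgetSnapshot fields and unit-less amounts are invented, usage at exactly the cap is treated as exceeded, and fail-open is extended to lookup errors as well as recording errors.

```typescript
interface BudgetSnapshot {
  used: number; // spend so far, in whatever unit the ledger uses
  hardCap: number; // per-workspace hard cap
  softLimitPct: number; // warning threshold, e.g. 80
}

type BudgetDecision =
  | { allowed: true; warn: boolean }
  | { allowed: false; code: "BUDGET_EXCEEDED" };

// Block at the hard cap, warn past the soft limit, and never block on errors.
function checkBudgetSketch(loadSnapshot: () => BudgetSnapshot): BudgetDecision {
  try {
    const snapshot = loadSnapshot();
    if (snapshot.used >= snapshot.hardCap) return { allowed: false, code: "BUDGET_EXCEEDED" };
    const softThreshold = snapshot.hardCap * (snapshot.softLimitPct / 100);
    return { allowed: true, warn: snapshot.used >= softThreshold };
  } catch {
    return { allowed: true, warn: false }; // fail open: availability over consistency
  }
}
```

The `warn` flag corresponds to the soft limit row: it would be logged but does not stop the call.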

See Cost Control for the full ledger model.

What the AI Cannot Do

A short list, by design:

Archive resources (archive_matter, archive_table, ...) - human-only in the UI.
Send POST/PUT/DELETE through call_api_endpoint - GET only.
Mutate locked cells - stripped server-side.
Run SQL without the ADMIN role; even with it, queries are read-only.
Read another workspace's data unless the calling key passes the cross-tenant authority check.
Bypass autonomy thresholds - tool registration controls visibility per level.
Skip the cost ledger - every model call writes one row.
Skip injection scanning - every tool result re-enters model context through the scanner.

Security Events

SecurityEvent rows are created by automatic monitoring across the stack. They are surfaced at Settings > Security > Events.

Event Type	Trigger
`AI_INJECTION_ATTEMPT`	`scanForInjection()` flagged a tool result
`AI_RATE_LIMIT`	Session or daily write budget exceeded
`CSRF_VIOLATION`	Mutation request without a valid CSRF token
`LOGIN_FAILURE`	Auth attempt rejected (subject to lockout policies)
`RATE_LIMIT`	Generic per-endpoint rate limit hit

Paginate and filter via GET /api/admin/security-events?type=...&cursor=...&limit=... (OWNER only).
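A small helper for composing that query string is sketched below. The base URL, the SecurityEventQuery shape, and the securityEventsUrl name are assumptions for illustration; the response format is not documented here, so the sketch stops at building the request URL.

```typescript
interface SecurityEventQuery {
  type?: string; // e.g. "AI_INJECTION_ATTEMPT"
  cursor?: string; // opaque cursor from a previous page
  limit?: number; // page size
}

// Build the admin endpoint URL, including only the parameters that are set.
function securityEventsUrl(baseUrl: string, query: SecurityEventQuery): string {
  const parts: string[] = [];
  if (query.type) parts.push(`type=${encodeURIComponent(query.type)}`);
  if (query.cursor) parts.push(`cursor=${encodeURIComponent(query.cursor)}`);
  if (query.limit !== undefined) parts.push(`limit=${query.limit}`);
  const path = `${baseUrl}/api/admin/security-events`;
  return parts.length > 0 ? `${path}?${parts.join("&")}` : path;
}
```

The request itself would need to be made with OWNER credentials, as noted above.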