AI Security & Limits
Opbox treats AI as an authenticated user that must obey the same rules as a human, plus stricter ones for autonomous operation. Seven independent layers enforce safety.
Layer 1: Injection Scanning
Tool results are attacker-controlled territory. A row in a database, a comment field, an OCR'd document - any of these can carry a prompt injection that reshapes the model's behaviour on the next turn.
scanForInjection() runs over every tool result before it re-enters model context. Detected injections are sanitised in place. The original is kept for audit; the sanitised version is what the model sees.
| Trigger | What happens |
|---|---|
| Pattern detected (e.g. [INST], ChatML role markers, "ignore previous", role-shift tokens) | Sanitise + log to SecurityEvent (AI_INJECTION_ATTEMPT) |
| No pattern | Pass through unchanged |
Coverage: chat route, agent worker, task triggers, MCP bridge. All call sites use the same scanner.
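As a rough illustration of the sanitise-and-keep-original behaviour, here is a minimal sketch. It is not the actual scanForInjection() implementation: the function name scanForInjectionSketch, the pattern list, the "[removed]" placeholder and the result shape are all assumptions made for the example.

```typescript
// Hypothetical sketch only: patterns, placeholder and return shape are assumptions.
interface ScanResult {
  detected: boolean;
  original: string;  // kept for audit
  sanitised: string; // what the model would see
  matches: string[]; // the raw fragments that were replaced
}

const INJECTION_PATTERNS: RegExp[] = [
  /\[\/?INST\]/gi,                            // [INST] / [/INST] markers
  /<\|im_(start|end)\|>/gi,                   // ChatML role markers
  /ignore (all )?previous( instructions)?/gi, // "ignore previous" phrasing
];

function scanForInjectionSketch(toolResult: string): ScanResult {
  const matches: string[] = [];
  let sanitised = toolResult;
  for (const pattern of INJECTION_PATTERNS) {
    // Replace every hit and remember it so it can be logged.
    sanitised = sanitised.replace(pattern, (hit) => {
      matches.push(hit);
      return "[removed]";
    });
  }
  return { detected: matches.length > 0, original: toolResult, sanitised, matches };
}
```

A caller would pass `sanitised` to the model and, when `detected` is true, write an AI_INJECTION_ATTEMPT event that references `original`.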
Admins can review injection attempts at Settings > Security > Events (filter by AI_INJECTION_ATTEMPT). The endpoint is /api/admin/security-events?type=AI_INJECTION_ATTEMPT (OWNER role required).
Layer 2: Rate Limits
Two budgets per user:
| Budget | Scope | When it resets |
|---|---|---|
| Session write limit | One chat thread | New thread |
| Daily write limit | All threads, one user | Calendar day |
Both apply to mutating tool calls. Read-only tools (search, list_*, get_*) are unmetered. Plan Mode skips the daily cap because it can't write anyway.
Exceeding either budget surfaces as a rate_limit event in the SSE stream and writes a SecurityEvent row (AI_RATE_LIMIT). The chat continues, but write tools are unavailable until the budget resets.
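The two-budget check could look roughly like the sketch below. The names (checkWriteBudget, WriteUsage, WriteLimits), the way read-only tools are recognised and the order of the checks are assumptions for illustration; the actual limits and counters are not described in this page.

```typescript
// Hypothetical sketch: names, limits and check order are assumptions.
interface WriteUsage { sessionWrites: number; dailyWrites: number; }
interface WriteLimits { sessionLimit: number; dailyLimit: number; }

type WriteDecision =
  | { allowed: true }
  | { allowed: false; reason: "session" | "daily" };

// Read-only tools (search, list_*, get_*) are unmetered.
const READ_ONLY_TOOL = /^(search$|list_|get_)/;

function checkWriteBudget(
  tool: string,
  used: WriteUsage,
  limits: WriteLimits,
  planMode: boolean,
): WriteDecision {
  if (READ_ONLY_TOOL.test(tool)) return { allowed: true };
  if (used.sessionWrites >= limits.sessionLimit) return { allowed: false, reason: "session" };
  // Plan Mode skips the daily cap.
  if (!planMode && used.dailyWrites >= limits.dailyLimit) return { allowed: false, reason: "daily" };
  return { allowed: true };
}
```

A denied decision is where the rate_limit SSE event and the SecurityEvent row would be emitted.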
Layer 3: Plan Mode
A read-only mode for the assistant. When the user toggles the Plan pill below the chat input box (or the API caller sends planMode: true):
- Write tools are removed from the registry handed to the model.
- The model literally cannot call them - not even after a jailbreak, because the function is absent from its toolset.
- The daily write budget is skipped.
Use case: "explore my data, plan a fix, but don't change anything yet."
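The key property of Plan Mode is that filtering happens before the model sees the toolset, not at call time. A minimal sketch, assuming each registered tool carries a hypothetical `mutates` flag (the real registry shape is not shown in this page):

```typescript
// Hypothetical sketch: the ToolDef shape and the mutates flag are assumptions.
interface ToolDef {
  name: string;
  mutates: boolean; // true for write tools
}

function toolsForRequest(registry: ToolDef[], planMode: boolean): ToolDef[] {
  // In Plan Mode the write tools never reach the model at all.
  return planMode ? registry.filter((tool) => !tool.mutates) : registry;
}
```

Because the write tools are absent from the list handed to the model, there is nothing for a jailbroken prompt to call.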
Layer 4: Confirm-Then-Act
For every write operation outside Plan Mode, the AI calls ask_confirmation first. The chat UI renders an Approve / Deny dialog. The agentic loop pauses until the user responds.
This replaces an older apply: prefix convention. The new pattern is:
- AI plans the write.
- AI calls ask_confirmation({ message, actionDescription }).
- UI shows the dialog. User clicks Approve or Deny.
- AI receives the response. On Approve, it executes the write. On Deny, it acknowledges and stops.
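The steps above can be modelled as a small state machine. This is a sketch under assumptions: the state names, askConfirmationSketch and onConfirmationResponse are hypothetical, and only the `message` / `actionDescription` fields come from the text.

```typescript
// Hypothetical sketch: state names and helper functions are assumptions.
interface ConfirmRequest { message: string; actionDescription: string; }

type LoopState =
  | { kind: "awaiting_confirmation"; request: ConfirmRequest }
  | { kind: "executing"; actionDescription: string }
  | { kind: "stopped"; reason: string };

type UserResponse = "approve" | "deny";

// The AI has planned a write and asks first; the loop is now paused.
function askConfirmationSketch(request: ConfirmRequest): LoopState {
  return { kind: "awaiting_confirmation", request };
}

// The user's click resumes the loop: approve runs the write, deny stops it.
function onConfirmationResponse(state: LoopState, response: UserResponse): LoopState {
  if (state.kind !== "awaiting_confirmation") return state; // ignore stray responses
  return response === "approve"
    ? { kind: "executing", actionDescription: state.request.actionDescription }
    : { kind: "stopped", reason: "user denied" };
}
```

Modelling the pause as an explicit state makes it clear that no write can start unless an Approve response arrives while the loop is waiting.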
ask_question is the read-only sibling: the AI requests a clarifying answer, the UI renders an inline input with optional quick-pick options.
Archive operations are not exposed to the AI at all - archive_matter, archive_table, etc. are human-only actions in the UI.
Layer 5: Autonomy Levels (MCP)
When the AI runs through the MCP bridge - including the Agent Bridge spawning Claude Code - the API key carries an autonomyLevel (0-3) set at creation. The MCP router enforces it before dispatching any tool call.
| Level | Meaning | Examples |
|---|---|---|
| L0 | Read-only | search, list_members, get_notifications, agent_queue_list |
| L1 | Read + internal writes | create_invite, mark_notifications_read, agent_queue_claim, doc-gen reads |
| L2 | Read + mutations with external side effects | call_api_endpoint (GET only), execute_sql (read-only), run_saved_query |
| L3 | Full | All registered tools, subject to per-tool gate flags |
Tool registration declares the required level; the MCP router denies below-threshold calls and logs to the audit log.
Default for new keys is L0. Admins must explicitly raise the level when creating the key, and only OWNER / ADMIN can mint keys (POST /api/agent/api-keys).
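The threshold check in the router can be sketched as follows. The registry constant, authoriseToolCall and the audit-message format are assumptions for illustration; the three example tools and their levels are taken from the table above.

```typescript
// Hypothetical sketch: registry shape, function name and audit format are assumptions.
type AutonomyLevel = 0 | 1 | 2 | 3;

// Each tool declares the level it requires at registration time.
const REQUIRED_LEVEL: Record<string, AutonomyLevel> = {
  search: 0,
  create_invite: 1,
  run_saved_query: 2,
};

function authoriseToolCall(keyLevel: AutonomyLevel, toolName: string, auditLog: string[]): boolean {
  if (!Object.prototype.hasOwnProperty.call(REQUIRED_LEVEL, toolName)) {
    auditLog.push(`denied ${toolName}: unknown tool`);
    return false;
  }
  const required = REQUIRED_LEVEL[toolName];
  if (keyLevel < required) {
    // Below-threshold calls are denied and logged before any dispatch.
    auditLog.push(`denied ${toolName}: key L${keyLevel} < required L${required}`);
    return false;
  }
  return true;
}
```

With the default L0 key, `search` is dispatched while `create_invite` is denied and leaves an audit entry.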
Full breakdown in Autonomy Levels.
Layer 6: Cloudflare Access (OpenClaw provider)
When the OpenClaw provider is configured for an organisation, opbox can attach a Cloudflare Access service-token pair to every gateway call. The OpenClaw runtime stays loopback-only; opbox is the only client allowed to reach it, gated at Cloudflare's edge before traffic ever touches the host.
| Header | Effect |
|---|---|
| CF-Access-Client-Id | Identifies opbox as an authorised client. |
| CF-Access-Client-Secret | Authenticates that client. |
Both fields must be set together: this is validated at the API write boundary, and setting only one returns 400 CF_ACCESS_HALF_CONFIGURED. Both are encrypted at rest via the same AES-256-GCM pipeline as the bearer token.
For local dev where the OpenClaw gateway is reachable directly (typically http://localhost:18789), both CF Access fields are left blank.
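A minimal sketch of the both-or-neither rule and of header attachment is shown below. The config field names (cfAccessClientId, cfAccessClientSecret) and both function names are assumptions; the header names and the CF_ACCESS_HALF_CONFIGURED code come from the text.

```typescript
// Hypothetical sketch: config field names and function names are assumptions.
interface OpenClawConfig {
  cfAccessClientId?: string;
  cfAccessClientSecret?: string;
}

type CfValidation =
  | { ok: true }
  | { ok: false; status: 400; code: "CF_ACCESS_HALF_CONFIGURED" };

// Write-boundary check: both fields set, or both blank.
function validateCfAccess(cfg: OpenClawConfig): CfValidation {
  const hasId = Boolean(cfg.cfAccessClientId);
  const hasSecret = Boolean(cfg.cfAccessClientSecret);
  return hasId === hasSecret
    ? { ok: true }
    : { ok: false, status: 400, code: "CF_ACCESS_HALF_CONFIGURED" };
}

// Headers to add to a gateway call; empty for local dev with blank fields.
function cfAccessHeaders(cfg: OpenClawConfig): Record<string, string> {
  if (!cfg.cfAccessClientId || !cfg.cfAccessClientSecret) return {};
  return {
    "CF-Access-Client-Id": cfg.cfAccessClientId,
    "CF-Access-Client-Secret": cfg.cfAccessClientSecret,
  };
}
```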
Layer 7: Server-Side Hardening
| Control | Description |
|---|---|
| Cell locking | Locked cells (__lock__-prefixed columns) are stripped from any update payload server-side. AI patch_row / bulk_update_rows and direct API PATCH all enforce this. Direct Prisma writes and CSV imports bypass locks. |
| API endpoint access | The call_api_endpoint AI tool is GET-only with 14 blocked path prefixes (auth, admin, billing, etc.) and a route-catalogue allowlist check. |
| SQL queries | execute_sql is read-only, sandboxed, workspace-scoped, with query timeouts and row limits. Requires ADMIN role. |
| Workflow code steps | Three-layer sandbox: regex blocklist on the source, null-prototype globals, and global shadowing of dangerous APIs. |
| Markdown rendering | sanitizeUrl() enforces a protocol allowlist. Document embeds run through DOMPurify. |
| Content truncation | Document content and tool results are truncated to safe lengths before being placed in model context. |
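To make the cell-locking row concrete, here is a minimal sketch of stripping locked columns from an update payload. The function name stripLockedCells and the flat payload shape are assumptions; only the __lock__ prefix comes from the table.

```typescript
// Hypothetical sketch: function name and payload shape are assumptions.
const LOCK_PREFIX = "__lock__";

function stripLockedCells(payload: Record<string, unknown>): Record<string, unknown> {
  const allowed: Record<string, unknown> = {};
  for (const column of Object.keys(payload)) {
    // Drop any column whose name starts with the lock prefix.
    if (column.slice(0, LOCK_PREFIX.length) !== LOCK_PREFIX) {
      allowed[column] = payload[column];
    }
  }
  return allowed;
}
```

Running this on the server for every update path means the caller, AI or human, never needs to be trusted to omit locked columns.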
Cost Gate
checkBudget() runs before every LLM call. If the workspace is over its hard cap, the call is rejected with a BUDGET_EXCEEDED error in the SSE stream.
| Setting | Purpose |
|---|---|
| monthlyTokenCap (per-tier) | Informational soft ceiling shown in Settings > AI. |
| Hard cap (per-workspace) | Enforced by checkBudget(). Configured in Cost Control. |
| Soft limit % | Optional warning threshold (default 80%). Logs but doesn't block. |
| Fail-open on errors | Cost recording errors never block calls - availability over consistency. |
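The hard cap, soft limit and fail-open rows can be sketched together. This is not the real checkBudget(): the names checkBudgetSketch and recordCostFailOpen, the `spent` / `hardCap` fields and their shared unit are assumptions; the 80% default and the BUDGET_EXCEEDED code come from the text.

```typescript
// Hypothetical sketch: field names, units and function names are assumptions.
interface WorkspaceUsage {
  spent: number;         // usage so far, in the same unit as hardCap
  hardCap: number;
  softLimitPct?: number; // optional warning threshold, default 80
}

type BudgetDecision =
  | { allowed: true; warning: boolean }
  | { allowed: false; error: "BUDGET_EXCEEDED" };

function checkBudgetSketch(usage: WorkspaceUsage): BudgetDecision {
  if (usage.spent > usage.hardCap) return { allowed: false, error: "BUDGET_EXCEEDED" };
  const softPct = usage.softLimitPct ?? 80;
  // The soft limit only warns (logs); it never blocks.
  const warning = usage.spent >= (usage.hardCap * softPct) / 100;
  return { allowed: true, warning };
}

// Fail-open: a cost-recording error is logged and swallowed, never rethrown.
function recordCostFailOpen(record: () => void, log: (message: string) => void): void {
  try {
    record();
  } catch (err) {
    log(`cost recording failed: ${String(err)}`);
  }
}
```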
See Cost Control for the full ledger model.
What the AI Cannot Do
A short list, by design:
- Archive resources (archive_matter, archive_table, ...) - human-only in the UI.
- Send POST/PUT/DELETE through call_api_endpoint - GET only.
- Mutate locked cells - stripped server-side.
- Run arbitrary SQL without ADMIN role; even with it, only read-only.
- Read another workspace's data unless the calling key passes the cross-tenant authority check.
- Bypass autonomy thresholds - tool registration controls visibility per level.
- Skip the cost ledger - every model call writes one row.
- Skip injection scanning - every tool result re-enters model context through the scanner.
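As one example of how such a restriction can be enforced, the call_api_endpoint rule (GET only, blocked prefixes, route-catalogue allowlist) might be checked as sketched below. The function name, the path spellings and the catalogue format are assumptions, and only the three prefixes named in this page are listed, not the full set of 14.

```typescript
// Hypothetical sketch: path spellings, catalogue format and function name are assumptions.
// Only the three prefixes named in the text appear here; the real list is longer.
const BLOCKED_PREFIXES = ["/api/auth", "/api/admin", "/api/billing"];

function mayCallEndpoint(method: string, path: string, routeCatalogue: string[]): boolean {
  if (method.toUpperCase() !== "GET") return false; // GET only
  for (const prefix of BLOCKED_PREFIXES) {
    // Block the prefix itself and anything nested under it.
    if (path === prefix || path.slice(0, prefix.length + 1) === prefix + "/") return false;
  }
  return routeCatalogue.indexOf(path) !== -1; // must also be an allowlisted route
}
```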
Security Events
SecurityEvent rows are created by automatic monitoring across the stack and are surfaced at Settings > Security > Events.
| Event Type | Trigger |
|---|---|
| AI_INJECTION_ATTEMPT | scanForInjection() flagged a tool result |
| AI_RATE_LIMIT | Session or daily write budget exceeded |
| CSRF_VIOLATION | Mutation request without a valid CSRF token |
| LOGIN_FAILURE | Auth attempt rejected (subject to lockout policies) |
| RATE_LIMIT | Generic per-endpoint rate limit hit |
Paginate and filter via GET /api/admin/security-events?type=...&cursor=...&limit=... (OWNER only).
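For scripted review, a small helper can build the query string from the three parameters named above. The helper name and the EventQuery shape are assumptions; the response format is not documented in this page, so the sketch stops at the URL.

```typescript
// Hypothetical sketch: helper name and EventQuery shape are assumptions.
interface EventQuery {
  type?: string;   // e.g. "AI_INJECTION_ATTEMPT"
  cursor?: string; // opaque cursor from a previous page
  limit?: number;
}

function securityEventsUrl(query: EventQuery): string {
  const params: string[] = [];
  if (query.type) params.push(`type=${encodeURIComponent(query.type)}`);
  if (query.cursor) params.push(`cursor=${encodeURIComponent(query.cursor)}`);
  if (query.limit !== undefined) params.push(`limit=${query.limit}`);
  const qs = params.join("&");
  return "/api/admin/security-events" + (qs ? `?${qs}` : "");
}
```

The request itself still has to be sent with an OWNER session, since the endpoint is OWNER only.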
See Also
- Autonomy Levels - per-level tool access matrix.
- Cost Control - ledger, budgets, attribution.
- BYOK Credentials - the resolver chain.