Voice Transcripts
Upload audio files (voice memos, meeting recordings, dictation) and Opbox transcribes them to text. Two providers are available: cloud Whisper (OpenAI), which returns plain text, and local WhisperX, which adds speaker diarisation with timestamped segments.
Transcripts are private per user - only the uploader can see and manage their own transcripts. The AI assistant respects this through dedicated list_my_transcripts / get_my_transcript tools.
Providers
| Provider | Speaker Diarisation | Requirements | Output |
|---|---|---|---|
| Whisper (cloud) | No - plain text only | OpenAI API key | Single text blob |
| WhisperX (local) | Yes - timestamped segments with speaker labels | Python 3, pip install whisperx, HuggingFace token (HF_TOKEN), WHISPERX_ENABLED=true | Text + segments array |
Provider selection in Settings > AI > Transcription:
- Auto - WhisperX if installed and enabled, else Whisper.
- WhisperX - force local diarised transcription. Errors if not configured.
- Whisper - force cloud Whisper.
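The selection rules above can be sketched as a small function. This is a minimal illustration, not Opbox's implementation: the preference strings ("auto", "whisperx", "whisper") and the whisperx_ready flag are assumptions, while the returned values mirror the provider field shown in the responses below.

```python
def resolve_provider(preference: str, whisperx_ready: bool) -> str:
    """Map a (hypothetical) preference string to a provider value.

    whisperx_ready stands in for "WhisperX is installed and enabled".
    """
    if preference == "whisperx":
        # Forced local transcription errors out when WhisperX is not configured.
        if not whisperx_ready:
            raise RuntimeError("WhisperX is selected but not configured")
        return "whisperx"
    if preference == "whisper":
        # Forced cloud transcription never falls back to WhisperX.
        return "openai"
    if preference == "auto":
        # Auto prefers WhisperX and falls back to cloud Whisper.
        return "whisperx" if whisperx_ready else "openai"
    raise ValueError(f"unknown preference: {preference!r}")
```

Only the forced WhisperX setting can fail here; Auto degrades to cloud Whisper rather than raising.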
Upload & Transcribe
POST /api/ai/transcribe
Content-Type: multipart/form-data
file: <audio file (max 1GB)>
title: "Lee & Will" # optional
provider: "whisperx" # optional - overrides user pref
Supported formats: .m4a, .mp3, .wav, .webm, .ogg, .flac, .mp4, .qta (Apple Voice Memos - auto-converted to m4a).
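A client can check a file against these limits before uploading. The sketch below is illustrative only: check_upload is a hypothetical helper, and treating 1GB as 1024**3 bytes is an assumption (the server may count 10**9 bytes instead).

```python
from pathlib import PurePath

# Extensions listed in the supported formats above.
SUPPORTED_EXTENSIONS = {
    ".m4a", ".mp3", ".wav", ".webm", ".ogg", ".flac", ".mp4", ".qta",
}
# Assumption: "1GB" is interpreted as 1024**3 bytes.
MAX_UPLOAD_BYTES = 1024 ** 3


def check_upload(filename: str, size_bytes: int) -> list:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    ext = PurePath(filename).suffix.lower()  # case-insensitive extension match
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append("file exceeds the 1GB limit")
    return problems
```

For example, check_upload("memo.QTA", 2_000_000) returns an empty list, while a .txt file is reported as an unsupported format.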
The UI shows a naming dialog on file selection. The title defaults to the filename (minus extension) and is used as the Knowledge Base document name with a date prefix (e.g. "18 February 2026 - Lee & Will").
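The naming rule can be sketched as follows. This is a rough illustration under stated assumptions: kb_document_name is a hypothetical helper, the date passed in is whichever date Opbox uses for the prefix (not specified here), and zero-padding the day is a guess based on the "DD" in the format.

```python
from datetime import date
from pathlib import PurePath
from typing import Optional

# Explicit month names keep the output independent of the system locale.
MONTHS = (
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
)


def kb_document_name(filename: str, title: Optional[str], day: date) -> str:
    """Build "DD Month YYYY - Title", defaulting the title to the file stem."""
    display = title or PurePath(filename).stem  # filename minus extension
    # Assumption: the day is zero-padded to two digits.
    return f"{day.day:02d} {MONTHS[day.month - 1]} {day.year} - {display}"
```

With the example from the text, kb_document_name("meeting.m4a", "Lee & Will", date(2026, 2, 18)) yields "18 February 2026 - Lee & Will".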
WhisperX Response
{
"id": "cm...",
"filename": "meeting.m4a",
"title": "Lee & Will",
"text": "Hello everyone, let's get started...",
"language": "en",
"durationSecs": 120.5,
"provider": "whisperx",
"speakerCount": 3,
"segments": [
{ "start": 0.5, "end": 3.2, "text": "Hello everyone, let's get started.", "speaker": "SPEAKER_00" },
{ "start": 3.8, "end": 7.1, "text": "Thanks for joining.", "speaker": "SPEAKER_01" }
],
"createdAt": "2026-02-18T10:30:00.000Z",
"documentId": "cm..."
}
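One way to consume the segments array is to render it as a speaker-labelled transcript. The sketch below assumes only the segment fields shown above (start, end, text, speaker); the helper names and the mm:ss output format are illustrative choices.

```python
def format_timestamp(seconds: float) -> str:
    """Format a non-negative offset in seconds as mm:ss, truncating fractions."""
    total = int(seconds)
    return f"{total // 60:02d}:{total % 60:02d}"


def render_segments(segments: list) -> str:
    """Render one line per segment: "[mm:ss] SPEAKER: text"."""
    lines = []
    for seg in segments:
        stamp = format_timestamp(seg["start"])
        lines.append(f"[{stamp}] {seg['speaker']}: {seg['text']}")
    return "\n".join(lines)
```

Applied to the two segments in the response above, this produces "[00:00] SPEAKER_00: Hello everyone, let's get started." followed by "[00:03] SPEAKER_01: Thanks for joining." on the next line.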
Whisper Response
{
"id": "cm...",
"filename": "recording.m4a",
"title": null,
"text": "Hello, this is a transcription of...",
"language": "english",
"durationSecs": 42.5,
"provider": "openai",
"speakerCount": null,
"segments": null,
"createdAt": "2026-02-18T10:30:00.000Z",
"documentId": "cm..."
}
List, Get, Delete
GET /api/ai/transcripts?search=meeting&page=1&limit=20
GET /api/ai/transcripts/:id
DELETE /api/ai/transcripts/:id
The list endpoint is scoped to the requesting user - you only ever see your own transcripts.
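Building the list URL with proper query encoding might look like this. It is a sketch: the base URL is a placeholder, and the defaults of page=1 and limit=20 simply mirror the example request rather than documented server defaults.

```python
from typing import Optional
from urllib.parse import urlencode


def transcripts_list_url(
    base_url: str,
    search: Optional[str] = None,
    page: int = 1,
    limit: int = 20,
) -> str:
    """Return the list-endpoint URL with encoded query parameters."""
    params = {}
    if search:
        params["search"] = search  # omitted entirely when there is no search term
    params["page"] = page
    params["limit"] = limit
    # urlencode escapes characters such as "&" and spaces in the search term.
    return f"{base_url.rstrip('/')}/api/ai/transcripts?{urlencode(params)}"
```

Encoding matters for titles like "Lee & Will", where an unescaped ampersand would otherwise be read as a parameter separator.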
Knowledge Base Sync
Each transcript creates a paired KB document under the Transcripts system folder. The KB document carries:
- The transcript text (or rich segments for WhisperX, with speaker labels).
- A title formatted as "DD Month YYYY - Title" (e.g. "18 February 2026 - Lee & Will").
- A transcript label so you can filter the KB by it.
This means transcripts are immediately searchable via knowledge_search and the standard KB tools - useful when the AI is reasoning across notes, documents, and meeting transcripts together.
Response Field Reference
| Field | Type | Description |
|---|---|---|
| title | string \| null | User-chosen display name. Null if not provided; falls back to filename. |
| provider | string \| null | "openai" or "whisperx". Null for legacy transcripts. |
| speakerCount | number \| null | Distinct speakers detected. WhisperX only. |
| segments | array \| null | Each segment carries start, end, text, speaker with timestamps in seconds. WhisperX only. |
| documentId | string | Paired KB document for searchability. |
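The nullable fields can be modelled on the client side roughly as below. The Transcript class and its attribute names are hypothetical; only the JSON keys come from the responses documented here. The display_name property illustrates the filename fallback described for title.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Transcript:
    id: str
    filename: str
    title: Optional[str]          # null when the user gave no title
    text: str
    provider: Optional[str]       # "openai", "whisperx", or null for legacy rows
    speaker_count: Optional[int]  # WhisperX only
    segments: Optional[list]      # WhisperX only
    document_id: str

    @property
    def display_name(self) -> str:
        """Title if present, otherwise the filename."""
        return self.title or self.filename

    @property
    def is_diarised(self) -> bool:
        """True when speaker-labelled segments are available."""
        return self.segments is not None

    @classmethod
    def from_response(cls, data: dict) -> "Transcript":
        # .get() tolerates keys missing from older payloads (an assumption).
        return cls(
            id=data["id"],
            filename=data["filename"],
            title=data.get("title"),
            text=data["text"],
            provider=data.get("provider"),
            speaker_count=data.get("speakerCount"),
            segments=data.get("segments"),
            document_id=data["documentId"],
        )
```

Checking is_diarised before reading segments avoids treating a cloud Whisper response as if it carried speaker labels.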
Privacy Model
- Per-user scoping - the list/detail/delete endpoints all filter by the requesting user.
- AI access - the AI cannot use generic list_transcripts / get_transcript tools. It only has list_my_transcripts / get_my_transcript, which enforce the same per-user scoping when the agent runs as a specific user (rare for transcripts).
- Workspace owners - cannot read your transcripts via the UI or API. They can only see metadata in the audit log (filename, duration, timestamp) - never content.
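The per-user scoping can be pictured with a small in-memory model. This is a conceptual sketch only, not Opbox's storage layer: StoredTranscript, TranscriptStore, and owner_id are hypothetical names, and only the two method names are borrowed from the tools mentioned above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StoredTranscript:
    id: str
    owner_id: str  # hypothetical field: the uploader's user id
    filename: str
    text: str


class TranscriptStore:
    """Every read takes the requesting user's id and filters on ownership."""

    def __init__(self, rows):
        self._rows = list(rows)

    def list_my_transcripts(self, user_id: str) -> list:
        return [r for r in self._rows if r.owner_id == user_id]

    def get_my_transcript(self, user_id: str, transcript_id: str) -> Optional[StoredTranscript]:
        for row in self._rows:
            # A matching id is not enough; the owner must match as well.
            if row.id == transcript_id and row.owner_id == user_id:
                return row
        return None
```

Because no method accepts a transcript id without a user id, a caller cannot reach another user's transcript even when it knows the id.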
See Also
- Saved Prompts - the sibling per-user KB folder.
- AI Assistant - chat over transcripts via knowledge_search.