Voice Transcripts

Upload audio files (voice memos, meeting recordings, dictation) and Opbox transcribes them to text. Two providers are available: cloud Whisper (OpenAI) for plain text, or local WhisperX for speaker diarisation with timestamped segments.

Transcripts are private per user - only the uploader can see and manage their own transcripts. The AI assistant respects this through dedicated list_my_transcripts / get_my_transcript tools.

Providers

Provider          Speaker Diarisation                             Requirements                                                                         Output
Whisper (cloud)   No - plain text only                            OpenAI API key                                                                       Single text blob
WhisperX (local)  Yes - timestamped segments with speaker labels  Python 3, pip install whisperx, HuggingFace token (HF_TOKEN), WHISPERX_ENABLED=true  Text + segments array

Select the provider in Settings > AI > Transcription:

  • Auto - WhisperX if installed and enabled, else Whisper.
  • WhisperX - force local diarised transcription. Errors if not configured.
  • Whisper - force cloud Whisper.
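The selection rules above can be sketched as a small function. This is only an illustration of the documented behaviour, not Opbox's implementation: the function name, the `whisperx_ready` flag (standing for "installed and enabled"), and the lowercase preference strings are assumptions. The returned values reuse the "openai" and "whisperx" strings that appear in the response's provider field.

```python
def resolve_provider(preference: str, whisperx_ready: bool) -> str:
    """Map a Settings preference to the provider that would run.

    preference: "auto", "whisperx" or "whisper" (assumed spellings).
    whisperx_ready: hypothetical flag, True when WhisperX is installed
    and WHISPERX_ENABLED=true.
    """
    if preference == "whisperx":
        # Forced local transcription errors out when WhisperX is not configured.
        if not whisperx_ready:
            raise RuntimeError("WhisperX is selected but not configured")
        return "whisperx"
    if preference == "whisper":
        # Forced cloud Whisper, regardless of local availability.
        return "openai"
    if preference == "auto":
        # Auto prefers WhisperX and otherwise falls back to cloud Whisper.
        return "whisperx" if whisperx_ready else "openai"
    raise ValueError(f"unknown provider preference: {preference!r}")
```

The optional provider field on the upload request would presumably feed the same logic in place of the stored user preference.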

Upload & Transcribe

POST /api/ai/transcribe
Content-Type: multipart/form-data

file: <audio file (max 1GB)>
title: "Lee & Will"        # optional
provider: "whisperx"       # optional - overrides user pref

Supported formats: .m4a, .mp3, .wav, .webm, .ogg, .flac, .mp4, .qta (Apple Voice Memos - auto-converted to m4a).
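A client may want to check a file before uploading it. The sketch below encodes the documented extension list and size limit. The helper name is hypothetical, and reading "1GB" as 1024^3 bytes is an assumption; the server remains the authority on what it accepts.

```python
from pathlib import Path

# Extensions listed in this section.
SUPPORTED_EXTENSIONS = {".m4a", ".mp3", ".wav", ".webm", ".ogg", ".flac", ".mp4", ".qta"}

# Assumption: the documented "1GB" limit is interpreted as 1 GiB.
MAX_UPLOAD_BYTES = 1024 ** 3


def check_upload(filename: str, size_bytes: int) -> list:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    ext = Path(filename).suffix.lower()  # compare case-insensitively
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(no extension)'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append("file exceeds the 1GB limit")
    return problems
```

For example, `check_upload("memo.qta", 5_000_000)` returns an empty list, since .qta files are accepted and converted to m4a.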

The UI shows a naming dialog on file selection. The title defaults to the filename (minus extension) and is used as the Knowledge Base document name with a date prefix (e.g. "18 February 2026 - Lee & Will").
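The naming rule can be illustrated with a short helper. The function name is hypothetical, and zero-padding single-digit days is an assumption drawn from the "DD Month YYYY" pattern; the source only shows a two-digit example.

```python
from datetime import date
from pathlib import Path
from typing import Optional

# Explicit English month names, so the result does not depend on the locale.
MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def kb_document_name(filename: str, title: Optional[str], uploaded: date) -> str:
    """Build a date-prefixed Knowledge Base document name."""
    # The title defaults to the filename minus its extension.
    display = title or Path(filename).stem
    # Assumption: days are zero-padded to two digits.
    return f"{uploaded.day:02d} {MONTHS[uploaded.month - 1]} {uploaded.year} - {display}"


print(kb_document_name("meeting.m4a", "Lee & Will", date(2026, 2, 18)))
# prints "18 February 2026 - Lee & Will"
```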

WhisperX Response

{
  "id": "cm...",
  "filename": "meeting.m4a",
  "title": "Lee & Will",
  "text": "Hello everyone, let's get started...",
  "language": "en",
  "durationSecs": 120.5,
  "provider": "whisperx",
  "speakerCount": 3,
  "segments": [
    { "start": 0.5, "end": 3.2, "text": "Hello everyone, let's get started.", "speaker": "SPEAKER_00" },
    { "start": 3.8, "end": 7.1, "text": "Thanks for joining.", "speaker": "SPEAKER_01" }
  ],
  "createdAt": "2026-02-18T10:30:00.000Z",
  "documentId": "cm..."
}
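As an illustration of consuming the segments array, the following sketch turns a response like the one above into speaker-labelled lines. The function names and the mm:ss timestamp format are choices made for this example, not part of the API.

```python
def format_timestamp(seconds: float) -> str:
    """Render a second offset as mm:ss, dropping the fractional part."""
    whole = int(seconds)
    return f"{whole // 60:02d}:{whole % 60:02d}"


def render_segments(transcript: dict) -> str:
    """Return one "[mm:ss] SPEAKER: text" line per segment.

    Returns an empty string when segments is null or missing, which is
    the case for cloud Whisper transcripts.
    """
    segments = transcript.get("segments") or []
    lines = [
        f"[{format_timestamp(seg['start'])}] {seg['speaker']}: {seg['text']}"
        for seg in segments
    ]
    return "\n".join(lines)
```

Because the function falls back to an empty list, the same code path can be used for both providers; callers can show the plain text field when nothing is rendered.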

Whisper Response

{
  "id": "cm...",
  "filename": "recording.m4a",
  "title": null,
  "text": "Hello, this is a transcription of...",
  "language": "english",
  "durationSecs": 42.5,
  "provider": "openai",
  "speakerCount": null,
  "segments": null,
  "createdAt": "2026-02-18T10:30:00.000Z",
  "documentId": "cm..."
}

List, Get, Delete

GET /api/ai/transcripts?search=meeting&page=1&limit=20
GET /api/ai/transcripts/:id
DELETE /api/ai/transcripts/:id

The list endpoint is scoped to the requesting user - you only ever see your own transcripts.
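A minimal sketch of building these URLs with proper query encoding. The base URL is a placeholder and the helper names are hypothetical; authentication is not covered in this section and is omitted here.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def transcripts_list_url(base_url: str, search: Optional[str] = None,
                         page: int = 1, limit: int = 20) -> str:
    """Build the list URL; search is left out when empty."""
    params = {}
    if search:
        params["search"] = search
    params["page"] = page
    params["limit"] = limit
    # urlencode escapes characters such as "&" and spaces in the search term.
    return f"{base_url.rstrip('/')}/api/ai/transcripts?{urlencode(params)}"


def transcript_url(base_url: str, transcript_id: str) -> str:
    """Build the detail URL, used for both GET and DELETE."""
    return f"{base_url.rstrip('/')}/api/ai/transcripts/{quote(transcript_id, safe='')}"


# "https://opbox.example" is a placeholder host, not a real endpoint.
print(transcripts_list_url("https://opbox.example", search="meeting"))
# prints "https://opbox.example/api/ai/transcripts?search=meeting&page=1&limit=20"
```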

Knowledge Base Sync

Each transcript creates a paired KB document under the Transcripts system folder. The KB document carries:

  • The transcript text (or rich segments for WhisperX, with speaker labels).
  • A title formatted as "DD Month YYYY - Title" (e.g. "18 February 2026 - Lee & Will").
  • A transcript label so you can filter the KB by it.

This means transcripts are immediately searchable via knowledge_search and the standard KB tools - useful when the AI is reasoning across notes, documents, and meeting transcripts together.

Response Field Reference

Field         Type           Description
title         string | null  User-chosen display name. Null if not provided; falls back to filename.
provider      string | null  "openai" or "whisperx". Null for legacy transcripts.
speakerCount  number | null  Distinct speakers detected. WhisperX only.
segments      array | null   Each segment carries start, end, text, speaker with timestamps in seconds. WhisperX only.
documentId    string         Paired KB document for searchability.
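Because several fields are nullable, client code benefits from handling them in one place. The dataclass below is a sketch of one way to do that; the class, its snake_case attribute names, and the two convenience properties are assumptions made for illustration, while the JSON keys come from the responses shown earlier.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Transcript:
    id: str
    filename: str
    text: str
    document_id: str
    title: Optional[str] = None          # null when the user gave no title
    provider: Optional[str] = None       # null for legacy transcripts
    speaker_count: Optional[int] = None  # WhisperX only
    segments: Optional[list] = None      # WhisperX only

    @classmethod
    def from_response(cls, data: dict) -> "Transcript":
        """Map the camelCase JSON keys onto the dataclass fields."""
        return cls(
            id=data["id"],
            filename=data["filename"],
            text=data["text"],
            document_id=data["documentId"],
            title=data.get("title"),
            provider=data.get("provider"),
            speaker_count=data.get("speakerCount"),
            segments=data.get("segments"),
        )

    @property
    def display_title(self) -> str:
        # A null title falls back to the filename.
        return self.title or self.filename

    @property
    def is_diarised(self) -> bool:
        # Segments are only present for WhisperX transcripts.
        return self.segments is not None
```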

Privacy Model

  • Per-user scoping - the list/detail/delete endpoints all filter by the requesting user.
  • AI access - the AI has no generic list_transcripts / get_transcript tools. It has only list_my_transcripts / get_my_transcript, which enforce the same per-user scoping when the agent runs as a specific user (rare for transcripts).
  • Workspace owners - cannot read your transcripts via the UI or API. They can only see metadata in the audit log (filename, duration, timestamp) - never content.
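The scoping rules above can be pictured with a small in-memory sketch. Everything here is hypothetical: the rows, the `ownerId` field name, and the function signatures are invented for illustration. Only the tool names and the list of audit-visible metadata come from this page.

```python
from typing import Optional

# Hypothetical stored rows; "ownerId" is an assumed field name.
TRANSCRIPTS = [
    {"id": "t1", "ownerId": "alice", "filename": "standup.m4a",
     "durationSecs": 120.5, "createdAt": "2026-02-18T10:30:00.000Z", "text": "Hello everyone"},
    {"id": "t2", "ownerId": "bob", "filename": "memo.m4a",
     "durationSecs": 42.5, "createdAt": "2026-02-18T11:00:00.000Z", "text": "Note to self"},
]


def list_my_transcripts(user_id: str) -> list:
    """Return only the transcripts uploaded by the requesting user."""
    return [t for t in TRANSCRIPTS if t["ownerId"] == user_id]


def get_my_transcript(user_id: str, transcript_id: str) -> Optional[dict]:
    """Return a transcript only if it belongs to the requesting user."""
    for t in TRANSCRIPTS:
        if t["id"] == transcript_id and t["ownerId"] == user_id:
            return t
    return None  # someone else's transcript looks the same as a missing one


def audit_metadata(transcript: dict) -> dict:
    """Project a transcript to the metadata visible in the audit log."""
    return {
        "filename": transcript["filename"],
        "durationSecs": transcript["durationSecs"],
        "createdAt": transcript["createdAt"],
    }
```

Filtering on the owner in every lookup, rather than checking access after fetching, means a transcript ID belonging to another user never returns content.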
