The transcribe verb generates real-time transcriptions of speech.
This verb can be nested only within the following verbs:
- dial
- listen
When nested within a dial verb, transcribe provides long-running transcription of a phone call.
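As a rough illustration of the nesting, the sketch below builds a dial verb that carries a transcribe configuration and prints it as JSON. The exact shape of the dial verb (the `verb` and `target` keys and the nested `transcribe` key), the webhook path, the phone number, and the lowercase vendor spelling are assumptions made for the example, not details confirmed by this page.

```python
import json

# Transcribe settings; only parameter names listed on this page are used.
# The hook path and the vendor spelling are hypothetical values.
transcribe = {
    "transcriptionHook": "/transcription",
    "recognizer": {"vendor": "google", "language": "en-US"},
}

# Assumed shape of a dial verb with a nested transcribe block.
dial = {
    "verb": "dial",
    "target": [{"type": "phone", "number": "+15555550100"}],
    "transcribe": transcribe,
}

# An application would typically return a list of verbs as JSON.
payload = json.dumps([dial], indent=2)
print(payload)
```

Keeping the transcribe settings in their own dictionary makes it easy to reuse the same block under a listen verb, if your platform nests it the same way there.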
Configuration
The following table lists the available parameters:

| Parameter | Type | Description | Required |
|---|---|---|---|
| transcriptionHook | string | A webhook URL where the system sends an HTTP POST whenever a partial or final transcription result is available from the provider. This allows your application to process or store transcripts in real time. | Yes |
| translationHook | string | A webhook URL where the system sends an HTTP POST whenever a translation of the transcribed text is available. Only used if translation is enabled. Useful for multilanguage workflows. | No |
| recognizer | object | Contains configuration options for the speech recognition engine. This includes language selection, hints, diarization, and other advanced settings. | No |
| recognizer.vendor | string | The speech recognition provider to use, for example, Google, Amazon, or Azure. The vendor determines transcription quality, supported languages, and feature availability. | Yes |
| recognizer.label | string | A custom label to identify this recognizer instance in logs or dashboards. Helpful when multiple recognizers are configured. | No |
| recognizer.language | string | The primary language code for transcription, for example, en-US for English, fr-FR for French. Determines how speech is interpreted. | No |
| recognizer.hints | array | An array of words or phrases that may appear in the audio and should be recognized more accurately. Useful for domain-specific terms, names, or technical vocabulary. | No |
| recognizer.hintsBoost | number | A numeric value specifying how strongly the recognizer should prioritize the hint words. Higher numbers give stronger emphasis, improving accuracy for key terms. | No |
| recognizer.altLanguages | array | An array of additional language codes that the recognizer can use for multilingual audio. Allows recognition of mixed-language content. | No |
| recognizer.profanityFilter | boolean | If true, the recognizer will automatically remove or mask profanity from the transcription output. | No |
| recognizer.interim | boolean | If true, returns partial transcription results as the audio is being processed. Useful for live captions or real-time feedback. | No |
| recognizer.punctuation | boolean | If true, punctuation marks, for example, periods or commas, are included in the transcription to improve readability. | No |
| recognizer.diarization | boolean | If true, enables speaker diarization, which assigns segments of the transcript to individual speakers. | No |
| recognizer.diarizationMinSpeakers | number | The minimum number of speakers expected in the audio. Helps the diarization algorithm distinguish between speakers accurately. | No |
| recognizer.diarizationMaxSpeakers | number | The maximum number of speakers expected in the audio. Prevents the algorithm from splitting speech unnecessarily. | No |
| recognizer.vad | object | Voice Activity Detection settings. Determines how the system detects when someone is speaking vs. silence, improving transcription timing and accuracy. | No |
| recognizer.fallbackVendor | string | Specifies an alternative transcription vendor to use if the primary vendor fails. Ensures reliability in critical workflows. | No |
| recognizer.fallbackLanguage | string | Language code to use for the fallback vendor. Must match a language supported by the fallback provider. | No |
| earlyMedia | boolean | If true, transcription starts as soon as audio begins, even before the call is answered. Useful for capturing pre-call audio, for example, IVR prompts. The default value is false. | No |
| channel | number | A number specifying which audio channel to transcribe in multichannel recordings. Each channel is a separate audio track—for example, 0 is left and 1 is right in stereo, or 0, 1, 2 for individual participants or microphones in multi-speaker recordings. | No |

The recognizer object also accepts vendor-specific option objects: deepgramOptions, googleOptions, azureOptions, awsOptions, nuanceOptions, ibmOptions, nvidiaOptions, cobaltOptions, sonioxOptions, verbioOptions, speechmaticsOptions, assemblyAiOptions, openaiOptions, and customOptions.
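The Required column in the table above can be checked before a configuration is sent. The following is a minimal sketch, not part of any official SDK: `check_transcribe` is a hypothetical helper, the sample values are invented, and the rule that the minimum speaker count must not exceed the maximum is an assumption inferred from the parameter descriptions.

```python
from typing import Any


def check_transcribe(config: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a transcribe configuration."""
    problems: list[str] = []

    # transcriptionHook is marked as required in the parameter table.
    if not isinstance(config.get("transcriptionHook"), str):
        problems.append("transcriptionHook is required and must be a string")

    recognizer = config.get("recognizer")
    if recognizer is not None:
        # recognizer itself is optional, but vendor is required when it is present.
        if not isinstance(recognizer.get("vendor"), str):
            problems.append("recognizer.vendor is required and must be a string")

        # Assumed sanity check: the minimum must not exceed the maximum.
        lo = recognizer.get("diarizationMinSpeakers")
        hi = recognizer.get("diarizationMaxSpeakers")
        if lo is not None and hi is not None and lo > hi:
            problems.append("diarizationMinSpeakers exceeds diarizationMaxSpeakers")

    return problems


# Hypothetical configuration using parameter names from the table.
example = {
    "transcriptionHook": "/transcription",
    "recognizer": {
        "vendor": "google",
        "language": "en-US",
        "hints": ["invoice", "refund"],
        "interim": True,
        "punctuation": True,
        "diarization": True,
        "diarizationMinSpeakers": 2,
        "diarizationMaxSpeakers": 4,
    },
}

print(check_transcribe(example))  # prints []
```

Returning a list of messages rather than raising on the first failure lets a caller report every problem in one pass.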