You can enable Answering Machine Detection by adding an amd property in a dial verb.
When enabled, the Answering Machine Detection feature is activated as soon as the call is answered, and later sends an HTTP POST to the actionHook webhook indicating whether a human or a machine has answered the call.
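As a minimal sketch, a dial verb with Answering Machine Detection enabled might look like the following (the phone number and webhook path are placeholders):

```json
{
  "verb": "dial",
  "target": [
    { "type": "phone", "number": "+15551234567" }
  ],
  "amd": {
    "actionHook": "/amd-events"
  }
}
```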
Configuration
The following table lists the available parameters:

| Parameter | Type | Description | Required |
|---|---|---|---|
| actionHook | string or object | A webhook to receive an HTTP POST for AMD events. The default value is amd. | Yes |
| thresholdWordCount | number | The number of spoken words in a greeting that result in an amd_machine_detected result. The default value is 9. | No |
| recognizer | object | Speech recognition parameters, used as in the gather and transcribe verbs. Defaults to the application's recognizer settings. | No |
| recognizer.vendor | string | The speech recognition provider to use, for example, Google, Amazon, or Azure. The vendor determines transcription quality, supported languages, and feature availability. | Yes |
| recognizer.label | string | A custom label to identify this recognizer instance in logs or dashboards. Helpful when multiple recognizers are configured. | No |
| recognizer.language | string | The primary language code for transcription, for example, en-US for English, fr-FR for French. Determines how speech is interpreted. | No |
| recognizer.hints | array | An array of words or phrases that may appear in the audio and should be recognized more accurately. Useful for domain-specific terms, names, or technical vocabulary. | No |
| recognizer.hintsBoost | number | A numeric value specifying how strongly the recognizer should prioritize the hint words. Higher numbers give stronger emphasis, improving accuracy for key terms. | No |
| recognizer.altLanguages | array | An array of additional language codes that the recognizer can use for multilingual audio. Allows recognition of mixed-language content. | No |
| recognizer.profanityFilter | boolean | If true, the recognizer will automatically remove or mask profanity from the transcription output. | No |
| recognizer.interim | boolean | If true, returns partial transcription results as the audio is being processed. Useful for live captions or real-time feedback. | No |
| recognizer.punctuation | boolean | If true, punctuation marks, for example, periods or commas, are included in the transcription to improve readability. | No |
| recognizer.diarization | boolean | If true, enables speaker diarization, which assigns segments of the transcript to individual speakers. | No |
| recognizer.diarizationMinSpeakers | number | The minimum number of speakers expected in the audio. Helps the diarization algorithm distinguish between speakers accurately. | No |
| recognizer.diarizationMaxSpeakers | number | The maximum number of speakers expected in the audio. Prevents the algorithm from splitting speech unnecessarily. | No |
| recognizer.vad | object | Voice Activity Detection settings. Determines how the system detects when someone is speaking vs. silence, improving transcription timing and accuracy. | No |
| recognizer.fallbackVendor | string | Specifies an alternative transcription vendor to use if the primary vendor fails. Ensures reliability in critical workflows. | No |
| recognizer.fallbackLanguage | string | Language code to use for the fallback vendor. Must match a language supported by the fallback provider. | No |
| timers | object | An object containing various timeouts. | No |
| timers.noSpeechTimeoutMs | number | The time in milliseconds to wait for speech before returning amd_no_speech_detected. The default value is 5000. | No |
| timers.decisionTimeoutMs | number | The time in milliseconds to wait before returning amd_decision_timeout. The default value is 15000. | No |
| timers.toneTimeoutMs | number | The time in milliseconds to wait to hear a tone. The default value is 20000. | No |
| timers.greetingCompletionTimeoutMs | number | The silence in milliseconds to wait for during greeting before returning amd_machine_stopped_speaking. The default value is 2000. | No |
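Putting the parameters above together, an amd object with a custom recognizer and timers could be sketched as follows (the webhook path, vendor, languages, and hint phrases are illustrative values, not requirements):

```json
{
  "amd": {
    "actionHook": "/amd-events",
    "thresholdWordCount": 9,
    "recognizer": {
      "vendor": "google",
      "language": "en-US",
      "hints": ["voicemail", "leave a message"],
      "altLanguages": ["fr-FR"]
    },
    "timers": {
      "noSpeechTimeoutMs": 5000,
      "decisionTimeoutMs": 15000,
      "toneTimeoutMs": 20000,
      "greetingCompletionTimeoutMs": 2000
    }
  }
}
```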
Events
The payload included in the actionHook always contains a type property describing the event type.
Some event types may include additional properties.
| Event | Description | Additional Properties |
|---|---|---|
| amd_human_detected | A human is speaking. | {reason, greeting, language}, where: - reason — a short greeting. - greeting — a recognized greeting. - language — a recognized language. |
| amd_machine_detected | A machine is speaking. | {reason, hint, transcript, language}, where: - reason — a hint or long greeting. - hint — a recognized hint. - transcript — a recognized greeting. - language — a recognized language. |
| amd_no_speech_detected | No speech was detected. | - |
| amd_decision_timeout | No decision was able to be made in the time given. | - |
| amd_machine_stopped_speaking | Machine has completed the greeting. | - |
| amd_tone_detected | A beep was detected. | - |
| amd_error | An error has occurred. | An error message. |
| amd_stopped | Answering Machine Detection was stopped. | - |
Note: when a machine answers, an amd_machine_detected event may be followed by amd_tone_detected or amd_machine_stopped_speaking.
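As an illustration of the event shapes described above, an amd_machine_detected event delivered to the actionHook might carry a payload like the following (the reason, transcript, and language values are hypothetical):

```json
{
  "type": "amd_machine_detected",
  "reason": "long greeting",
  "transcript": "please leave a message after the tone",
  "language": "en-US"
}
```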
Inbound calls
You can use Answering Machine Detection for incoming calls by adding an amd property in a config verb. This can be useful when Voice Gateway is located behind a dialer: the dialer initiates the outbound call and then links it to Voice Gateway via an INVITE request.
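For inbound calls, a sketch of a config verb enabling detection might look like this (the webhook path is a placeholder):

```json
{
  "verb": "config",
  "amd": {
    "actionHook": "/amd-events"
  }
}
```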