The Answering Machine Detection (AMD) feature can be enabled on outbound calls to indicate whether a call has been answered by a person or a machine. To use this feature, provide the amd property in a dial verb. In this example, Answering Machine Detection is activated as soon as the call is answered and later sends events to the amd webhook, indicating whether a human or a machine has answered the call.
{
  "verb": "dial",
  "actionHook": "dial",
  "callerId": "+49XXXXXXXXXXX",
  "target": [
    {
      "type": "phone",
      "number": "+49XXXXXXXXXXX",
      "trunk": "Twilio"
    }
  ],
  "amd": {
    "actionHook": "amd",
    "recognizer": {
      "vendor": "microsoft",
      "language": "en-US"
    }
  }
}
Examples of webhook payloads:
{"type":"amd_human_detected"}

{"type":"amd_machine_detected","reason":"hint","hint":"call has been forwarded","language":"en-us"}

{"type":"amd_no_speech_detected"}

Configuration

The following table lists the available parameters:
| Parameter | Type | Description | Required |
|---|---|---|---|
| actionHook | string or object | A webhook to receive an HTTP POST for AMD events. The default value is amd. | Yes |
| thresholdWordCount | number | The number of spoken words in a greeting that results in an amd_machine_detected event. The default value is 9. | No |
| recognizer | object | Speech recognition parameters, used in the same way as in the gather and transcribe verbs. By default, the application settings are used. | No |
| recognizer.vendor | string | The speech recognition provider to use, for example, Google, Amazon, or Azure. The vendor determines transcription quality, supported languages, and feature availability. | Yes |
| recognizer.label | string | A custom label to identify this recognizer instance in logs or dashboards. Helpful when multiple recognizers are configured. | No |
| recognizer.language | string | The primary language code for transcription, for example, en-US for English or fr-FR for French. Determines how speech is interpreted. | No |
| recognizer.hints | array | An array of words or phrases that may appear in the audio and should be recognized more accurately. Useful for domain-specific terms, names, or technical vocabulary. | No |
| recognizer.hintsBoost | number | A numeric value specifying how strongly the recognizer should prioritize the hint words. Higher numbers give stronger emphasis, improving accuracy for key terms. | No |
| recognizer.altLanguages | array | An array of additional language codes that the recognizer can use for multilingual audio. Allows recognition of mixed-language content. | No |
| recognizer.profanityFilter | boolean | If true, the recognizer automatically removes or masks profanity in the transcription output. | No |
| recognizer.interim | boolean | If true, returns partial transcription results as the audio is being processed. Useful for live captions or real-time feedback. | No |
| recognizer.punctuation | boolean | If true, punctuation marks, for example, periods or commas, are included in the transcription to improve readability. | No |
| recognizer.diarization | boolean | If true, enables speaker diarization, which assigns segments of the transcript to individual speakers. | No |
| recognizer.diarizationMinSpeakers | number | The minimum number of speakers expected in the audio. Helps the diarization algorithm distinguish between speakers accurately. | No |
| recognizer.diarizationMaxSpeakers | number | The maximum number of speakers expected in the audio. Prevents the algorithm from splitting speech unnecessarily. | No |
| recognizer.vad | object | Voice Activity Detection settings. Determines how the system distinguishes speech from silence, improving transcription timing and accuracy. | No |
| recognizer.fallbackVendor | string | An alternative transcription vendor to use if the primary vendor fails. Ensures reliability in critical workflows. | No |
| recognizer.fallbackLanguage | string | The language code to use for the fallback vendor. Must match a language supported by the fallback provider. | No |
| timers | object | An object containing various timeouts. | No |
| timers.noSpeechTimeoutMs | number | The time in milliseconds to wait for speech before returning amd_no_speech_detected. The default value is 5000. | No |
| timers.decisionTimeoutMs | number | The time in milliseconds to wait before returning amd_decision_timeout. The default value is 15000. | No |
| timers.toneTimeoutMs | number | The time in milliseconds to wait to hear a tone. The default value is 20000. | No |
| timers.greetingCompletionTimeoutMs | number | The length of silence in milliseconds to wait for during the greeting before returning amd_machine_stopped_speaking. The default value is 2000. | No |
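As an illustration of how these parameters combine, the following Python sketch assembles an amd property with a custom word threshold and a shortened no-speech timeout. The helper name build_amd and the chosen values (12 words, 3000 ms) are assumptions made for this example only; the parameter names and default values are taken from the table above.

```python
import json

# Timer names and defaults as listed in the parameter table.
DEFAULT_TIMERS = {
    "noSpeechTimeoutMs": 5000,
    "decisionTimeoutMs": 15000,
    "toneTimeoutMs": 20000,
    "greetingCompletionTimeoutMs": 2000,
}


def build_amd(action_hook, vendor, language=None,
              threshold_word_count=None, timers=None):
    """Hypothetical helper: return an amd property as a dict.

    Only values that are set explicitly are included, so the
    platform defaults apply to everything that is omitted.
    """
    amd = {"actionHook": action_hook, "recognizer": {"vendor": vendor}}
    if language is not None:
        amd["recognizer"]["language"] = language
    if threshold_word_count is not None:
        amd["thresholdWordCount"] = threshold_word_count
    if timers:
        # Reject timer names that are not in the parameter table.
        unknown = set(timers) - set(DEFAULT_TIMERS)
        if unknown:
            raise ValueError(f"unknown timer(s): {sorted(unknown)}")
        amd["timers"] = dict(timers)
    return amd


amd = build_amd(
    "amd",
    vendor="microsoft",
    language="en-US",
    threshold_word_count=12,             # illustrative value
    timers={"noSpeechTimeoutMs": 3000},  # illustrative value
)
print(json.dumps(amd, indent=2))
```

The printed object can be placed under the amd key of a dial verb such as the one in the first example.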

Events

The payload included in the actionHook always contains a type property describing the event type. Some event types may include additional properties.
| Event | Description | Additional Properties |
|---|---|---|
| amd_human_detected | A human is speaking. | {reason, greeting, language}, where reason is a short greeting, greeting is the recognized greeting, and language is the recognized language. |
| amd_machine_detected | A machine is speaking. | {reason, hint, transcript, language}, where reason is a hint or a long greeting, hint is the recognized hint, transcript is the recognized greeting, and language is the recognized language. |
| amd_no_speech_detected | No speech was detected. | - |
| amd_decision_timeout | No decision could be made in the time given. | - |
| amd_machine_stopped_speaking | The machine has completed the greeting. | - |
| amd_tone_detected | A beep was detected. | - |
| amd_error | An error has occurred. | An error message. |
| amd_stopped | Answering Machine Detection was stopped. | - |
Multiple events can occur during a single call. For example, on a call to an answering machine, the sequence could be:
  1. amd_machine_detected
  2. amd_tone_detected
  3. amd_machine_stopped_speaking
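
Because several events can arrive for one call, a webhook receiver typically needs to accumulate them rather than act on the first one. The sketch below shows one possible way to do that in Python. The function name summarize_amd_events and the summary keys (answered_by, beep_heard, greeting_finished) are hypothetical and not part of the product; only the event type names come from the table above.

```python
def summarize_amd_events(events):
    """Hypothetical helper: fold a list of AMD payloads into one summary."""
    summary = {
        "answered_by": "unknown",
        "beep_heard": False,
        "greeting_finished": False,
    }
    for event in events:
        event_type = event.get("type")
        if event_type == "amd_human_detected":
            summary["answered_by"] = "human"
        elif event_type == "amd_machine_detected":
            summary["answered_by"] = "machine"
        elif event_type == "amd_tone_detected":
            summary["beep_heard"] = True
        elif event_type == "amd_machine_stopped_speaking":
            summary["greeting_finished"] = True
        # Other event types leave the summary unchanged in this sketch.
    return summary


# The answering machine sequence listed above, with the sample
# amd_machine_detected payload from the beginning of this page.
machine_call = [
    {"type": "amd_machine_detected", "reason": "hint",
     "hint": "call has been forwarded", "language": "en-us"},
    {"type": "amd_tone_detected"},
    {"type": "amd_machine_stopped_speaking"},
]
print(summarize_amd_events(machine_call))
# {'answered_by': 'machine', 'beep_heard': True, 'greeting_finished': True}
```

A summary like this makes it straightforward to decide, for example, to leave a voicemail only after both the beep and the end of the greeting have been reported.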

Inbound calls

You can use Answering Machine Detection for incoming calls by adding an amd property in a config verb. It can be useful in situations where Voice Gateway is located behind a dialer. In these cases, the dialer initiates the outbound call and then links it to Voice Gateway via an INVITE request.
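
A minimal sketch of such a config verb, built as a Python dictionary and printed as JSON. It assumes that the amd property in a config verb accepts the same fields as in the dial verb; the webhook name and recognizer values are reused from the outbound example and are illustrative only.

```python
import json

# Assumption: amd inside a config verb takes the same fields as in dial.
config_verb = {
    "verb": "config",
    "amd": {
        "actionHook": "amd",
        "recognizer": {
            "vendor": "microsoft",
            "language": "en-US",
        },
    },
}
print(json.dumps(config_verb, indent=2))
```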

More information