The gather verb is used to collect DTMF or speech input.
{
  "verb": "gather",
  "actionHook": "http://example.com/collect",
  "input": ["digits", "speech"],
  "bargein": true,
  "dtmfBargein": true,
  "finishOnKey": "#",
  "numDigits": 5,
  "timeout": 8,
  "recognizer": {
    "vendor": "Google",
    "language": "en-US",
    "hints": ["sales", "support"],
    "hintsBoost": 10
  },
  "say": {
    "text": "To speak to Sales press 1 or say Sales. To speak to Customer Support press 2 or say Support",
    "synthesizer": {
      "vendor": "Google",
      "language": "en-US",
      "voice": "en-US-Wavenet-F"
    }
  }
}
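
The same verb can also be assembled in application code rather than written by hand. The following is a minimal Python sketch, not a definitive implementation: `build_gather` is a hypothetical helper, and wrapping the verb in a JSON array for the webhook response is an assumption that this section does not state.

```python
import json


def build_gather(action_hook, prompt, hints, timeout=8):
    """Build a gather verb dict mirroring the JSON example above.

    Hypothetical helper: the name and signature are illustrative only.
    """
    return {
        "verb": "gather",
        "actionHook": action_hook,
        "input": ["digits", "speech"],
        "bargein": True,
        "dtmfBargein": True,
        "finishOnKey": "#",
        "timeout": timeout,
        "recognizer": {
            "vendor": "Google",
            "language": "en-US",
            "hints": list(hints),
        },
        "say": {"text": prompt},
    }


verb = build_gather(
    "http://example.com/collect",
    "To speak to Sales press 1 or say Sales.",
    ["sales", "support"],
)

# Assumption: the application responds with a JSON array of verbs.
response_body = json.dumps([verb], indent=2)
print(response_body)
```

Building the verb as a plain dictionary keeps the property names identical to the JSON form, so the parameter table applies unchanged.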

Configuration

The following table lists the available parameters:
| Parameter | Type | Description | Required |
| --- | --- | --- | --- |
| actionHook | string or object | A webhook to receive an HTTP POST with the collected digits or speech. The payload includes a speech or digits property along with the standard attributes. | No |
| actionHookDelayAction | object | Configures a delayed action hook behavior. See config for details. | No |
| bargein | boolean | Enables a speech barge-in, which pauses audio playback if the caller starts speaking. | No |
| dtmfBargein | boolean | Enables a DTMF barge-in, which pauses audio playback if the caller enters DTMF tones. | No |
| fillerNoise | object | Configures a filler noise (background audio) while waiting for input. See config for details. | No |
| finishOnKey | string | The DTMF key that signals the end of input. | No |
| input | array | An array specifying the allowed types of input: ['digits'], ['speech'], or ['digits', 'speech']. The default value is ['digits']. | No |
| interDigitTimeout | number | The time to wait between digits after minDigits have been entered. | No |
| listenDuringPrompt | boolean | If false, the system won’t listen for user speech until the say or play verb completes. The default value is true. | No |
| minBargeinWordCount | number | If bargein is true, stops the playback only after this many words are spoken. The default value is 1. | No |
| minDigits | number | The minimum number of DTMF digits expected. The default value is 1. | No |
| maxDigits | number | The maximum number of DTMF digits expected. | No |
| numDigits | number | The exact number of DTMF digits expected. | No |
| partialResultHook | string or object | A webhook that receives POST requests with interim transcription results. Partial transcriptions are only generated if this property is set. | No |
| play | object | A nested play verb used to prompt the user. | No |
| recognizer | object | Contains configuration options for the speech recognition engine. This includes language selection, hints, diarization, and other advanced settings. | No |
| recognizer.vendor | string | The speech recognition provider to use, for example, Google, Amazon, or Azure. The vendor determines transcription quality, supported languages, and feature availability. | Yes |
| recognizer.label | string | A custom label to identify this recognizer instance in logs or dashboards. Helpful when multiple recognizers are configured. | No |
| recognizer.language | string | The primary language code for transcription, for example, en-US for English or fr-FR for French. Determines how speech is interpreted. | No |
| recognizer.hints | array | An array of words or phrases that may appear in the audio and should be recognized more accurately. Useful for domain-specific terms, names, or technical vocabulary. | No |
| recognizer.hintsBoost | number | A numeric value specifying how strongly the recognizer should prioritize the hint words. Higher numbers give stronger emphasis, improving accuracy for key terms. | No |
| recognizer.altLanguages | array | An array of additional language codes that the recognizer can use for multilingual audio. Allows recognition of mixed-language content. | No |
| recognizer.profanityFilter | boolean | If true, the recognizer automatically removes or masks profanity in the transcription output. | No |
| recognizer.interim | boolean | If true, returns partial transcription results as the audio is being processed. Useful for live captions or real-time feedback. | No |
| recognizer.punctuation | boolean | If true, punctuation marks, for example, periods or commas, are included in the transcription to improve readability. | No |
| recognizer.diarization | boolean | If true, enables speaker diarization, which assigns segments of the transcript to individual speakers. | No |
| recognizer.diarizationMinSpeakers | number | The minimum number of speakers expected in the audio. Helps the diarization algorithm distinguish between speakers accurately. | No |
| recognizer.diarizationMaxSpeakers | number | The maximum number of speakers expected in the audio. Prevents the algorithm from splitting speech unnecessarily. | No |
| recognizer.vad | object | Voice Activity Detection settings. Determines how the system detects when someone is speaking vs. silence, improving transcription timing and accuracy. | No |
| recognizer.fallbackVendor | string | Specifies an alternative transcription vendor to use if the primary vendor fails. Ensures reliability in critical workflows. | No |
| recognizer.fallbackLanguage | string | Language code to use for the fallback vendor. Must match a language supported by the fallback provider. | No |
| say | object | A nested say verb used to prompt the user. | No |
| speechTimeout | number | The time in seconds to wait for speech input before timing out. | No |
| timeout | number | The total time in seconds to wait for input before timing out. | No |
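
A few of the constraints listed above can be checked before a verb is sent. The sketch below is one reading of the parameter descriptions, not the platform's actual validation: `check_gather` is a hypothetical function, and the rule that minDigits must not exceed maxDigits is an assumption based on the parameter descriptions.

```python
ALLOWED_INPUTS = {"digits", "speech"}


def check_gather(verb):
    """Return a list of human-readable problems found in a gather verb dict."""
    problems = []

    # `input` defaults to ["digits"] when omitted, per the parameter list.
    inputs = verb.get("input", ["digits"])
    if not inputs or not set(inputs) <= ALLOWED_INPUTS:
        problems.append("input must be a non-empty subset of ['digits', 'speech']")

    # recognizer.vendor is the only parameter marked as required.
    recognizer = verb.get("recognizer")
    if recognizer is not None and "vendor" not in recognizer:
        problems.append("recognizer.vendor is required when recognizer is set")

    # Assumption: a minimum above the maximum can never be satisfied.
    min_digits = verb.get("minDigits", 1)
    max_digits = verb.get("maxDigits")
    if max_digits is not None and min_digits > max_digits:
        problems.append("minDigits must not exceed maxDigits")

    return problems


# A recognizer without a vendor is reported as a problem.
print(check_gather({
    "verb": "gather",
    "input": ["digits", "speech"],
    "recognizer": {"language": "en-US"},
}))
```

Returning a list of problems instead of raising on the first one lets a caller report every issue at once.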

Example

When speech input is used, the actionHook payload contains a speech object with the response from the speech provider, such as Google Speech.
"speech": {
  "stability": 0,
  "is_final": true,
  "alternatives": [{
    "confidence": 0.858155,
    "transcript": "sales please"
  }]
}
In the case of digits input, the payload includes a digits property indicating the DTMF keys pressed:
"digits": "0276"

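On the receiving side, an actionHook handler has to tell the two payload shapes apart. The following Python sketch shows one way to do that, assuming the payload has already been parsed from JSON into a dictionary. `extract_input` is a hypothetical helper, and choosing the alternative with the highest confidence is a design choice of this sketch, not documented behavior.

```python
def extract_input(payload):
    """Return ("speech", transcript), ("digits", keys), or (None, None)."""
    speech = payload.get("speech")
    if speech and speech.get("alternatives"):
        # Pick the alternative with the highest confidence score.
        best = max(
            speech["alternatives"],
            key=lambda alt: alt.get("confidence", 0.0),
        )
        return "speech", best.get("transcript", "")
    if "digits" in payload:
        return "digits", payload["digits"]
    # Neither property present, for example when the gather timed out.
    return None, None


speech_payload = {
    "speech": {
        "stability": 0,
        "is_final": True,
        "alternatives": [
            {"confidence": 0.858155, "transcript": "sales please"},
        ],
    }
}
digits_payload = {"digits": "0276"}

print(extract_input(speech_payload))
print(extract_input(digits_payload))
```

Returning a (kind, value) pair lets the caller route on the input type without inspecting the raw payload again.
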
More Information