Voice Gateway Parameter Details¶

Cognigy Voice Gateway has many configuration settings that are controlled directly from within your Flow. These settings can be applied individually to the scopes:

Setting Session Parameters. Session parameters can comfortably be set with the Set Session Config Node. When executed, the settings applies for the remainder of the session.
Setting Activity Parameters. Activity parameters can be set for activities or Nodes. If, for example, Barge In is set on the Play Node, Barge In is only activated during the execution of this Node. Therefore, the user can interrupt the AI Agent during the output but not afterward. These configurations are also available in the Say, Question, and Optional Question Nodes.

Parameters¶

Synthesizer (TTS)

The Text-To-Speech (TTS) settings can be chosen from a pre-filled dropdown for Microsoft Azure, AWS, Google, Nuance, or a custom vendor.

Parameter	Type	Description
TTS Vendor	Dropdown	Defines the TTS vendor. You can select a custom vendor. The Default option's behavior in the Set Session Config Node depends on whether you select this option in the first or in a subsequent Node of this type: First Set Session Config Node — switches the TTS vendor to the default configuration set in the Voice Gateway Self-Service Portal Subsequent Set Session Config Node — keeps the TTS vendor set in the previous Set Session Config Node, allowing you to change other parameters without overwriting the TTS vendor.
Custom (Vendor)	CognigyScript	Allows for specifying an TTS Vendor, which isn't in the dropdown list. This option is only available on Voice Gateway. For preinstalled providers, use all lowercase letters, for example, `microsoft`, `google`, `aws`. For custom providers, use the name that you specified on the Speech Service page in Voice Gateway. The Custom field appears if you selected Custom from the TTS Vendor list.
TTS Language	Dropdown	Defines the language of the Voice AI Agent output.
Custom (Language)	CognigyScript	Allows for choosing a TTS language, which isn't in the dropdown list. Defines the language of the AI Agent output. The format to use depends on the TTS Vendor, for example, `de-DE`, `fr-FR`, `en-US`. The Custom field appears if you selected Custom from the TTS Language list.
TTS Voice	Dropdown	Defines the voice that should be used for the voice AI Agent output.
Custom (Voice)	CognigyScript	Allows for choosing a TTS voice, which isn't in the dropdown list. This setting can be the case for region-specific voices. The format to use depends on the TTS Vendor, for example, `de-DE-ConradNeural`.
TTS Label	CognigyScript	The alternative name of the vendor is the one you specify in the Voice Gateway Self-Service Portal. If you have created multiple speech services from the same vendor, use the label to specify which service to use.
Enable Advanced TTS Config	Toggle	Enables the addition of a URL for an Azure Custom Voice Endpoint.
Disable TTS Audio Caching	Toggle	Disables TTS audio caching. By default, the setting is deactivated. In this case, previously requested TTS audio results are stored in the AI Agent cache. When a new TTS request is made, and the audio text has been previously requested, the AI Agent retrieves the cached result instead of sending another request to the TTS provider. When the setting is activated, the AI Agent no longer caches TTS results. In this case, each request is directly sent to your speech provider. Note that disabling caching can increase TTS costs. For detailed information, contact your speech provider.

Recognizer (STT)

The Speech-To-Text (STT) settings can be chosen from a pre-filled dropdown for Microsoft Azure, AWS, Google, Nuance, Soniox, or a custom vendor.

Parameter	Type	Description
STT Vendor	Dropdown	Defines the STT vendor. You can select a custom vendor. The Default option's behavior in the Set Session Config Node depends on whether you select this option in the first or in a subsequent Node of this type: First Set Session Config Node — switches the STT vendor to the default configuration set in Voice Gateway Self-Service Portal Subsequent Set Session Config Node — keeps the TTS vendor set in the previous Set Session Config Node, allowing you to change other parameters without overwriting the STT vendor.
Custom (Vendor)	CognigyScript	Allows for specifying an STT Vendor, which isn't in the dropdown list. Select the desired STT Vendor. This option is only available on Voice Gateway. For preinstalled providers, use all lowercase letters like `microsoft`, `google`, `aws`. For custom providers, use the name that you specified on the Speech Service page in Voice Gateway. The Custom field appears if you selected Custom from the STT Vendor list.
STT Language	Dropdown	Defines the language that should be recognized.
Custom (Language)	CognigyScript	Allows for choosing a STT language, which isn't in the dropdown list. This setting can be the case for region-specific voices. The format to use depends on the STT Vendor, for example, `de-DE`, `fr-FR`, `en-US`. The Custom field appears if you selected Custom from the STT Language list.
Deepgram Model	Dropdown	This parameter is active only when Deepgram is selected in the STT Vendor setting. Choose a model for processing submitted audio. Each model is associated with a tier. Ensure that the selected tier is available for the chosen STT language. For detailed information about Deepgram models, refer to the Deepgram documentation.
Endpointing	Toggle	This parameter is active only when Deepgram is selected in the STT Vendor setting. Deepgram's Endpointing feature watches streaming audio for long pauses that signal the end of speech. When it spots an endpoint, it finalizes predictions and returns the transcript, marking it as complete with the `speech_final` parameter set to `true`. For detailed information about Deepgram Endpointing, refer to the Deepgram documentation. The duration for detecting the end of speech is preconfigured with a default value (10 milliseconds). If you want to change this value, use the Endpointing Time setting.
Endpointing Time	Number	This parameter is active only when Deepgram is selected in the STT Vendor setting and the Endpointing toggle is enabled. Customize the duration (in milliseconds) for detecting the end of speech. The default is 10 milliseconds of silence. Transcripts are sent after detecting silence, and the system waits until the speaker resumes or the required silence time is reached. Once either condition is met, a transcript is sent back with `speech_final` set to `true`.
Smart Formatting	Toggle	This parameter is active only when Deepgram is selected in the STT Vendor setting. Deepgram's Smart Format feature applies additional formatting to transcripts to optimize them for human readability. Smart Format capabilities vary between models. When Smart Formatting is turned on, Deepgram always applies the best-available formatting for your chosen model, tier, and language combination. For detailed examples, refer to the Deepgram documentation. Note that when Smart Formatting is turned on, punctuation is activated, even if you have the Disable STT Punctuation setting enabled.
STT Hints	Text	Array of words or phrases to assist speech detection. If you want to use multiple hints, enter each hint into a separate input field. For instance, you can enter `Germany` in the first field, `France` in the second field, and `Netherlands` in the third field. The STT provider receives the data in array format: ["Germany", "France", "Netherlands"]. Note: This requires support from the STT engine. The field isn't available for the Nuance speech vendor.
Dynamic Hints	CognigyScript	Uses context or input for adding array hints. For example, `{{context.hints}}` or `{{input.hints}}`. You can override these settings using Advanced parameters.
STT Label	CognigyScript	The alternative name of the vendor is the one you specify in the Voice Gateway Self-Service Portal. If you have created multiple speech services from the same vendor, use the label to specify which service to use.
Google Model	Dropdown	This parameter is active only when Google is selected in the STT Vendor setting. Utilizes one of Google Cloud STT transcription models, with the `latest_short` model being the default choice. For a detailed list of Google models, refer to the Transcription models section in the Google Documentation. Keep in mind that the `default` value is a Google Model type that can be used if other models don't suit your specific scenario.
Enable Voice Activity Detection	Toggle	Delays connection to cloud recognizer until the speech is detected.
VAD Sensitivity	Slider	Detection sensitivity, the lowest value has the highest sensitivity.
Minimal Voice Duration	Slider	Milliseconds of speech activity required before connecting to the cloud recognizer.
Disable STT Punctuation	Toggle	This parameter is active only when Google or Deepgram is selected in the STT Vendor setting. Prevents the STT response from the AI Agent to include punctuation marks.
Enable Advanced TTS Config	Toggle	Enables the addition of an ID for an Azure's Custom Speech model deployment.
Profanity Filter	Dropdown	Defines how offensive, inappropriate, or abusive language is handled in transcripts when using Microsoft Azure Speech Services as the STT vendor. Possible values: Raw - profanity shows up in transcripts. This option is selected by default. Masked - replaces profanity with asterisks (`**`) in transcripts. Removed** - removes profanity from transcripts. For more information, see the Microsoft Azure Speech Services documentation on the profanity filter.
Enable Audio Logging	Toggle	Enables recording and logging of audio from the user on Azure.
Recognize Language	Toggle	Enables the addition of alternative languages for recognition. You can select a maximum of 3 languages. To reuse these languages in other Nodes, such as the child Nodes of the Lookup Node, use the following format: `de-DE`, `fr-FR`, `en-US`. For the parent Node of the Lookup Node, specify `input.data.payload.speech.language_code`.

Barge In

Warning

Barge In uses the TTS and SST vendor to listen throughout the entire conversation. Consequently, Barge In may lead to increased subscription costs with your vendor.

Barge In is a feature that allows the caller to interrupt the voice AI Agent by using speech input or DTMF digits during the entire call. By default, this feature is turned off.

Before release 4.80, this feature couldn't be controlled when the call was transferred to the contact center. Barge In was always active, allowing the caller to interrupt the voice AI Agent at any time.

Starting with release 4.80, you can enable or disable Barge In when the call is redirected to the contact center. This improvement lets you decide whether the caller should listen to the voice AI Agent's messages fully or have the option to interrupt them. This way, the caller can't use Barge In to skip, for example, important legal information such as the GDPR.

To ensure Barge In works correctly after the call is transferred to the contact center, place the Set Session Config Node above the Handover to Agent Node.

Parameter	Type	Description
Barge In On Speech	Toggle	Enables interrupting the voice AI Agent with speech. The user is able to interrupt the voice AI Agent's responses even after the handover has taken place and a human agent communicates with the user through TTS . This parameter is disabled by default. It retains its setting throughout the whole conversation. Note that activating Barge In On Speech and Continuous ASR simultaneously may cause unstable behavior in the Recognizer (STT).
Barge In On DTMF	Toggle	Enables interrupting the voice AI Agent with DTMF digits. The user is able to interrupt the voice AI Agent's responses by pressing any digit, even after the handover has taken place and a human agent communicates with the user through TTS. This parameter is disabled by default. It retains its setting throughout the whole conversation.
Barge In Minimum Words	Slider	Defines the minimum number of words that the user must say for the Voice Gateway to consider it a barge in.

User Input Timeout

This feature defines what should happen when there is no input from the user.

Before the release 4.81, User Input Timeout was always enabled and users had to determine the number of milliseconds before timeout occurred. Starting from release 4.81, users can enable or disable User Input Timeout using a toggle. This setting keeps the voice AI Agent on the call even if the caller takes a while to respond. When the User Input Timeout is disabled, the voice AI Agent waits for the caller's response.

Parameter	Type	Description
Enable User No Input Timeout	Toggle	Enables or disables the User No Input Timeout parameter. This parameter is enabled by default.
User No Input Mode	Dropdown	This parameter is active only when Enable User No Input Timeout is enabled. Defines the action if a user doesn't provide an input to the AI Agent in time.
User No Input Timeout	Number	This parameter is active only when Enable User No Input Timeout is enabled. Defines the timeout duration for user input, specified in milliseconds (ms).
User No Input Retries	Number	This parameter is active only when Enable User No Input Timeout is enabled. Defines how often the voice AI Agent should retry to get an input from a user before completing the call.

Flow Input Timeout

This feature is designed for use cases where response delays occur in a Flow, such as when utilizing large language models (LLMs), waiting for responses from external services, or managing complex processing tasks. It helps maintain user engagement by proactively delivering information or prompts during these delays.

For example, you can inform end users that their request is being processed or that assistance is on the way.

Parameter	Type	Description
Enable Flow No Input Timeout	Toggle	This parameter is disabled by default. When enabled, this setting plays a track or speaks a prompt while waiting for a response from the Flow.
Flow No Input Mode	Dropdown	This parameter is active only when Enable Flow No Input Timeout is enabled. Select one of the following modes: Speak - outputs a spoken message, allowing you to communicate important information or prompts to the user. Play - plays an audio track, such as music, sound effects, or pre-recorded messages.
Flow No Input URL	URL	This parameter is active only when the Play option in Enable Flow No Input Timeout is selected. Define the URL that is played if the Flow doesn't continue in time. MP3 and WAV files are supported.
Flow No Input Speech	CognigyScript	This parameter is active only when the Speak option in Enable Flow No Input Timeout is selected. Define a prompt to be spoken if the Flow doesn't continue in time.
Flow No Input Timeout	Number	This parameter is active only when Enable Flow No Input Timeout is enabled. Defines how frequently the voice AI Agent should retry to obtain input from the Flow before completing the call.
Flow No Input Retries	Number	This parameter is active only when Enable Flow No Input Timeout is enabled. Define how many times the AI Agent should retry to obtain input from the Flow before executing an action.
AI Agent Fails on Error	Toggle	This parameter is disabled by default. It defines a failure condition when the maximum number of retry attempts is reached. If Call Failover is enabled in the Voice Gateway Endpoint Settings, the call is transferred either to another voice AI Agent or to a human agent in the contact center. If Call Failover isn't enabled, the call disconnects, leaving the user without further support.

DTMF

Enables DTMF collection.

Parameter	Type	Description
Capture DTMF signals	Toggle	Enables capturing DTMF signals by the AI Agent.
DTMF Inter Digit Timeout	Number	Defines the timeout between collected DTMF digits.
DTMF Max Digits	Number	Defines the maximum number of digits the user can enter. The digits are submitted automatically once this limit is reached.
DTMF Min Digits	Number	Defines the minimum number of digits before they are forwarded to the AI Agent. A submit digit can override this.
DTMF Submit Digit	CognigyScript	Defines the DTMF submit digit, which is used for submitting the previously entered digits. This action overrides the minimum digits validation.

Continuous ASR

Continuous ASR enables the Voice Gateway to concatenate multiple STT recognitions of the user and then send them as a single textual message to the AI Agent.

Parameter	Type	Description
Enable Continuous ASR	Toggle	Enable or disable Continuous ASR. Note that activating Barge In On Speech and Continuous ASR simultaneously may cause unstable behavior in the Recognizer (STT).
Continuous ASR Submit Digit	CognigyScript	Defines a special DTMF key that sends the accumulated recognitions to the Flow.
Continuous ASR Timeout	Number	Defines the number of milliseconds of silence before the accumulated recognitions are sent to the Flow. The default and minimum value is 2000.

Atmosphere Sounds

This feature is useful in scenarios where users interact with an AI Agent instead of a human when calling the contact center. Within the Atmosphere Sound section, you can configure the MP3 background track. This track may include office noises or other sounds that simulate human interaction, helping the caller feel they are speaking with a person rather than an AI Agent. Playing a background MP3 track during the conversation with AI Agents makes it more engaging and personalized.

The track plays during the conversation with the AI Agent, continues when the call is transferred to a human agent, and stops once the human agent accepts the call.

Parameter	Type	Description
Action	Dropdown	Selects an action to play, silence, or remove the track: play — plays the track in the background. silence — mutes the track. remove — removes the track from the background completely.
URL	Text	Accepts URL links to MP3 tracks. The URL doesn't need to include the `.mp3` extension. For example, `https://abc.xyz/music.mp3` or `https://audio.jukehost.co.uk/N5pnlULbup8KabGRE7dsGwHTeIZAwWdr`.
Loop	Toggle	Turns on looping for the audio track.
Volume	Number	Adjusts the volume of the track. Can be set from -50 to +50 dB. The default value is 0, meaning that the track is played as-is, with no adjustments to its volume. Users may need to adjust the volume by testing the call and checking if the Atmosphere Sounds track is neither too loud nor too quiet.

Silence Overlay

Silence Overlay enables you to play an MP3 file in the background during calls with an AI Agent. This feature is activated during prolonged periods of silence, which may result from the AI Agent's background activity. Playing the track informs the customer that the AI Agent is processing their query, which may take some time. The Silence Overlay track can simulate office sounds, for example, a human agent typing on a keyboard and clicking the mouse.

When Silence Overlay is enabled in the Set Session Config Node, the Silence Overlay track starts playing automatically once the AI Agent takes longer to respond, then stops the moment the AI Agent responds. You can adjust the delay before the Silence Overlay starts to make it sound more natural.

If you enabled the Call Recording feature in the Voice Gateway Self-Service Portal, the Silence Overlay track is recorded together with the AI Agent's track and can be played back in the audio file.

Parameter	Type	Description
Action	Dropdown	Defines an action to play or remove the track: play — plays the track in the background when prolonged silence occurs. remove — removes the track from the conversation. Next time a prolonged silence occurs, the Silence Overlay doesn't play. Make sure to place the next Set Session Config Node before the Node that needs to have Silence Overlay removed.
URL	Text	Accepts URL links to an MP3 track. The URL doesn't need to include the `.mp3` extension. For example, `https://abc.xyz/music.mp3` or `https://audio.jukehost.co.uk/N5pnlULbup8KabGRE7dsGwHTeIZAwWdr`. This parameter appears when the play action is selected.
Delay for starting the Silence Overlay	Number	Defines the wait time before the MP3 track plays, simulating a humanlike response. For example, human agents often have a pause between speaking and typing. This parameter appears when the play action is selected.

Advanced

Parameter	Type	Description
Additional Session Parameters	JSON	Allows for configuring settings using JSON. If you have already made changes using the UI settings above, this field overwrites them. Also, you can specify additional parameters in the JSON, which are unavailable in the UI, such as vendor credentials. If you want to specify a custom TTS or STT provider in the vendor parameter, use the `custom:<provider-name>` format, for example, `"vendor": "custom:My Speech provider"`.

List of Additional Session Parameters

Feature	Parameter	Description	Type	Example
User Input Timeout	`user`	The User Input Timeout feature.	Object	-
	`noInputMode`	The mode when no input is detected. Possible values: `"event"`, `"speech"`, `"play"`.	String	`"speech"`
	`noInputTimeout`	The timeout duration for no input (in milliseconds).	Number	`5000` (5 seconds)
	`noInputRetries`	The number of retries allowed when no input is detected.	Number	`3`
	`noInputSpeech`	The speech to play when no input is detected.	String	`"Please speak now."`
	`noInputUrl`	The URL to fetch audio when no input is detected.	String	`"https://example.com/no-input-audio.mp3"`
Flow Input Timeout	`flow`	The Flow Input Timeout feature.	Object	`{}`
	`flowNoInputEnable`	Enables no-input handling for Flows.	Boolean	`true`
	`flowNoInputMode`	The mode when no input is detected in a Flow. Possible values: `"speech"`, `"play"`.	String	`"play"`
	`flowNoInputTimeout`	The timeout duration for no input in a Flow (in milliseconds).	Number	`3000` (3 seconds)
	`flowNoInputRetries`	The number of retries for no input in a Flow.	Number	`2`
	`flowNoInputSpeech`	The speech to play for no input in a Flow.	String	`"Please try again."`
	`flowNoInputUrl`	The URL to fetch audio for no input in a Flow.	String	`"https://example.com/flow-no-input.mp3"`
	`flowNoInputFail`	Indicates whether to fail the Flow on no input.	Boolean	`true`
Recognizer	`recognizer`	The Recognizer (STT) feature.	Object	-
	`vendor`	The vendor for speech recognition. Possible values: `"aws"`, `"deepgram"`, `"google"`, `"microsoft"`, `"nuance"`, `"soniox"`, `"default"`, `"custom"`.	String	`"google"`
	`language`	The language for speech recognition.	String	`"de-DE"`
	`hints`	A list of hints to improve recognition accuracy.	Array of strings	`["help", "skip", "confirm"]`
	`hintsBoost`	A value to boost the weight of hints in recognition.	Number	`20`
	`profanityOption`	The option for handling offensive, inappropriate, or abusive language in transcripts when using Microsoft Azure Speech Services as the STT vendor. Possible values: `"raw"` - profanity show up in transcripts. This option is selected by default. `"masked"` - replaces profanity with asterisks (`****`) in transcripts. `"removed"` - removes profanity from transcripts. For more information, see the Microsoft Azure Speech Services documentation on profanity filter.	String	`"masked"`
Synthesizer	`synthesizer`	The Synthesizer (TTS) feature.	Object	-
	`vendor`	The vendor for TTS synthesis. Possible values: `"aws"`, `"deepgram"`, `"elevenlabs"`, `"google"`, `"microsoft"`, `"nuance"`, `"default"`, `"custom"`.	String	`"microsoft"`
	`language`	The language for TTS synthesis.	String	`"de-DE"`
	`voice`	The voice used for TTS synthesis.	String	`"en-US-JennyNeural"`
DTMF	`dtmf`	The DTMF feature.	Object	-
	`dtmfEnable`	Enables DTMF (Dual-Tone Multi-Frequency) input.	Boolean	`true`
	`dtmfInterDigitTimeout`	The timeout between digits during DTMF input (in milliseconds).	Number	`1000` (1 second)
	`dtmfMaxDigits`	The maximum number of digits allowed for DTMF input.	Number	`10`
	`dtmfMinDigits`	The minimum number of digits required for DTMF input.	Number	`3`
	`dtmfSubmitDigit`	The digit used to submit the DTMF input.	String	`"9"`
Barge In	`bargeIn`	The Barge In feature.	Object	-
	`bargeInEnable`	Enables Barge In functionality.	Boolean	`true`
	`bargeInOnDtmf`	Allows Barge In on DTMF input.	Boolean	`true`
	`bargeInOnSpeech`	Allows Barge In on speech input.	Boolean	`false`
	`bargeInMinimunWords`	The minimum words required to trigger Barge In.	Number	`5`
	`bargeInSticky`	Keeps Barge In active after initial input.	Boolean	`true`
Continuous ASR	`continuousAsr`	The Continuous ASR feature.	Object	-
	`asrEnabled`	Enables Continuous Automatic Speech Recognition (ASR).	Boolean	`true`
	`asrDigit`	The specific digit for triggering ASR.	String	`"5"`
	`asrTimeout`	The timeout for ASR detection (in milliseconds).	Number	`5000` (5 seconds)

JSON example:

{
  "synthesizer": {
    "vendor": "microsoft",
    "language": "de-DE",
    "voice": "en-US-JennyNeural"
  },
  "recognizer": {
    "vendor": "google",
    "language": "de-DE",
    "hints": [
      "help",
      "skip",
      "confirm"
    ],
    "hintsBoost": 20
  },
  "user": {
    "noInputMode": "speech",
    "noInputTimeout": "{{context.user.noInputTimeout}}",
    "noInputRetries": "{{context.user.noInputRetries}}",
    "noInputSpeech": "{{context.user.noInputSpeech}}"
  },
  "continuousAsr": {
    "asrEnabled": true,
    "asrTimeout": "{{context.continuousAsr.asrTimeout}}"
  },
  "dtmf": {
    "dtmfEnable": true,
    "dtmfInterDigitTimeout": "{{context.dtmf.dtmfInterDigitTimeout}}"
  },
  "flow": {
    "flowNoInputEnable": true,
    "flowNoInputMode": "speech",
    "flowNoInputTimeout": 15,
    "flowNoInputRetries": 2,
    "flowNoInputSpeech": "I'm sorry, I didn't hear anything"
  }
}