Session Speech Parameters Config

Description

Deprecation of AudioCodes and Generic Voice Nodes

AudioCodes Nodes and Generic Voice Nodes have been deprecated. Flows with these Nodes continue to work, and you can still edit and clone them as well as export them as a Package. However, you can't create new ones in Cognigy.AI v4.96 or later.

This Node changes speech parameters during a Flow.

Once the Node is executed, the new settings apply for the remainder of the session.

Synthesizer - Text-To-Speech - Settings

The TTS settings can be chosen from a pre-filled dropdown for Microsoft Azure, AWS, Google, Nuance, or a custom vendor.

Parameter	Type	Description
TTS Vendor	Dropdown	Defines the desired TTS Vendor. You can select a custom vendor.
Custom (Vendor)	CognigyScript	Specifies a TTS Vendor that isn't in the dropdown list. This option is only available on Voice Gateway. For preinstalled providers, use all lowercase letters, for example, `microsoft`, `google`, `aws`. For custom providers, use the name that you specified on the Speech Service page in Voice Gateway. The Custom field appears if you selected Custom from the TTS Vendor list.
TTS Language	Dropdown	Defines the language of the Voice AI Agent output.
Custom (Language)	CognigyScript	Specifies a TTS language that isn't in the dropdown list and defines the language of the AI Agent output. The format to use depends on the TTS Vendor, for example, `de-DE`, `fr-FR`, `en-US`. The Custom field appears if you selected Custom from the TTS Language list.
TTS Voice	Dropdown	Defines the voice used for the Voice AI Agent output.
Custom (Voice)	CognigyScript	Specifies a TTS voice that isn't in the dropdown list, which can be the case for region-specific voices. The format to use depends on the TTS Vendor, for example, `de-DE-ConradNeural`.
Enable Advanced TTS Config	Toggle	Enables the addition of a URL for an Azure Custom Voice Endpoint. This setting appears only when Microsoft is selected as the TTS Vendor.
Disable TTS Audio Caching	Toggle	Disables TTS audio caching. By default, the setting is deactivated. In this case, previously requested TTS audio results are stored in the AI Agent cache. When a new TTS request is made and the audio text has been previously requested, the AI Agent retrieves the cached result instead of sending another request to the TTS provider. When the setting is activated, the AI Agent caches TTS results but doesn't use them. In this case, each request is directly sent to your speech provider. Note that disabling caching can increase TTS costs. For detailed information, contact your speech provider.
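
The caching behavior described for Disable TTS Audio Caching can be illustrated with a small sketch. This is a toy model, not the AI Agent's actual implementation: the class `TtsCacheSketch`, the function `fake_synthesize`, and the choice of `(voice, text)` as the cache key are all assumptions made for illustration.

```python
from typing import Callable, Dict, Tuple


class TtsCacheSketch:
    """Toy model of the documented TTS caching behavior (hypothetical names)."""

    def __init__(self, synthesize: Callable[[str, str], bytes]) -> None:
        self._synthesize = synthesize  # stand-in for a request to the TTS provider
        self._cache: Dict[Tuple[str, str], bytes] = {}
        self.provider_requests = 0  # counts requests that would be billed

    def speak(self, text: str, voice: str, disable_caching: bool = False) -> bytes:
        key = (voice, text)  # assumption: the cache is keyed by voice and text
        # Setting deactivated (default): reuse a previously stored result.
        if not disable_caching and key in self._cache:
            return self._cache[key]
        # Otherwise the request goes directly to the speech provider.
        self.provider_requests += 1
        audio = self._synthesize(text, voice)
        # Results are stored even when the setting is activated; they are just not used.
        self._cache[key] = audio
        return audio


def fake_synthesize(text: str, voice: str) -> bytes:
    """Hypothetical provider call that returns placeholder audio bytes."""
    return f"{voice}:{text}".encode("utf-8")


cache = TtsCacheSketch(fake_synthesize)
cache.speak("Hello", "de-DE-ConradNeural")
cache.speak("Hello", "de-DE-ConradNeural")  # served from the cache
cache.speak("Hello", "de-DE-ConradNeural", disable_caching=True)  # new provider request
print(cache.provider_requests)
```

In this sketch, the repeated prompt reaches the provider only once while caching is in use, and once more when caching is disabled, which is why disabling caching can increase TTS costs.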

Recognizer - Speech-To-Text - Settings

The STT settings can be chosen from a prefilled dropdown for Microsoft Azure, AWS, Google, Nuance, Soniox, or a custom vendor.

Parameter	Type	Description
STT Vendor	Dropdown	Defines the desired STT Vendor. You can select a custom vendor.
Custom (Vendor)	CognigyScript	Specifies an STT Vendor that isn't in the dropdown list. This option is only available on Voice Gateway. For preinstalled providers, use all lowercase letters, for example, `microsoft`, `google`, `aws`. For custom providers, use the name that you specified on the Speech Service page in Voice Gateway. The Custom field appears if you selected Custom from the STT Vendor list.
STT Language	Dropdown	Defines the language that should be recognized.
Custom (Language)	CognigyScript	Specifies an STT language that isn't in the dropdown list, which can be the case for region-specific language variants. The format to use depends on the STT Vendor, for example, `de-DE`, `fr-FR`, `en-US`. The Custom field appears if you selected Custom from the STT Language list.
STT Hints	Text	Array of words or phrases to assist speech detection. If you want to use multiple hints, enter each hint into a separate input field. For instance, you can enter `Germany` in the first field, `France` in the second field, and `Netherlands` in the third field. The STT provider will receive the data in array format: `["Germany", "France", "Netherlands"]`. Note: This requires support from the STT engine. The field is not available for the Nuance speech vendor.
Dynamic Hints	CognigyScript	Uses context or input for adding array hints. For example, `{{context.hints}}` or `{{input.hints}}`.
Google Model	Dropdown	This parameter is active only when Google is selected in the STT Vendor setting. Uses one of the Google Cloud Speech-to-Text transcription models, with `latest_short` as the default. For a detailed list of Google models, refer to the Transcription models section in the Google Documentation. Keep in mind that the `default` value is a Google Model type that can be used if other models don't suit your specific scenario.
Enable Voice Activity Detection	Toggle	Delays the connection to the cloud recognizer until speech is detected.
Disable STT Punctuation	Toggle	This parameter is active only when Google or Deepgram is selected in the STT Vendor setting. Prevents the STT response from the AI Agent from including punctuation marks.
Deepgram Model	Dropdown	This parameter is active only when Deepgram is selected in the STT Vendor setting. Choose a model for processing submitted audio. Each model is associated with a tier. Ensure that the selected tier is available for the chosen STT language. For detailed information about Deepgram models, refer to the Deepgram documentation.
Endpointing	Toggle	This parameter is active only when Deepgram is selected in the STT Vendor setting. Deepgram's Endpointing feature watches streaming audio for long pauses that signal the end of speech. When it spots an endpoint, it finalizes predictions and returns the transcript, marking it as complete with the `speech_final` parameter set to `true`. For detailed information about Deepgram Endpointing, refer to the Deepgram documentation. The duration for detecting the end of speech is preconfigured with a default value (10 milliseconds). If you want to change this value, use the Endpointing Time setting.
Endpointing Time	Number	This parameter is active only when Deepgram is selected in the STT Vendor setting and the Endpointing toggle is enabled. Customize the duration (in milliseconds) for detecting the end of speech. The default is 10 milliseconds of silence. Transcripts are sent after detecting silence, and the system waits until the speaker resumes or the required silence time is reached. Once either condition is met, a transcript is sent back with `speech_final` set to `true`.
Smart Formatting	Toggle	This parameter is active only when Deepgram is selected in the STT Vendor setting. Deepgram's Smart Format feature applies additional formatting to transcripts to optimize them for human readability. Smart Format capabilities vary between models. When Smart Formatting is turned on, Deepgram will always apply the best-available formatting for your chosen model, tier, and language combination. For detailed examples, refer to the Deepgram documentation.
Enable Advanced STT Config	Toggle	Enables the addition of an ID for an Azure Custom Speech model deployment.
Enable Audio Logging	Toggle	Enables recording and logging of audio from the user on Azure.
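
The array format described for STT Hints and Dynamic Hints can be sketched as follows. This is a minimal illustration, not the Node's actual logic: the function `build_stt_hints` is hypothetical, and combining the static fields with a `hints` value from the context, as well as removing duplicates, are assumptions made for the example.

```python
import json
from typing import Any, Iterable, List, Mapping


def build_stt_hints(
    static_hints: Iterable[str],
    context: Mapping[str, Any],
    key: str = "hints",
) -> List[str]:
    """Combine hints from separate input fields with hints stored under context[key].

    Hypothetical helper: the merge and de-duplication steps are assumptions.
    """
    dynamic = context.get(key) or []
    if isinstance(dynamic, str):
        dynamic = [dynamic]  # tolerate a single hint stored as a plain string

    merged: List[str] = []
    seen = set()
    for hint in list(static_hints) + list(dynamic):
        cleaned = str(hint).strip()
        # Skip empty fields and keep only the first occurrence of each hint.
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            merged.append(cleaned)
    return merged


# One hint per input field, plus hints that a Flow stored in the context.
hints = build_stt_hints(["Germany", "France"], {"hints": ["Netherlands", "France"]})
print(json.dumps(hints))  # ["Germany", "France", "Netherlands"]
```

Whether the speech provider actually uses the resulting array depends on hint support in the selected STT engine.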