The dub verb adds an additional audio track to the conversation. The audio source is an MP3 file served directly from an HTTP(S) URL. During the conversation, the track plays in the background as a second sound layer, independently of other verbs such as play or say. A common use is background sound that simulates an office environment, for example keyboard clicking, which makes interactions between end users and AI Agents feel more humanlike.
Configuration
The following table lists the available parameters:

| Parameter | Type | Description | Required |
|---|---|---|---|
| action | string | Specifies the action to perform on the audio track, for example playOnTrack or sayOnTrack. | Yes |
| track | string | The name of the audio track. Choose a descriptive name that reflects the content of the track, for example, office-sounds for background office noises or music-track for background music. Track names are referenced in playOnTrack or sayOnTrack actions. | Yes |
| id | string | A unique identifier for this verb instance. Useful for tracking events when notifyEvents is enabled, such as when the audio starts, finishes, or encounters an error. | No |
| play | string | The URL of an MP3 file to play on the track. The URL must use HTTP or HTTPS and doesn’t need to include the .mp3 extension. This allows streaming external audio files into the conversation dynamically. | No |
| say | string \| object | A text string or an object to convert into audio and play on the track. When using an object, you can specify synthesizer settings such as vendor, language, and voice. This enables dynamic spoken audio on background tracks. | No |
| loop | boolean | Determines whether the MP3 audio will repeat continuously. If set to true, the audio specified in play will loop on the track until it is silenced or removed. Useful for ambient sounds or music that should play throughout the conversation. | No |
| gain | string \| number | Adjusts the volume of the audio track relative to other conversation audio. You can specify a number or a string in decibels, such as +2dB to boost or -3dB to reduce. Acceptable values range from -50 dB to +50 dB. This allows fine-tuning of background audio levels. | No |
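
To illustrate how these parameters fit together, here is a minimal Python sketch that assembles a dub configuration and checks the constraints described in the table (required action and track, an HTTP or HTTPS URL for play, and a gain between -50 dB and +50 dB). The helper names `normalize_gain` and `build_dub`, the `"verb": "dub"` key, the example URL, and the choice to accept gain strings with or without the dB suffix are assumptions made for illustration; they are not taken from the product's API.

```python
import json
import re

# Gain limits taken from the parameter table above.
GAIN_MIN_DB = -50.0
GAIN_MAX_DB = 50.0

# Assumption: strings look like "+2dB" or "-3 dB"; the suffix is treated as optional here.
_GAIN_PATTERN = re.compile(r"\s*([+-]?\d+(?:\.\d+)?)\s*(?:dB)?\s*", re.IGNORECASE)


def normalize_gain(gain):
    """Return the gain in dB as a float, accepting a number or a string."""
    if isinstance(gain, bool):
        raise TypeError("gain must be a number or a string, not a boolean")
    if isinstance(gain, (int, float)):
        value = float(gain)
    elif isinstance(gain, str):
        match = _GAIN_PATTERN.fullmatch(gain)
        if match is None:
            raise ValueError(f"unrecognized gain format: {gain!r}")
        value = float(match.group(1))
    else:
        raise TypeError("gain must be a number or a string")
    if not GAIN_MIN_DB <= value <= GAIN_MAX_DB:
        raise ValueError(f"gain {value} dB is outside -50 dB to +50 dB")
    return value


def build_dub(action, track, *, play=None, say=None, loop=None, gain=None, verb_id=None):
    """Build a dub configuration as a dict (hypothetical helper, not a product API)."""
    if not action or not track:
        raise ValueError("action and track are required")
    if play is not None and not play.lower().startswith(("http://", "https://")):
        raise ValueError("play must be an HTTP or HTTPS URL")
    if gain is not None:
        normalize_gain(gain)  # validate only; the caller's value is passed through unchanged

    # Assumption: the payload identifies the verb with a "verb" key.
    verb = {"verb": "dub", "action": action, "track": track}
    optional = {"id": verb_id, "play": play, "say": say, "loop": loop, "gain": gain}
    # Include only the optional parameters that were actually provided.
    verb.update({key: value for key, value in optional.items() if value is not None})
    return verb


if __name__ == "__main__":
    # Hypothetical URL; note that the .mp3 extension is not required.
    office = build_dub(
        "playOnTrack",
        "office-sounds",
        play="https://example.com/audio/office-ambience",
        loop=True,
        gain="-3dB",
    )
    print(json.dumps(office, indent=2))
```

Validating gain before sending the configuration surfaces out-of-range values such as +60dB early, rather than relying on how the platform handles them.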