The listen verb sends real-time audio streams to your external server over a WebSocket connection for processing.
The Call Recording feature relies on this verb.
This table outlines the properties related to the audio streams sent by the listen verb:
| Properties | Description |
|---|---|
| Format | 16-bit |
| Encoding | PCM |
| Sample rate | User-specified |
| Connection type | WebSocket |
When the WebSocket connection is first established, an initial JSON payload containing call metadata is sent to the remote server. Arbitrary additional data can be included in this payload via the metadata parameter.
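For illustration, a minimal listen verb might look like the following sketch. It assumes the standard jambonz application format in which verbs are returned as a JSON array; the WebSocket URL and metadata values are placeholders.

```json
[
  {
    "verb": "listen",
    "url": "wss://example.com/audio-stream",
    "metadata": {
      "callPurpose": "agent-assist",
      "customerId": "abc123"
    }
  }
]
```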
The listen verb can also be nested in a dial or config verb, allowing the audio for a call between two parties to be sent to a remote WebSocket server.
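A sketch of that nesting is shown below. It assumes the dial verb accepts a nested listen object (consult the dial verb reference for the exact shape); the target number and URL are placeholders.

```json
[
  {
    "verb": "dial",
    "target": [
      { "type": "phone", "number": "15551234567" }
    ],
    "listen": {
      "url": "wss://example.com/call-audio",
      "mixType": "stereo"
    }
  }
]
```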
Configuration
The full set of configuration parameters:

| Parameters | Description | Required |
|---|---|---|
| actionHook | A webhook to receive an HTTP POST when the listen operation ends. The information will include the duration of the audio stream, and also a digits property if the recording was terminated by a DTMF key. | Yes |
| bidirectionalAudio.enabled | Sends audio over the WebSocket connection back to the call. The default value is true. | No |
| bidirectionalAudio.sampleRate | The sample rate, in Hz, of the audio sent over the WebSocket connection back to the call. Required when bidirectionalAudio.streaming is enabled. | No |
| bidirectionalAudio.streaming | Breaks the raw audio into segments (chunks) and sends them over the WebSocket connection back to the call. The audio is expected to be in linear PCM format (uncompressed raw audio without headers). The default value is false. This parameter only applies when bidirectionalAudio.enabled is true. | No |
| disableBidirectionalAudio | This parameter is deprecated. If set to true, it will disable bidirectional audio, which is equivalent to setting bidirectionalAudio.enabled = false. The default value is false. | No |
| finishOnKey | The set of digits that can end the listen action. | No |
| maxLength | The maximum length of the listened audio stream in seconds. | No |
| metadata | Arbitrary data to add to the JSON payload sent to the remote server when the WebSocket connection is first established. | No |
| mixType | The audio channel configuration. One of: mono (sends a single channel), stereo (sends separate channels for each call in a bridge), or mixed (sends the audio from both calls in a bridge as a single mixed stream). The default value is mono. | No |
| passDtmf | If this parameter is true, detected DTMF digits will be sent over WebSocket as JSON text frames. The default value is false. | No |
| playBeep | Plays a beep sound when the listen operation starts. The default value is false. | No |
| sampleRate | The sample rate of audio to send. Allowable values: 8000, 16000, 24000, 48000, or 64000. The default value is 8000. | No |
| timeout | The number of seconds of silence that terminates the listen operation. | No |
| transcribe | A nested transcribe verb. | No |
| url | The URL of the remote server to connect to. | Yes |
| wsAuth.username | The HTTP basic auth username to use on the WebSocket connection. | No |
| wsAuth.password | The HTTP basic auth password to use on the WebSocket connection. | No |
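Putting several of these parameters together, a fuller listen verb could look like the sketch below. All values are illustrative; the URL, webhook path, and credentials are placeholders.

```json
[
  {
    "verb": "listen",
    "url": "wss://example.com/audio-stream",
    "actionHook": "/listen-complete",
    "sampleRate": 16000,
    "mixType": "mixed",
    "maxLength": 300,
    "timeout": 10,
    "finishOnKey": "#",
    "playBeep": true,
    "wsAuth": {
      "username": "myuser",
      "password": "mypassword"
    },
    "bidirectionalAudio": {
      "enabled": true,
      "streaming": true,
      "sampleRate": 16000
    }
  }
]
```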
Passing DTMF
Any DTMF digits entered by the far-end party on the call can optionally be passed to the WebSocket server as JSON text frames by setting the passDtmf property to true. Each DTMF entry is reported separately in a payload containing the specific DTMF key that was entered, along with its duration reported in RTP timestamp units.
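Enabling DTMF passthrough is just a matter of setting the property on the verb, as in this sketch (the URL is a placeholder):

```json
[
  {
    "verb": "listen",
    "url": "wss://example.com/audio-stream",
    "passDtmf": true
  }
]
```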
An example payload: