
# Listen

> The verb streams real-time call audio over WebSocket, supporting call recording, DTMF, and bidirectional audio.

The `listen` verb sends real-time audio streams to your external server over a WebSocket connection for processing.
The [Call Recording](/voice-gateway/webapp/recent-calls#call-recordings) feature relies on this verb.

This table outlines the properties related to the audio streams sent by the `listen` verb:

| Property        | Value          |
| --------------- | -------------- |
| Format          | 16-bit         |
| Encoding        | PCM            |
| Sample rate     | User-specified |
| Connection type | WebSocket      |
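As a rough illustration of what these properties mean for a receiving server, the following Python sketch estimates the raw audio data rate. It assumes that `stereo` carries two 16-bit channels and that `mono` and `mixed` carry one, and it ignores any WebSocket framing overhead. The helper names are hypothetical and not part of the product.

```python
BYTES_PER_SAMPLE = 2  # 16-bit PCM, as listed in the table above

# Assumption: stereo carries two channels; mono and mixed carry one.
CHANNELS = {"mono": 1, "mixed": 1, "stereo": 2}


def bytes_per_second(sample_rate: int, mix_type: str = "mono") -> int:
    """Raw audio bytes produced per second of call audio."""
    return sample_rate * BYTES_PER_SAMPLE * CHANNELS[mix_type]


def duration_seconds(num_bytes: int, sample_rate: int, mix_type: str = "mono") -> float:
    """Approximate duration represented by a number of received audio bytes."""
    return num_bytes / bytes_per_second(sample_rate, mix_type)
```

For example, a mono stream at 8000 Hz would produce about 16000 bytes of audio per second under these assumptions.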

Immediately after the WebSocket connection is established, one text frame is sent.
This frame contains a JSON string with the call attributes that are normally sent in an HTTP request.
You can add custom data to this payload by using the `metadata` parameter.
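A receiving server therefore needs to tell the initial JSON text frame apart from the audio that follows. The sketch below shows one possible way to do this in Python, independent of any particular WebSocket library. It assumes that audio arrives as binary frames and that every text frame is JSON. The class name and its attributes are hypothetical.

```python
import json
from typing import Optional, Union


class ListenSession:
    """Collects the initial JSON frame, later JSON events, and raw audio."""

    def __init__(self) -> None:
        self.call_info: Optional[dict] = None  # first text frame (call attributes)
        self.events: list = []                 # any later text frames, for example DTMF
        self.audio = bytearray()               # assumption: audio arrives as binary frames

    def on_frame(self, frame: Union[str, bytes]) -> None:
        """Route one incoming WebSocket frame by its type."""
        if isinstance(frame, (bytes, bytearray)):
            self.audio.extend(frame)
            return
        payload = json.loads(frame)
        if self.call_info is None:
            # The first text frame carries the call attributes and any metadata.
            self.call_info = payload
        else:
            self.events.append(payload)
```

In a real server, `on_frame` would be called from the message handler of whichever WebSocket library you use.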

The `listen` verb can also be nested in a [`dial`](/voice-gateway/references/verbs/dial) or [`config`](/voice-gateway/references/verbs/config) verb, allowing the audio for a call between two parties to be sent to a remote WebSocket server.

```json theme={null}
{
  "verb": "listen",
  "url": "wss://myrecorder.example.com/calls",
  "mixType" : "stereo"
}
```

## Configuration

The following table lists the available parameters:

| Parameter                     | Type             | Description                                                                                                                                                                                                                                                                                                                                                          | Required                    |
| ----------------------------- | ---------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------- |
| url                           | string           | The URL of the remote WebSocket server to connect to.                                                                                                                                                                                                                                                                                                                | Yes                         |
| actionHook                    | string \| object | A webhook to receive an HTTP POST when the listen operation ends. The information will include the duration of the audio stream, and also a `digits` property if the recording was terminated by a DTMF key.                                                                                                                                                         | No                          |
| auth                          | object           | Authentication credentials object with `username` and `password` properties.                                                                                                                                                                                                                                                                                         | No                          |
| auth.username                 | string           | The username for authentication.                                                                                                                                                                                                                                                                                                                                     | Yes (if auth is provided)   |
| auth.password                 | string           | The password for authentication.                                                                                                                                                                                                                                                                                                                                     | Yes (if auth is provided)   |
| bidirectionalAudio            | object           | Object to configure bidirectional audio settings.                                                                                                                                                                                                                                                                                                                    | No                          |
| bidirectionalAudio.enabled    | boolean          | Sends audio over the WebSocket connection back to the call. The default value is `true`.                                                                                                                                                                                                                                                                             | No                          |
| bidirectionalAudio.sampleRate | number           | The sampling rate, in Hz, of the audio that is sent over the WebSocket connection back to the call. This parameter is required when `bidirectionalAudio.streaming` is enabled. | Yes (if streaming is enabled) |
| bidirectionalAudio.streaming  | boolean          | Breaks the raw audio into segments (chunks) and sends them over the WebSocket connection back to the call. The audio is expected to be in linear [PCM](https://en.wikipedia.org/wiki/Pulse-code_modulation) format (uncompressed raw audio format without headers). The default value is `false`. This parameter takes effect only when `bidirectionalAudio.enabled` is `true`. | No                          |
| disableBidirectionalAudio     | boolean          | This parameter is deprecated. If set to `true`, it will disable bidirectional audio, which is equivalent to setting `bidirectionalAudio.enabled = false`. The default value is `false`.                                                                                                                                                                              | No                          |
| earlyMedia                    | boolean          | If `true`, start listening before the call is answered. The default value is `false`.                                                                                                                                                                                                                                                                                | No                          |
| finishOnKey                   | string           | The set of digits that can end the listen action.                                                                                                                                                                                                                                                                                                                    | No                          |
| maxLength                     | number           | The maximum length of the listened audio stream in seconds.                                                                                                                                                                                                                                                                                                          | No                          |
| metadata                      | object           | Arbitrary data to add to the JSON payload sent to the remote server when the WebSocket connection is first established.                                                                                                                                                                                                                                              | No                          |
| mixType                       | string           | The following types can be specified:<br /> - `mono` — sends a single channel,<br /> - `stereo` — sends a dual channel of both calls in a bridge,<br /> - `mixed` — sends audio from both calls in a bridge in a single mixed audio stream. The default value is `mono`.                                                                                             | No                          |
| passDtmf                      | boolean          | If this parameter is `true`, detected DTMF digits will be sent over WebSocket as JSON text frames. The default value is `false`.                                                                                                                                                                                                                                     | No                          |
| playBeep                      | boolean          | Enable a beep sound when the listen operation starts. The default value is `false`.                                                                                                                                                                                                                                                                                  | No                          |
| sampleRate                    | number           | The sample rate of audio to send. Allowable values: `8000`, `16000`, `24000`, `48000`, or `64000`. The default value is `8000`.                                                                                                                                                                                                                                      | No                          |
| timeout                       | number           | The number of seconds of silence that terminates the listen operation.                                                                                                                                                                                                                                                                                               | No                          |
| transcribe                    | object           | A nested [transcribe](/voice-gateway/references/verbs/transcribe) verb.                                                                                                                                                                                                                                                                                              | No                          |
| wsAuth                        | object           | WebSocket authentication credentials object.                                                                                                                                                                                                                                                                                                                         | No                          |
| wsAuth.username               | string           | The HTTP basic auth username to use on the WebSocket connection.                                                                                                                                                                                                                                                                                                     | Yes (if wsAuth is provided) |
| wsAuth.password               | string           | The HTTP basic auth password to use on the WebSocket connection.                                                                                                                                                                                                                                                                                                     | Yes (if wsAuth is provided) |
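If you generate the verb programmatically, it can help to validate the values against the table above before sending them. The following Python sketch builds a `listen` verb as a dictionary and checks `sampleRate` and `mixType` against the documented values. The function name `build_listen_verb` is hypothetical, and only a subset of the parameters is covered.

```python
from typing import Optional

# Allowed values taken from the parameter table above.
ALLOWED_SAMPLE_RATES = {8000, 16000, 24000, 48000, 64000}
ALLOWED_MIX_TYPES = {"mono", "stereo", "mixed"}


def build_listen_verb(
    url: str,
    sample_rate: int = 8000,
    mix_type: str = "mono",
    pass_dtmf: bool = False,
    metadata: Optional[dict] = None,
) -> dict:
    """Return a listen verb as a dict, ready to be serialized to JSON."""
    if not url:
        raise ValueError("url is required")
    if sample_rate not in ALLOWED_SAMPLE_RATES:
        raise ValueError(f"unsupported sampleRate: {sample_rate}")
    if mix_type not in ALLOWED_MIX_TYPES:
        raise ValueError(f"unsupported mixType: {mix_type}")

    verb = {
        "verb": "listen",
        "url": url,
        "sampleRate": sample_rate,
        "mixType": mix_type,
        "passDtmf": pass_dtmf,
    }
    if metadata is not None:
        verb["metadata"] = metadata  # optional, so only added when supplied
    return verb
```

Calling `build_listen_verb("wss://myrecorder.example.com/calls", mix_type="stereo")` produces a dictionary equivalent to the example at the top of this page, with the default values filled in explicitly.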

## Passing DTMF

Any DTMF digits entered by the far end party on the call can optionally be passed to the WebSocket server as JSON text frames by setting the `passDtmf` property to `true`. Each DTMF entry is reported separately in a payload containing the specific DTMF key that was entered, along with its duration reported in RTP timestamp units.

Example payload:

```json theme={null}
{
  "event": "dtmf",
  "dtmf": "2",
  "duration": "1600"
}
```
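On the server side, a frame like this could be parsed as in the following Python sketch. Converting the duration to milliseconds requires the RTP clock rate, which this page does not state. The sketch assumes 8000 Hz, a common clock rate for telephony DTMF events, so treat the conversion as illustrative. The function name `parse_dtmf` is hypothetical.

```python
import json
from typing import Optional, Tuple

# Assumption: DTMF durations use an 8000 Hz RTP clock.
RTP_CLOCK_RATE_HZ = 8000


def parse_dtmf(frame: str) -> Optional[Tuple[str, float]]:
    """Return (digit, duration in ms) for a DTMF text frame, else None."""
    payload = json.loads(frame)
    if payload.get("event") != "dtmf":
        return None
    # The example payload carries the duration as a string, so convert it.
    units = int(payload["duration"])
    return payload["dtmf"], units * 1000 / RTP_CLOCK_RATE_HZ
```

With the example payload above and the assumed clock rate, the digit `2` would have a duration of 200 ms.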

## Bidirectional Audio

Audio can also be sent back over the WebSocket. This audio, if supplied, will be played out to the caller.

<Warning>
  Bidirectional audio is not supported when the [`listen`](/voice-gateway/references/verbs/listen) verb is nested in a [`dial`](/voice-gateway/references/verbs/dial) verb.
</Warning>

<Accordion title="Audio Format and Configuration">
  The far-end WebSocket server supplies bidirectional audio by sending a JSON text frame over the WebSocket connection:

  ```json theme={null}
  {
    "type": "playAudio",
    "data": {
      "audioContent": "base64-encoded content..",
      "audioContentType": "raw",
      "sampleRate": "16000"
    }
  }
  ```

  In the example above, raw (headerless) audio is sent. The audio must be 16-bit PCM, with a sample rate of 8000, 16000, 24000, 32000, 48000, or 64000 Hz.

  Alternatively, you can supply audio in WAV file format by setting `audioContentType` to `wav` (or `wave`). In this case, no `sampleRate` property is needed. In all cases, the audio must be base64-encoded when sent over the socket.

  If multiple `playAudio` commands are sent before the first has finished playing, they are queued and played in order. You can have up to 10 queued `playAudio` commands at any time.

  When a `playAudio` command has finished playing out the audio, a `playDone` JSON text frame is sent over the WebSocket connection as confirmation.

  ```json theme={null}
  {
    "type": "playDone"
  }
  ```

  The WebSocket server can send a `killAudio` command to stop the playback of audio that was started by a previous `playAudio` command:

  ```json theme={null}
  {
    "type": "killAudio"
  }
  ```

  If the WebSocket server wants to end the `listen` operation, it can send a `disconnect` command:

  ```json theme={null}
  {
    "type": "disconnect"
  }
  ```
</Accordion>
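
The playback messages described in this section could be produced and tracked on the server side roughly as follows. This Python sketch builds a `playAudio` text frame and keeps a count of outstanding playbacks so that the limit of 10 is not exceeded. It writes `sampleRate` as a string to mirror the example payload, and it conservatively counts the clip that is currently playing toward the limit. Both choices are assumptions. The names `play_audio_frame` and `PlaybackTracker` are hypothetical.

```python
import base64
import json
from typing import Optional

MAX_QUEUED = 10  # queue limit stated in this section


def play_audio_frame(audio: bytes, sample_rate: Optional[int] = None) -> str:
    """Build a playAudio text frame.

    Pass sample_rate for raw PCM; omit it to send the bytes as a WAV file.
    """
    data = {"audioContent": base64.b64encode(audio).decode("ascii")}
    if sample_rate is None:
        data["audioContentType"] = "wav"
    else:
        data["audioContentType"] = "raw"
        # Assumption: sent as a string, mirroring the example payload.
        data["sampleRate"] = str(sample_rate)
    return json.dumps({"type": "playAudio", "data": data})


class PlaybackTracker:
    """Counts playAudio commands that have not yet been confirmed by playDone."""

    def __init__(self) -> None:
        self.pending = 0

    def can_send(self) -> bool:
        return self.pending < MAX_QUEUED

    def mark_sent(self) -> None:
        if not self.can_send():
            raise RuntimeError("playAudio queue is full")
        self.pending += 1

    def on_text_frame(self, frame: str) -> None:
        """Decrement the counter when a playDone confirmation arrives."""
        if json.loads(frame).get("type") == "playDone" and self.pending > 0:
            self.pending -= 1
```

A server would call `mark_sent` each time it sends a frame from `play_audio_frame` and pass incoming text frames to `on_text_frame`. How `killAudio` affects queued audio is not specified here, so the sketch does not model it.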
