<Stream>
Create a bi-directional audio stream to a remotely provided audio WebSocket service.
Description
The <Stream> noun creates a bi-directional audio stream that sends and receives audio data between the active call and a remote audio WebSocket service.
Execution of the CXML document will continue only after the <Connect><Stream> connection has terminated.
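For example, in the following document the <Say> verb will only execute once the stream's WebSocket connection has closed (the URL is a placeholder):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Connect>
    <Stream url="wss://stream-listener.example.com/my-app"/>
  </Connect>
  <Say>The stream has ended</Say>
</Response>
```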
Example
The following example shows the basic use of the <Stream> noun:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Connect>
    <Stream url="wss://stream-listener.example.com/my-app"/>
  </Connect>
  <Hangup />
</Response>
```
Attributes
The following attributes are supported:
Attribute Name | Allowed Values | Default Value |
---|---|---|
name | Any short text | None |
url | Absolute URL to a WebSocket service | None |
track | inbound_track, outbound_track, or both_tracks | inbound_track |
statusCallback | Relative or absolute HTTP URL | None |
statusCallbackMethod | GET or POST | POST |
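For illustration, a <Stream> noun that sets every attribute might look like the following sketch (the URL values are placeholders, not values from a real application):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Connect>
    <Stream name="my-stream"
            url="wss://stream-listener.example.com/my-app"
            track="both_tracks"
            statusCallback="https://example.com/stream-events"
            statusCallbackMethod="POST"/>
  </Connect>
</Response>
```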
Attribute: name
The name attribute specifies a name for this stream that should be unique for the current application. This name allows the stream to be stopped by a <Stop> verb. If multiple <Stream> elements share the same name, only the first such stream will be created; any following stream with the same name is assumed to refer to the initial identically named stream and will not be recreated.
To stop a stream, use a <Stream> noun with the same name inside a <Stop> verb. For example:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Stop>
    <Stream name="my-stream" />
  </Stop>
  <Say>This will not be sent to the stream</Say>
</Response>
```
If the <Stop> verb is used to stop an unknown stream, the command is ignored.
Attribute: url
The url attribute must be a valid URL to a WebSocket address. If the URL is not specified, is not a valid WebSocket URL, or the connection to the WebSocket fails, it is an error and the application will be stopped.
Attribute: track
The track attribute specifies which side of the conversation is sent to the WebSocket: inbound_track sends the audio from the caller (the side of the call that started the application), while outbound_track sends the audio that the application is sending to the caller, or, if the application is running a <Dial> command, the audio that the other side of the dial is sending. both_tracks sends the audio from both sides to the WebSocket.
Attribute: statusCallback
If the statusCallback attribute is set to a valid HTTP URL, the stream will send HTTP requests to the specified URL when stream events occur (an example request follows the list). The following parameters will be sent to the status callback URL:
- CallSid - the unique call ID associated with the connected call.
- Session - the session token for the connected call.
- StreamSid - the unique stream identifier for this stream.
- StreamName - the stream name, if the name attribute was defined.
- StreamEvent - the event that triggered the callback: one of stream-started, stream-stopped, or stream-error.
- StreamError - the error message, if an error has occurred.
- Timestamp - the time of the event in ISO 8601 format.
Attribute: statusCallbackMethod
The HTTP method to use when calling the statusCallback URL.
The Stream Protocol
This protocol is a compatible implementation of the Twilio Stream WebSocket protocol, and that documentation may be consulted for more details.
The Stream WebSocket protocol allows a WebSocket service to listen to audio on a call: the stream sends messages to the WebSocket service, where each message is a text message containing a JSON-encoded object (a parameter list). In each such message, a parameter named event specifies the type of message.
The following messages are sent from the <Stream> noun:
Connected Message
Whenever the WebSocket connects, the first message that the <Stream> noun sends is the Connected Message, which describes the protocol that will be used on the connection.
It has the following parameters:
Parameter | Description |
---|---|
event | The value connected |
protocol | A name describing the "protocol" - i.e. the type of messages to be expected on this connection. Supported values: "Call" |
version | The semantic version of the protocol |
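For example, the first message on a new connection might look like this sketch (the exact version string is illustrative):

```json
{
  "event": "connected",
  "protocol": "Call",
  "version": "1.0.0"
}
```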
Start Message
This message is sent immediately after the connected message and conveys metadata about the stream that has started.
It has the following parameters:
Parameter | Description |
---|---|
event | The value start |
sequenceNumber | A sequence number for this message in the protocol. The first message after connected will have the sequence number "1" and any further message will have a number that is incremented by 1. |
streamSid | The unique stream identifier for this stream |
start | An object specifying the metadata for the stream contents, see below. |
The start object contains the following parameters:
Parameter | Description |
---|---|
streamSid | The unique stream identifier for this stream. |
session | The session token for the connected call. |
callSid | The unique call ID associated with the connected call. |
tracks | An array containing the list of tracks that were requested and for which audio will be sent. Can contain either inbound, outbound, or both values. |
customParameters | An object containing all the custom parameters for the stream. |
mediaFormat | An object specifying the format of the media payload in Media Messages on this connection. See below. |
The mediaFormat object contains the following parameters:
Parameter | Description |
---|---|
encoding | The audio encoding of media payloads. Supported values: audio/x-mulaw |
sampleRate | The sample rate in Hz of audio payloads. Supported values: 8000 |
channels | The number of channels in audio payloads. Supported values: 1 |
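Putting the above together, a complete start message might look like the following sketch (all identifiers are placeholders):

```json
{
  "event": "start",
  "sequenceNumber": "1",
  "streamSid": "MZ0123456789abcdef",
  "start": {
    "streamSid": "MZ0123456789abcdef",
    "session": "SESSION-TOKEN",
    "callSid": "CA0123456789abcdef",
    "tracks": ["inbound"],
    "customParameters": {},
    "mediaFormat": {
      "encoding": "audio/x-mulaw",
      "sampleRate": 8000,
      "channels": 1
    }
  }
}
```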
Stop Message
This message is sent when the stream is stopped, either by a <Stop> verb or by the call ending.
It has the following parameters:
Parameter | Description |
---|---|
event | The value stop |
sequenceNumber | A sequence number for this message in the protocol. The first message after connected will have the sequence number "1" and any further message will have a number that is incremented by 1. |
streamSid | The unique stream identifier for this stream. |
stop | An object containing the stream metadata. See below. |
The stop object contains the following parameters:
Parameter | Description |
---|---|
session | The session token for the connected call. |
callSid | The unique call ID associated with the connected call. |
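A stop message might accordingly look like this sketch (identifiers are placeholders):

```json
{
  "event": "stop",
  "sequenceNumber": "42",
  "streamSid": "MZ0123456789abcdef",
  "stop": {
    "session": "SESSION-TOKEN",
    "callSid": "CA0123456789abcdef"
  }
}
```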
Media Message
This message is sent when audio is received in the call.
It has the following parameters:
Parameter | Description |
---|---|
event | The value media |
sequenceNumber | A sequence number for this message in the protocol. The first message after connected will have the sequence number "1" and any further message will have a number that is incremented by 1. |
streamSid | The unique stream identifier for this stream. |
media | An object containing media metadata and payload. See below. |
The media object contains the following parameters:
Parameter | Description |
---|---|
track | The track this media is for, either inbound or outbound . |
chunk | The chunk sequence number for this track. Starts at "1" and increments for each media message on this track. |
timestamp | The presentation timestamp in milliseconds from the start of the stream. |
payload | The audio payload for this message as raw audio data encoded in base64. |
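For example, a media message carrying one chunk of inbound audio might look like this sketch (identifiers are placeholders and the payload is truncated for readability):

```json
{
  "event": "media",
  "sequenceNumber": "3",
  "streamSid": "MZ0123456789abcdef",
  "media": {
    "track": "inbound",
    "chunk": "1",
    "timestamp": "20",
    "payload": "fn5+fn5+fn5+..."
  }
}
```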
DTMF Message
This message is sent when someone presses a touch-tone keypad key in the inbound stream.
It has the following parameters:
Parameter | Description |
---|---|
event | The value dtmf |
sequenceNumber | A sequence number for this message in the protocol. The first message after connected will have the sequence number "1" and any further message will have a number that is incremented by 1. |
streamSid | The unique stream identifier for this stream. |
dtmf | An object containing the DTMF data. See below. |
The dtmf object contains the following parameters:
Parameter | Description |
---|---|
track | The track this DTMF event is for, either inbound or outbound . |
digit | The key that was pressed, either 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, #, or *. |
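For example, the caller pressing the 5 key might produce a message like this sketch (identifiers are placeholders):

```json
{
  "event": "dtmf",
  "sequenceNumber": "7",
  "streamSid": "MZ0123456789abcdef",
  "dtmf": {
    "track": "inbound",
    "digit": "5"
  }
}
```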
Sample WebSocket Audio Server
The following Java code implements a bi-directional WebSocket server that interfaces with Cloudonix's <Connect><Stream> verb and enables bi-directional audio streaming between the caller and the server.
The code is based on the Vert.x core toolkit and Lombok, as reflected in its imports.
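A possible Maven declaration for these dependencies might look like the sketch below; the coordinates are inferred from the imports and the versions are illustrative, and an SLF4J binding is also needed at runtime for the @Slf4j logger:

```xml
<!-- Hypothetical coordinates inferred from the imports; pin versions suitable for your project -->
<dependencies>
  <dependency>
    <groupId>io.vertx</groupId>
    <artifactId>vertx-core</artifactId>
    <version>4.5.10</version>
  </dependency>
  <dependency>
    <groupId>org.projectlombok</groupId>
    <artifactId>lombok</artifactId>
    <version>1.18.34</version>
    <scope>provided</scope>
  </dependency>
</dependencies>
```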
```java
package org.app;

import io.vertx.core.AbstractVerticle;
import io.vertx.core.Future;
import io.vertx.core.Promise;
import io.vertx.core.Vertx;
import io.vertx.core.http.HttpServerOptions;
import io.vertx.core.http.ServerWebSocket;
import io.vertx.core.json.JsonObject;
import lombok.extern.slf4j.Slf4j;

import java.nio.ByteBuffer;
import java.time.Duration;
import java.util.Base64;
import java.util.concurrent.atomic.AtomicBoolean;

@Slf4j
public class Main extends AbstractVerticle {

    private static final String AUDIO_INPUT_DIR = "audio/input/";

    public static void main(final String[] args) {
        final var vertx = Vertx.vertx();
        vertx.deployVerticle(new Main());
    }

    @Override
    public void start(final Promise<Void> startPromise) {
        // Create the HTTP server to which <Connect><Stream> will connect
        vertx.createHttpServer(new HttpServerOptions().setPort(8080))
                .requestHandler(request -> request.response().send("Media Server is running!"))
                .webSocketHandler(webSocket -> {
                    // Log the incoming WebSocket connection
                    log.info("WebSocket connection: {}", webSocket.path());
                    // Handle incoming stream events
                    webSocket.textMessageHandler(text -> {
                        // Before streaming media through this WebSocket, first we should find
                        // the Stream SID value. For that we wait for the first event from
                        // <Connect><Stream> that carries one, after which we can start sending media.
                        var event = new JsonObject(text);
                        log.debug("Received {} event", event.getString("event"));
                        var sid = event.getString("streamSid");
                        // If the Stream SID is found, start sending our media
                        if (sid != null) startWriteAudio(webSocket, sid);
                    });
                })
                .listen()
                // Complete the start promise only once the server is actually listening
                .onSuccess(server -> startPromise.complete())
                .onFailure(startPromise::fail);
    }

    private final AtomicBoolean writeAudioStarted = new AtomicBoolean(false);

    private void startWriteAudio(ServerWebSocket webSocket, String streamSid) {
        // Only start sending media once - we don't want to restart the playback
        // every time another event is received
        var alreadyStarted = !writeAudioStarted.compareAndSet(false, true);
        if (alreadyStarted) return;
        // Read the audio file that contains the music we want to play through the WebSocket
        readFile(AUDIO_INPUT_DIR + "music-mulaw.wav")
                // Stream the audio directly through the WebSocket
                .compose(audio -> streamAudio(webSocket, streamSid, audio))
                .onSuccess(__ -> log.info("Audio streaming is complete!"))
                .onFailure(e -> log.error("Audio streaming failed", e));
    }

    private Future<ByteBuffer> readFile(String filename) {
        // Read the entire content of the specified file
        return vertx.fileSystem().readFile(filename)
                .map(audio -> ByteBuffer.wrap(audio.getBytes()));
    }

    private Future<Void> streamAudio(ServerWebSocket webSocket, String sid, ByteBuffer audio) {
        // If there is no more audio data left, close the WebSocket to notify the
        // Cloudonix server that it should continue executing the CXML document
        if (!audio.hasRemaining()) {
            log.info("Closing the WebSocket. No more audio data left!");
            return webSocket.close();
        }
        // How much audio data, in bytes, we will send in one audio packet - never
        // more than what remains in the buffer
        var audioPacketSize = Math.min(600, audio.remaining());
        // Allocate a byte array that will contain exactly that number of bytes
        var audioPacket = new byte[audioPacketSize];
        // Read the next packet's worth of bytes from the buffer that contains our audio
        audio.get(audioPacket, 0, audioPacketSize);
        // Encode the byte data as a Base64 string
        var payload = Base64.getEncoder().encodeToString(audioPacket);
        // Construct the media event containing the audio data
        var event = new JsonObject()
                .put("event", "media")
                .put("streamSid", sid)
                .put("media", new JsonObject()
                        .put("payload", payload));
        // Send the media event through the WebSocket
        webSocket.writeTextMessage(event.encode())
                .onSuccess(__ -> log.info("Sent {} bytes of audio data to {}", audioPacket.length, sid))
                .onFailure(e -> log.error("Failed to write audio data to the WebSocket", e));
        // We're streaming MULAW audio: 8-bit (1 byte per sample) at 8000 Hz, so every
        // second contains 8000 bytes of audio and every millisecond contains 8 bytes.
        // To pace the stream in real time, wait size(audio_packet) / 8 milliseconds
        // (at least 1 ms) before sending the next audio packet.
        return setTimer(Duration.ofMillis(Math.max(1, audioPacket.length / 8)))
                // Then send the next portion of audio
                .compose(__ -> streamAudio(webSocket, sid, audio));
    }

    private Future<Void> setTimer(Duration duration) {
        var promise = Promise.<Void>promise();
        vertx.setTimer(duration.toMillis(), __ -> promise.complete());
        return promise.future();
    }

    @Override
    public void stop(Promise<Void> stopPromise) {
        log.info("The application has stopped");
        stopPromise.complete();
    }
}
```
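Assuming the server above is reachable at media.example.com (a placeholder host), the following CXML would connect a call to it. Note that the sample server does not configure TLS, so the URL uses ws://; in production you would typically terminate TLS and use wss://.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Connect>
    <Stream url="ws://media.example.com:8080/"/>
  </Connect>
</Response>
```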