<Stream>

TL;DR

<Stream> creates a bi-directional audio stream between the active call and a remote audio WebSocket service.

Description

The <Stream> noun creates a bi-directional audio stream that sends and receives audio data between the active call and a remote audio WebSocket service. Execution of the CXML document will continue only after the <Connect><Stream> connection has terminated.

Example

The following example shows the basic use of the <Stream> noun:

<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Connect>
    <Stream url="wss://stream-listener.example.com/my-app"/>
  </Connect>
  <Hangup />
</Response>

Attributes

The following attributes are supported:

| Attribute Name | Allowed Values | Default Value |
|---|---|---|
| name | Any short text | None |
| url | Absolute URL to a WebSocket service | None |
| track | inbound_track, outbound_track, or both_tracks | inbound_track |
| statusCallback | Relative or absolute HTTP URL | None |
| statusCallbackMethod | GET or POST | POST |

Attribute: name

The name attribute specifies a name for this stream that should be unique within the current application. This name allows the stream to be stopped later by a <Stop> verb. If multiple <Stream> elements share the same name, only the first such stream will be created; any subsequent stream with the same name is assumed to refer to that initial stream and will not be recreated.

To stop a stream, use a <Stream> noun with the same name inside a <Stop> verb. For example:

<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Stop>
    <Stream name="my-stream" />
  </Stop>
  <Say>This will not be sent to the stream</Say>
</Response>

If the <Stop> verb is used to stop an unknown stream, the command will be ignored.

Attribute: url

The url attribute must be a valid WebSocket URL (a ws:// or wss:// address). If the URL is not specified or is not a valid WebSocket URL, or if the connection to the WebSocket service fails, it is an error and the application will be stopped.

Attribute: track

The track attribute specifies which side of the conversation is sent to the WebSocket. Setting inbound_track sends the audio from the caller (the side of the call that started the application), while setting outbound_track sends the audio that the application is sending to the caller, or, if the application is running a <Dial> command, the audio that the other side of the dial is sending. Setting both_tracks sends the audio from both sides to the WebSocket.
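
For example, the following document (reusing the example URL from above) streams both sides of the call to the WebSocket service:

<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Connect>
    <Stream url="wss://stream-listener.example.com/my-app" track="both_tracks"/>
  </Connect>
</Response>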

Attribute: statusCallback

If the statusCallback attribute is set to a valid HTTP URL, the stream will send HTTP requests to that URL when stream events occur. The following parameters will be sent to the status callback URL:

  • CallSid - the unique call ID associated with the connected call.
  • Session - the session token for the connected call.
  • StreamSid - the unique stream identifier for this stream.
  • StreamName - the stream name, if the name attribute was defined.
  • StreamEvent - the event that triggered the callback: one of stream-started, stream-stopped, or stream-error.
  • StreamError - the error message, if an error has occurred.
  • Timestamp - the time of the event in ISO 8601 format.
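
For illustration, assuming the parameters are delivered as a form-encoded request body (all values below are hypothetical), a stream-started callback sent with the default POST method might carry:

CallSid=CA1234567890&Session=abc123&StreamSid=MZ1234567890&StreamName=my-stream&StreamEvent=stream-started&Timestamp=2024-01-15T12%3A00%3A00Z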

Attribute: statusCallbackMethod

The HTTP method to use when calling the statusCallback URL.

The Stream Protocol

This protocol is a compatible implementation of the Twilio Stream WebSocket protocol, and the Twilio documentation may be consulted for more details.

The Stream WebSocket protocol allows a WebSocket service to listen to audio on a call by receiving messages from the <Stream> noun. Each message is a text message containing a JSON-encoded object (a parameter list). In each such message, a parameter named event specifies the type of message.

The following messages are sent from the <Stream> noun:

Connected Message

Whenever the WebSocket connects, the first message that the <Stream> noun sends is the Connected Message, which describes the protocol that will be used on the connection.

It has the following parameters:

| Parameter | Description |
|---|---|
| event | The value connected |
| protocol | A name describing the "protocol" - i.e. the type of messages to be expected on this connection. Supported values: "Call" |
| version | The semantic version of the protocol |
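
For example, a connected message might look like this (the version value is illustrative):

{
  "event": "connected",
  "protocol": "Call",
  "version": "1.0.0"
}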

Start Message

This message is sent immediately after the connected message and conveys metadata about the stream that has started.

It has the following parameters:

| Parameter | Description |
|---|---|
| event | The value start |
| sequenceNumber | A sequence number for this message in the protocol. The first message after connected will have the sequence number "1" and any further message will have a number that is incremented by 1. |
| streamSid | The unique stream identifier for this stream |
| start | An object specifying the metadata for the stream contents, see below. |

The start object contains the following parameters:

| Parameter | Description |
|---|---|
| streamSid | The unique stream identifier for this stream. |
| session | The session token for the connected call. |
| callSid | The unique call ID associated with the connected call. |
| tracks | An array listing the tracks that were requested and for which audio will be sent. Can contain the values inbound, outbound, or both. |
| customParameters | An object containing all the custom parameters for the stream. |
| mediaFormat | An object specifying the format of the media payload in Media Messages on this connection. See below. |

The mediaFormat object contains the following parameters:

| Parameter | Description |
|---|---|
| encoding | The audio encoding of media payloads. Supported values: audio/x-mulaw |
| sampleRate | The sample rate in Hz of audio payloads. Supported values: 8000 |
| channels | The number of channels in audio payloads. Supported values: 1 |
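
Putting these together, a start message for a stream with the default inbound track might look like the following (identifier and token values are illustrative):

{
  "event": "start",
  "sequenceNumber": "1",
  "streamSid": "MZ1234567890",
  "start": {
    "streamSid": "MZ1234567890",
    "session": "abc123",
    "callSid": "CA1234567890",
    "tracks": ["inbound"],
    "customParameters": {},
    "mediaFormat": {
      "encoding": "audio/x-mulaw",
      "sampleRate": 8000,
      "channels": 1
    }
  }
}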

Stop Message

This message is sent when the stream is stopped, either by a <Stop> verb or by the call ending.

It has the following parameters:

| Parameter | Description |
|---|---|
| event | The value stop |
| sequenceNumber | A sequence number for this message in the protocol. The first message after connected will have the sequence number "1" and any further message will have a number that is incremented by 1. |
| streamSid | The unique stream identifier for this stream. |
| stop | An object containing the stream metadata. See below. |

The stop object contains the following parameters:

| Parameter | Description |
|---|---|
| session | The session token for the connected call. |
| callSid | The unique call ID associated with the connected call. |
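
For example, a stop message might look like this (identifier, token, and sequence number values are illustrative):

{
  "event": "stop",
  "sequenceNumber": "42",
  "streamSid": "MZ1234567890",
  "stop": {
    "session": "abc123",
    "callSid": "CA1234567890"
  }
}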

Media Message

This message is sent when audio is received in the call.

It has the following parameters:

| Parameter | Description |
|---|---|
| event | The value media |
| sequenceNumber | A sequence number for this message in the protocol. The first message after connected will have the sequence number "1" and any further message will have a number that is incremented by 1. |
| streamSid | The unique stream identifier for this stream. |
| media | An object containing media metadata and payload. See below. |

The media object contains the following parameters:

| Parameter | Description |
|---|---|
| track | The track this media is for, either inbound or outbound. |
| chunk | The chunk sequence number for this track. Starts at "1" and increments with each media message for this track. |
| timestamp | The presentation timestamp in milliseconds from the start of the stream. |
| payload | The audio payload for this message, as raw audio data encoded in base64. |
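
For example, a media message carrying inbound audio might look like this (identifier, sequence, and payload values are illustrative):

{
  "event": "media",
  "sequenceNumber": "2",
  "streamSid": "MZ1234567890",
  "media": {
    "track": "inbound",
    "chunk": "1",
    "timestamp": "20",
    "payload": "fn5+fn5+fn5+fn5+"
  }
}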

DTMF Message

This message is sent when someone presses a touch-tone keypad key on the inbound stream.

It has the following parameters:

| Parameter | Description |
|---|---|
| event | The value dtmf |
| sequenceNumber | A sequence number for this message in the protocol. The first message after connected will have the sequence number "1" and any further message will have a number that is incremented by 1. |
| streamSid | The unique stream identifier for this stream. |
| dtmf | An object containing the DTMF data. See below. |

The dtmf object contains the following parameters:

| Parameter | Description |
|---|---|
| track | The track this DTMF event is for, either inbound or outbound. |
| digit | The key that was pressed: one of 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, #, or *. |
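
For example, a dtmf message sent after the caller presses the 1 key might look like this (identifier and sequence values are illustrative):

{
  "event": "dtmf",
  "sequenceNumber": "5",
  "streamSid": "MZ1234567890",
  "dtmf": {
    "track": "inbound",
    "digit": "1"
  }
}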

Sample WebSocket Audio Server

The following Java code implements a bi-directional WebSocket server that interfaces with Cloudonix's <Connect><Stream> verb and enables bi-directional audio streaming between the caller and the server.

The code is built on the Vert.x toolkit (vertx-core) and uses Lombok's @Slf4j annotation for logging:

package org.app;

import io.vertx.core.AbstractVerticle;
import io.vertx.core.Future;
import io.vertx.core.Promise;
import io.vertx.core.Vertx;
import io.vertx.core.http.HttpServerOptions;
import io.vertx.core.http.ServerWebSocket;
import io.vertx.core.json.JsonObject;
import lombok.extern.slf4j.Slf4j;

import java.nio.ByteBuffer;
import java.time.Duration;
import java.util.Base64;
import java.util.concurrent.atomic.AtomicBoolean;

@Slf4j
public class Main extends AbstractVerticle {

    private static final String AUDIO_INPUT_DIR = "audio/input/";

    private final AtomicBoolean writeAudioStarted = new AtomicBoolean(false);

    public static void main(final String[] args) {
        final var vertx = Vertx.vertx();
        vertx.deployVerticle(new Main());
    }

    @Override
    public void start(final Promise<Void> startPromise) {

        // Create the HTTP server to which <Connect><Stream> will connect
        vertx.createHttpServer(new HttpServerOptions().setPort(8080))
                .requestHandler(request -> request.response().send("Media Server is running!"))
                .webSocketHandler(webSocket -> {

                    // Log the incoming WebSocket connection
                    log.info("WebSocket connection: {}", webSocket.path());

                    // Handle incoming stream events
                    webSocket.textMessageHandler(text -> {
                        // Before streaming media through this WebSocket, we first need the
                        // stream SID, so we wait for the first event from <Connect><Stream>
                        // that carries a streamSid value. Once it is received we can start
                        // sending media.
                        var event = new JsonObject(text);
                        var sid = event.getString("streamSid");

                        // If the stream SID is found, start sending our media
                        if (sid != null) startWriteAudio(webSocket, sid);
                    });
                })
                .listen()
                .onSuccess(server -> startPromise.complete())
                .onFailure(startPromise::fail);
    }

    private void startWriteAudio(ServerWebSocket webSocket, String streamSid) {
        // Only start sending media once; we don't want to restart every time a
        // new event is received
        var alreadyStarted = !writeAudioStarted.compareAndSet(false, true);
        if (alreadyStarted) return;

        // Read the audio file that contains the music we want to play through the WebSocket
        readFile(AUDIO_INPUT_DIR + "music-mulaw.wav")
                // Stream the audio directly through the WebSocket
                .compose(audio -> streamAudio(webSocket, streamSid, audio))
                .onSuccess(__ -> log.info("Audio streaming is complete!"));
    }

    private Future<ByteBuffer> readFile(String filename) {
        // Read the entire content of the specified file
        return vertx.fileSystem().readFile(filename)
                .map(audio -> ByteBuffer.wrap(audio.getBytes()));
    }

    private Future<Void> streamAudio(ServerWebSocket webSocket, String sid, ByteBuffer audio) {
        // If there is no more audio data left, close the WebSocket to notify the
        // Cloudonix server that it should continue executing the CXML document
        if (!audio.hasRemaining()) {
            log.info("Closing the WebSocket. No more audio data left!");
            webSocket.close();
            return Future.succeededFuture();
        }

        // How much audio data, in bytes, to send in one audio packet; the last
        // packet may be shorter than 600 bytes
        var audioPacketSize = Math.min(600, audio.remaining());
        // Allocate a byte array that will contain exactly that number of bytes
        var audioPacket = new byte[audioPacketSize];
        // Read the next packet from the buffer that contains our audio
        audio.get(audioPacket, 0, audioPacketSize);

        // Encode the byte data as a Base64 string
        var payload = Base64.getEncoder().encodeToString(audioPacket);
        // Construct the media event containing the audio data
        var event = new JsonObject()
                .put("event", "media")
                .put("streamSid", sid)
                .put("media", new JsonObject()
                        .put("payload", payload));
        // Send the media event through the WebSocket
        webSocket.writeTextMessage(event.encode())
                .onSuccess(__ -> log.info("Sent {} bytes of audio data to {}", audioPacket.length, sid))
                .onFailure(e -> log.error("Failed to write audio data to the WebSocket", e));

        // We're streaming MULAW audio: 8-bit (1 byte) samples at 8000 Hz, so every
        // second contains 8000 bytes of audio and every millisecond contains 8 bytes.
        // To pace the stream in real time, wait size(audio_packet) / 8 milliseconds
        // before sending the next packet.
        return setTimer(Duration.ofMillis(audioPacket.length / 8))
                // Then send the next portion of audio
                .compose(__ -> streamAudio(webSocket, sid, audio));
    }

    private Future<Void> setTimer(Duration duration) {
        var promise = Promise.<Void>promise();
        vertx.setTimer(duration.toMillis(), __ -> promise.complete());
        return promise.future();
    }

    @Override
    public void stop(Promise<Void> stopPromise) {
        log.info("The application has stopped");
        stopPromise.complete();
    }

}