<Stream>
Create a bi-directional audio stream to a remotely provided audio WebSocket service.
Description
The <Stream> noun creates a bi-directional audio stream that sends and receives audio data between the active call and a remote audio WebSocket service.
Execution of the CXML document will continue only after the <Connect><Stream> connection has terminated.
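For example, in the following document the <Say> verb will only execute once the stream's WebSocket connection has closed (the URL is a placeholder):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Connect>
    <Stream url="wss://stream-listener.example.com/my-app"/>
  </Connect>
  <Say>The stream has ended</Say>
</Response>
```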
Example
The following example shows the basic use of the <Stream> noun:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Connect>
    <Stream url="wss://stream-listener.example.com/my-app"/>
  </Connect>
  <Hangup />
</Response>
```
Attributes
The following attributes are supported:
Attribute Name | Allowed Values | Default Value |
---|---|---|
name | Any short text | None |
url | Absolute URL to a WebSocket service | None |
track | inbound_track, outbound_track, or both_tracks | inbound_track |
statusCallback | Relative or absolute HTTP URL | None |
statusCallbackMethod | GET or POST | POST |
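For illustration, a <Stream> noun that sets every attribute might look like the following sketch (the URL values are placeholders, not values from a real application):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Connect>
    <Stream name="my-stream"
            url="wss://stream-listener.example.com/my-app"
            track="both_tracks"
            statusCallback="https://example.com/stream-events"
            statusCallbackMethod="POST"/>
  </Connect>
</Response>
```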
Attribute: name
The name attribute specifies a name for this stream that should be unique for the current application. This name allows the stream to be stopped by a <Stop> verb. If multiple <Stream> elements share the same name, only the first such stream will be created; any following stream with the same name is assumed to refer to the initial identically named stream and will not be recreated.
To stop a stream, use a <Stream> noun with the same name inside a <Stop> verb. For example:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Stop>
    <Stream name="my-stream" />
  </Stop>
  <Say>This will not be sent to the stream</Say>
</Response>
```
If the <Stop> verb is used to stop an unknown stream, the command is ignored.
Attribute: url
The url attribute must be a valid URL to a WebSocket address. If the URL is not specified, is not a valid WebSocket URL, or the connection to the WebSocket fails, it is an error and the application will be stopped.
Attribute: track
The track attribute specifies which side of the conversation is sent to the WebSocket: inbound_track sends the audio from the caller (the side of the call that started the application), while outbound_track sends the audio that the application is sending to the caller, or, if the application is running a <Dial> command, the audio that the other side of the dial is sending. both_tracks sends the audio from both sides to the WebSocket.
Attribute: statusCallback
If the statusCallback attribute is set to a valid HTTP URL, the stream will send HTTP requests to the specified URL when stream events occur (an example request follows the list). The following parameters will be sent to the status callback URL:
- CallSid - the unique call ID associated with the connected call.
- Session - the session token for the connected call.
- StreamSid - the unique stream identifier for this stream.
- StreamName - the stream name, if the name attribute was defined.
- StreamEvent - the event that triggered the callback: one of stream-started, stream-stopped, or stream-error.
- StreamError - the error message, if an error has occurred.
- Timestamp - the time of the event in ISO 8601 format.
Attribute: statusCallbackMethod
The HTTP method to use when calling the statusCallback URL.
The Stream Protocol
This protocol is a compatible implementation of the Twilio Stream WebSocket protocol, and that documentation may be consulted for more details.
The Stream WebSocket protocol allows a WebSocket service to listen to audio on a call: the stream sends messages to the WebSocket service, where each message is a text message containing a JSON-encoded object (a parameter list). In each such message, a parameter named event specifies the type of message.
The following messages are sent from the <Stream> noun:
Connected Message
Whenever the WebSocket connects, the first message that the <Stream> noun sends is the Connected Message, which describes the protocol that will be used on the connection.
It has the following parameters:
Parameter | Description |
---|---|
event | The value connected |
protocol | A name describing the "protocol" - i.e. the type of messages to be expected on this connection. Supported values: "Call" |
version | The semantic version of the protocol |
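For example, the first message on a new connection might look like this sketch (the exact version string is illustrative):

```json
{
  "event": "connected",
  "protocol": "Call",
  "version": "1.0.0"
}
```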
Start Message
This message is sent immediately after the connected message and conveys metadata about the stream that has started.
It has the following parameters:
Parameter | Description |
---|---|
event | The value start |
sequenceNumber | A sequence number for this message in the protocol. The first message after connected will have the sequence number "1" and any further message will have a number that is incremented by 1. |
streamSid | The unique stream identifier for this stream |
start | An object specifying the metadata for the stream contents, see below. |
The start object contains the following parameters:
Parameter | Description |
---|---|
streamSid | The unique stream identifier for this stream. |
session | The session token for the connected call. |
callSid | The unique call ID associated with the connected call. |
tracks | An array containing the list of tracks that were requested and for which audio will be sent. Can contain either inbound, outbound, or both values. |
customParameters | An object containing all the custom parameters for the stream. |
mediaFormat | An object specifying the format of the media payload in Media Messages on this connection. See below. |
The mediaFormat object contains the following parameters:
Parameter | Description |
---|---|
encoding | The audio encoding of media payloads. Supported values: audio/x-mulaw |
sampleRate | The sample rate in Hz of audio payloads. Supported values: 8000 |
channels | The number of channels in audio payloads. Supported values: 1 |
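Putting the above together, a complete start message might look like the following sketch (all identifiers are placeholders):

```json
{
  "event": "start",
  "sequenceNumber": "1",
  "streamSid": "MZ0123456789abcdef",
  "start": {
    "streamSid": "MZ0123456789abcdef",
    "session": "SESSION-TOKEN",
    "callSid": "CA0123456789abcdef",
    "tracks": ["inbound"],
    "customParameters": {},
    "mediaFormat": {
      "encoding": "audio/x-mulaw",
      "sampleRate": 8000,
      "channels": 1
    }
  }
}
```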
Stop Message
This message is sent when the stream is stopped, either by a <Stop> verb or by the call ending.
It has the following parameters:
Parameter | Description |
---|---|
event | The value stop |
sequenceNumber | A sequence number for this message in the protocol. The first message after connected will have the sequence number "1" and any further message will have a number that is incremented by 1. |
streamSid | The unique stream identifier for this stream. |
stop | An object containing the stream metadata. See below. |
The stop object contains the following parameters:
Parameter | Description |
---|---|
session | The session token for the connected call. |
callSid | The unique call ID associated with the connected call. |
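A stop message might accordingly look like this sketch (identifiers are placeholders):

```json
{
  "event": "stop",
  "sequenceNumber": "42",
  "streamSid": "MZ0123456789abcdef",
  "stop": {
    "session": "SESSION-TOKEN",
    "callSid": "CA0123456789abcdef"
  }
}
```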
Media Message
This message is sent when audio is received in the call.
It has the following parameters:
Parameter | Description |
---|---|
event | The value media |
sequenceNumber | A sequence number for this message in the protocol. The first message after connected will have the sequence number "1" and any further message will have a number that is incremented by 1. |
streamSid | The unique stream identifier for this stream. |
media | An object containing media metadata and payload. See below. |
The media object contains the following parameters:
Parameter | Description |
---|---|
track | The track this media is for, either inbound or outbound . |
chunk | The chunk sequence number for this track. Starts at "1" and increments for each media message on this track. |
timestamp | The presentation timestamp in milliseconds from the start of the stream. |
payload | The audio payload for this message as raw audio data encoded in base64. |
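For example, a media message carrying one chunk of inbound audio might look like this sketch (identifiers are placeholders and the payload is truncated for readability):

```json
{
  "event": "media",
  "sequenceNumber": "3",
  "streamSid": "MZ0123456789abcdef",
  "media": {
    "track": "inbound",
    "chunk": "1",
    "timestamp": "20",
    "payload": "fn5+fn5+fn5+..."
  }
}
```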
DTMF Message
This message is sent when someone presses a touch-tone keypad key in the inbound stream.
It has the following parameters:
Parameter | Description |
---|---|
event | The value dtmf |
sequenceNumber | A sequence number for this message in the protocol. The first message after connected will have the sequence number "1" and any further message will have a number that is incremented by 1. |
streamSid | The unique stream identifier for this stream. |
dtmf | An object containing the DTMF data. See below. |
The dtmf object contains the following parameters:
Parameter | Description |
---|---|
track | The track this DTMF event is for, either inbound or outbound . |
digit | The key that was pressed, either 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, #, or *. |
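For example, the caller pressing the 5 key might produce a message like this sketch (identifiers are placeholders):

```json
{
  "event": "dtmf",
  "sequenceNumber": "7",
  "streamSid": "MZ0123456789abcdef",
  "dtmf": {
    "track": "inbound",
    "digit": "5"
  }
}
```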
Sample WebSocket Audio Server
The following Java code implements a bi-directional WebSocket server that interfaces with Cloudonix's <Connect><Stream> verb and enables bi-directional audio streaming between the caller and the server.
The code is based on the Vert.x core toolkit and Lombok, as reflected in its imports.
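A possible Maven declaration for these dependencies might look like the sketch below; the coordinates are inferred from the imports and the versions are illustrative, and an SLF4J binding is also needed at runtime for the @Slf4j logger:

```xml
<!-- Hypothetical coordinates inferred from the imports; pin versions suitable for your project -->
<dependencies>
  <dependency>
    <groupId>io.vertx</groupId>
    <artifactId>vertx-core</artifactId>
    <version>4.5.10</version>
  </dependency>
  <dependency>
    <groupId>org.projectlombok</groupId>
    <artifactId>lombok</artifactId>
    <version>1.18.34</version>
    <scope>provided</scope>
  </dependency>
</dependencies>
```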
```java
package org.app;

import io.vertx.core.AbstractVerticle;
import io.vertx.core.Future;
import io.vertx.core.Promise;
import io.vertx.core.Vertx;
import io.vertx.core.http.HttpServerOptions;
import io.vertx.core.http.ServerWebSocket;
import io.vertx.core.json.JsonObject;
import lombok.extern.slf4j.Slf4j;

import java.nio.ByteBuffer;
import java.time.Duration;
import java.util.Base64;
import java.util.concurrent.atomic.AtomicBoolean;

@Slf4j
public class Main extends AbstractVerticle {

    private static final String AUDIO_INPUT_DIR = "audio/input/";

    public static void main(final String[] args) {
        final var vertx = Vertx.vertx();
        vertx.deployVerticle(new Main());
    }

    @Override
    public void start(final Promise<Void> startPromise) {
        // Create the HTTP server to which <Connect><Stream> will connect
        vertx.createHttpServer(new HttpServerOptions().setPort(8080))
                .requestHandler(request -> request.response().send("Media Server is running!"))
                .webSocketHandler(webSocket -> {
                    // Log the incoming WebSocket connection
                    log.info("WebSocket connection: {}", webSocket.path());
                    // Handle incoming stream events
                    webSocket.textMessageHandler(text -> {
                        // Before streaming media through this WebSocket, first we should find
                        // the Stream SID value. For that we wait for the first event from
                        // <Connect><Stream> that carries one, after which we can start sending media.
                        var event = new JsonObject(text);
                        log.debug("Received {} event", event.getString("event"));
                        var sid = event.getString("streamSid");
                        // If the Stream SID is found, start sending our media
                        if (sid != null) startWriteAudio(webSocket, sid);
                    });
                })
                .listen()
                // Complete the start promise only once the server is actually listening
                .onSuccess(server -> startPromise.complete())
                .onFailure(startPromise::fail);
    }

    private final AtomicBoolean writeAudioStarted = new AtomicBoolean(false);

    private void startWriteAudio(ServerWebSocket webSocket, String streamSid) {
        // Only start sending media once - we don't want to restart the playback
        // every time another event is received
        var alreadyStarted = !writeAudioStarted.compareAndSet(false, true);
        if (alreadyStarted) return;
        // Read the audio file that contains the music we want to play through the WebSocket
        readFile(AUDIO_INPUT_DIR + "music-mulaw.wav")
                // Stream the audio directly through the WebSocket
                .compose(audio -> streamAudio(webSocket, streamSid, audio))
                .onSuccess(__ -> log.info("Audio streaming is complete!"))
                .onFailure(e -> log.error("Audio streaming failed", e));
    }

    private Future<ByteBuffer> readFile(String filename) {
        // Read the entire content of the specified file
        return vertx.fileSystem().readFile(filename)
                .map(audio -> ByteBuffer.wrap(audio.getBytes()));
    }

    private Future<Void> streamAudio(ServerWebSocket webSocket, String sid, ByteBuffer audio) {
        // If there is no more audio data left, close the WebSocket to notify the
        // Cloudonix server that it should continue executing the CXML document
        if (!audio.hasRemaining()) {
            log.info("Closing the WebSocket. No more audio data left!");
            return webSocket.close();
        }
        // How much audio data, in bytes, we will send in one audio packet - never
        // more than what remains in the buffer
        var audioPacketSize = Math.min(600, audio.remaining());
        // Allocate a byte array that will contain exactly that number of bytes
        var audioPacket = new byte[audioPacketSize];
        // Read the next packet's worth of bytes from the buffer that contains our audio
        audio.get(audioPacket, 0, audioPacketSize);
        // Encode the byte data as a Base64 string
        var payload = Base64.getEncoder().encodeToString(audioPacket);
        // Construct the media event containing the audio data
        var event = new JsonObject()
                .put("event", "media")
                .put("streamSid", sid)
                .put("media", new JsonObject()
                        .put("payload", payload));
        // Send the media event through the WebSocket
        webSocket.writeTextMessage(event.encode())
                .onSuccess(__ -> log.info("Sent {} bytes of audio data to {}", audioPacket.length, sid))
                .onFailure(e -> log.error("Failed to write audio data to the WebSocket", e));
        // We're streaming MULAW audio: 8-bit (1 byte per sample) at 8000 Hz, so every
        // second contains 8000 bytes of audio and every millisecond contains 8 bytes.
        // To pace the stream in real time, wait size(audio_packet) / 8 milliseconds
        // (at least 1 ms) before sending the next audio packet.
        return setTimer(Duration.ofMillis(Math.max(1, audioPacket.length / 8)))
                // Then send the next portion of audio
                .compose(__ -> streamAudio(webSocket, sid, audio));
    }

    private Future<Void> setTimer(Duration duration) {
        var promise = Promise.<Void>promise();
        vertx.setTimer(duration.toMillis(), __ -> promise.complete());
        return promise.future();
    }

    @Override
    public void stop(Promise<Void> stopPromise) {
        log.info("The application has stopped");
        stopPromise.complete();
    }
}
```
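Assuming the server above is reachable at media.example.com (a placeholder host), the following CXML would connect a call to it. Note that the sample server does not configure TLS, so the URL uses ws://; in production you would typically terminate TLS and use wss://.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Connect>
    <Stream url="ws://media.example.com:8080/"/>
  </Connect>
</Response>
```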