<Converse>
<Converse> with an AI Voice Agent.
Description
Use an AI to converse with a human caller, using an LLM (Large Language Model), STT (Speech-To-Text) and TTS (Text-To-Speech).
<Converse> may be used as a stand-alone verb (e.g. prompt an LLM and play back its response), or as a nested verb in conjunction with the <Gather> verb to create voice agents. <Converse> provides direct access to LLM, STT and TTS capabilities, without requiring the developer to integrate their own tools. <Converse> is designed as a pipe between these services, enabling a seamless flow of data among them.
Examples
Example 1: Stand-alone prompting
<Response>
<Converse voice="Google:en-GB-Standard-A" language="en-GB" sessionTools="redirect dial">
<System>
You are a helpful hotel receptionist, providing a customer with helpful information about the Acme Hotel.
</System>
</Converse>
</Response>
Example 2: Voice agent prompting
<Response>
<Gather input="speech" speechEngine="google" actionOnEmptyResult="true" speechTimeout="1.5" speechDetection="stt">
<Converse voice="Google:en-GB-Standard-A" language="en-GB" sessionTools="redirect dial">
<System>
You are a helpful hotel receptionist, providing a customer with helpful information about the Acme Hotel.
</System>
<Speech/>
</Converse>
</Gather>
</Response>
Attributes
The following attributes are supported:
Attribute Name | Allowed Values | Default Value |
---|---|---|
voice | man, woman, or see Premium Voices | woman |
language | See Premium Voices | en-US |
statusCallback | URL | none |
statusCallbackMethod | POST or GET | POST |
statusCallbackEvent | in-progress, tool-response, llm-response, completed | in-progress,completed |
sessionTools | hangup, redirect, dial | none |
model | provider[:model] - see below | openai:gpt-4o-mini |
context | auto, none, no | auto |
temperature | 0 to 1 - see below | 1 |
Attribute: voice
Which voice model to use for generating the synthesized voice. Additional models may be offered in the future.
Attribute: language
The language, of those supported, in which to generate the speech. The language is a hint to the speech synthesizer: the text must already be written in the specified language, as no translation is performed on the text before speech synthesis.
Attribute: statusCallback
A URL to be called when the audio output has completed playing. This URL will be called with all the parameters of a standard CXML request, but its output is discarded.
Attribute: statusCallbackMethod
The HTTP method to use for the statusCallback URL.
Attribute: statusCallbackEvent
Which events should be reported back to the statusCallback URL.
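As a minimal sketch of how the three status callback attributes might be combined, the fragment below asks Cloudonix to report LLM responses and completion to a callback endpoint. The URL https://example.com/converse-status is a placeholder, and the comma-separated event list is an assumption modelled on the default value in the attribute table.

```xml
<Response>
  <!-- Hypothetical callback URL; replace it with your own endpoint. -->
  <!-- Assumption: multiple events are comma-separated, as in the default "in-progress,completed". -->
  <Converse voice="Google:en-GB-Standard-A" language="en-GB"
            statusCallback="https://example.com/converse-status"
            statusCallbackMethod="POST"
            statusCallbackEvent="llm-response,completed">
    <System>
      You are a helpful hotel receptionist, providing a customer with helpful information about the Acme Hotel.
    </System>
  </Converse>
</Response>
```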
Attribute: sessionTools
Cloudonix includes three built-in session tools that your agent can use: hangup, redirect and dial. To use them, you MUST first declare them in the sessionTools attribute of your CXML, for example sessionTools="redirect dial".
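The examples on this page declare session tools as a space-separated list. The sketch below follows the same convention to enable all three tools; listing all three in one attribute is an assumption based on those examples, and the prompt wording is illustrative only.

```xml
<Response>
  <Gather input="speech" speechEngine="google" actionOnEmptyResult="true" speechTimeout="1.5" speechDetection="stt">
    <!-- Assumption: tools are space-separated, as in sessionTools="redirect dial". -->
    <Converse voice="Google:en-GB-Standard-A" language="en-GB" sessionTools="hangup redirect dial">
      <System>
        You are a helpful hotel receptionist. End the call politely once the caller has no further questions.
      </System>
      <Speech/>
    </Converse>
  </Gather>
</Response>
```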
Attribute: model
Cloudonix provides direct over-the-top access to the following LLM providers: OpenAI and Anthropic.
To select a specific LLM model, set the model attribute to a value in the following format: provider[:model].
LLM Provider | prefix for model value |
---|---|
OpenAI | openai or chatgpt |
Anthropic | anthropic or claude |
For OpenAI, you may use any of the available GPT models.
For Anthropic, you may use any of the available Claude models.
The model attribute may also be set to the provider name only. In that case, Cloudonix defaults to openai:gpt-4o-mini or anthropic:claude-3-5-haiku-latest, depending on the provider name specified. If neither a provider name nor a model is specified, Cloudonix uses openai:gpt-4o-mini as its default LLM model.
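To illustrate the provider-only form, here is a short sketch that sets model to a provider name without a model suffix. According to the defaulting rule described above, this should select anthropic:claude-3-5-haiku-latest; the prompt text is illustrative.

```xml
<Response>
  <!-- Provider name only: the provider's default model is expected to be used. -->
  <Converse voice="Google:en-GB-Standard-A" language="en-GB" model="anthropic">
    <System>
      You are a helpful hotel receptionist, providing a customer with helpful information about the Acme Hotel.
    </System>
  </Converse>
</Response>
```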
Here are some examples of model attribute values for specific LLM models:
LLM Provider | Model name | model value |
---|---|---|
OpenAI | gpt-4o | openai:gpt-4o |
OpenAI | gpt-4o-mini | openai:gpt-4o-mini |
OpenAI | o1 | openai:o1 |
OpenAI | o1-mini | openai:o1-mini |
OpenAI | o1-preview | openai:o1-preview |
Anthropic | claude-3-7-sonnet-latest | anthropic:claude-3-7-sonnet-latest |
Anthropic | claude-3-5-haiku-latest | anthropic:claude-3-5-haiku-latest |
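A value from the table can be placed directly in the model attribute. The following sketch selects openai:gpt-4o inside a voice agent; everything other than the model value mirrors the earlier examples on this page.

```xml
<Response>
  <Gather input="speech" speechEngine="google" actionOnEmptyResult="true" speechTimeout="1.5" speechDetection="stt">
    <!-- Full provider:model form, taken from the table above. -->
    <Converse voice="Google:en-GB-Standard-A" language="en-GB" model="openai:gpt-4o">
      <System>
        You are a helpful voice agent, designed to help the user in any way possible.
      </System>
      <Speech/>
    </Converse>
  </Gather>
</Response>
```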
Attribute: context
By default, Cloudonix sets the context value to auto. Context support enables a chat history function, ensuring that the LLM receives the previous interactions as part of each LLM request. For more information about this feature, we suggest reading OpenAI's information about the context window.
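For a one-off prompt where earlier turns are not needed, the context attribute might be set explicitly. This is only a sketch: it assumes that context="none" turns the chat history off, which the attribute table suggests but this page does not spell out.

```xml
<Response>
  <!-- Assumption: context="none" sends this prompt without previous interactions. -->
  <Converse voice="Google:en-GB-Standard-A" language="en-GB" context="none">
    <System>
      You are a helpful hotel receptionist, providing a customer with helpful information about the Acme Hotel.
    </System>
    <User>What time is breakfast served?</User>
  </Converse>
</Response>
```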
Attribute: temperature
As defined by OpenAI and adopted by other LLM platforms:
The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more
random, while lower values like 0.2 will make it more focused and deterministic. If set to 0,
the model will use log probability to automatically increase the temperature until certain
thresholds are hit.
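As a brief illustration, a lower temperature can be requested when more focused and repeatable answers are preferred. The value 0.2 below is simply the example figure from the definition above, not a recommended setting.

```xml
<Response>
  <!-- Lower temperature for more focused, deterministic replies. -->
  <Converse voice="Google:en-GB-Standard-A" language="en-GB" temperature="0.2">
    <System>
      You are a helpful hotel receptionist, providing a customer with helpful information about the Acme Hotel.
    </System>
  </Converse>
</Response>
```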
Available Nouns
<Description>
<Description> is an optional noun, providing a text description for your LLM tool.
Example
<Response>
<Gather input="speech" speechEngine="google" actionOnEmptyResult="true" speechTimeout="1.5" speechDetection="stt">
<Converse voice="Google:en-GB-Standard-A" language="en-GB" sessionTools="redirect dial">
<Tool name="simpleTool" url="https://example.com/simpleTool">
<Description>Just a simple tool description</Description>
</Tool>
<System>
Just a simple System prompt.
</System>
<Speech/>
</Converse>
</Gather>
</Response>
<System>
Description
Pass a System prompt to the selected LLM.
You may define several <System> prompts. However, as some LLM providers may not support multiple system prompts in a single API call, multiple prompts may be merged into a single system prompt.
Example
<Response>
<Gather input="speech" speechEngine="google" actionOnEmptyResult="true" speechTimeout="1.5" speechDetection="stt">
<Converse voice="Google:en-GB-Standard-A" language="en-GB" sessionTools="redirect dial">
<System>
You are a helpful voice agent, designed to help the user in any way possible.
</System>
<Speech/>
</Converse>
</Gather>
</Response>
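Since several <System> prompts are allowed, one possible layout is to keep the persona and the rules in separate prompts, as sketched below. The prompt wording is illustrative, and depending on the LLM provider the two prompts may be merged into one before being sent.

```xml
<Response>
  <Gather input="speech" speechEngine="google" actionOnEmptyResult="true" speechTimeout="1.5" speechDetection="stt">
    <Converse voice="Google:en-GB-Standard-A" language="en-GB" sessionTools="redirect dial">
      <!-- First prompt: the agent's persona. -->
      <System>
        You are a helpful voice agent, designed to help the user in any way possible.
      </System>
      <!-- Second prompt: additional rules; may be merged with the first one. -->
      <System>
        Keep every answer under three sentences.
      </System>
      <Speech/>
    </Converse>
  </Gather>
</Response>
```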
Attributes
No attributes are available for this noun.
<Tool>
See Tool noun
<User>
Description
Pass a User prompt to the selected LLM.
Example
<Response>
<Gather input="speech" speechEngine="google" actionOnEmptyResult="true" speechTimeout="1.5" speechDetection="stt">
<Converse voice="Google:en-GB-Standard-A" language="en-GB" sessionTools="redirect dial">
<System>
You are a helpful voice agent, designed to help the user in any way possible.
</System>
<User>Hello, my name is Johnny Five - can you help me?</User>
<Speech/>
</Converse>
</Gather>
</Response>
Attributes
No attributes are available for this noun.
<Speech />
Pass the caller's spoken response as a User prompt to the LLM selected by the <Converse> verb.
This noun is used when nesting <Converse> within <Gather>.
Important: The <Speech /> noun MUST be the last noun in your <Converse> verb block.
Attributes
No attributes are available for this noun.