<Converse>

TL;DR

<Converse> with an AI Voice Agent.

Description

Use an AI agent to converse with a human caller, combining an LLM (Large Language Model), STT (Speech-To-Text) and TTS (Text-To-Speech).

<Converse> may be used as a stand-alone verb (e.g. prompt an LLM and play back its response), or nested within the <Gather> verb to create voice agents. <Converse> provides direct access to LLM, STT and TTS capabilities, without requiring the developer to integrate their own tools. It is designed as a pipe between the three, so that data flows seamlessly from one to the next.

Examples

Example 1: Stand-alone prompting

<Response>
  <Converse voice="Google:en-GB-Standard-A" language="en-GB" sessionTools="redirect dial">
    <System>
      You are a helpful hotel receptionist, providing a customer with helpful information about the Acme Hotel.
    </System>
  </Converse>
</Response>

Example 2: Voice agent prompting

<Response>
  <Gather input="speech" speechEngine="google" actionOnEmptyResult="true" speechTimeout="1.5" speechDetection="stt">
    <Converse voice="Google:en-GB-Standard-A" language="en-GB" sessionTools="redirect dial">
      <System>
        You are a helpful hotel receptionist, providing a customer with helpful information about the Acme Hotel.
      </System>
      <Speech/>
    </Converse>
  </Gather>
</Response>

Attributes

The following attributes are supported:

| Attribute Name | Allowed Values | Default Value |
| --- | --- | --- |
| voice | man, woman or See Premium Voices | woman |
| language | See Premium Voices | en-US |
| statusCallback | URL | none |
| statusCallbackMethod | POST or GET | POST |
| statusCallbackEvent | in-progress, tool-response, llm-response, completed | in-progress,completed |
| sessionTools | hangup, redirect, dial | none |
| model | LLM Support - See below | LLM Support - See below |
| context | auto, none, no | auto |
| temperature | Temperature - See below | 1 |

Attribute: voice

Which voice model to use for generating the synthesized voice. Additional models may be offered in the future.

Attribute: language

The language in which to generate the speech, chosen from the supported languages. The language is only a hint to the speech synthesizer: the text must already be written in the specified language, as no translation is performed before speech synthesis.
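
As a hedged illustration of the point above, the sketch below pairs a French voice with a system prompt that asks the LLM to answer in French, so that the text handed to the synthesizer matches the language hint. The voice name Google:fr-FR-Standard-A and the prompt wording are assumptions made for illustration; consult the Premium Voices list for the voices actually available.

```xml
<Response>
  <!-- Assumed voice name; the language attribute should match the language the LLM writes in -->
  <Converse voice="Google:fr-FR-Standard-A" language="fr-FR">
    <System>
      Vous êtes le réceptionniste de l'hôtel Acme. Répondez toujours en français.
    </System>
  </Converse>
</Response>
```

If the prompt let the LLM answer in English while language stayed fr-FR, the English text would be synthesized as-is, since no translation takes place.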

Attribute: statusCallback

A URL to be called when the audio output has completed playing. This URL will be called with all the parameters of a standard CXML request, but its output is discarded.

Attribute: statusCallbackMethod

The HTTP method to use for the statusCallback URL.

Attribute: statusCallbackEvent

Which events should be reported back to the statusCallback URL.
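
The three status callback attributes are typically set together. The sketch below is one possible combination, under two assumptions: the URL https://example.com/converse-events is a hypothetical endpoint, and multiple events are written as a comma-separated list, mirroring the default value in-progress,completed shown in the attribute table.

```xml
<Response>
  <!-- Hypothetical callback URL; event list format assumed from the documented default -->
  <Converse voice="woman" language="en-US"
            statusCallback="https://example.com/converse-events"
            statusCallbackMethod="POST"
            statusCallbackEvent="in-progress,llm-response,completed">
    <System>
      You are a helpful hotel receptionist for the Acme Hotel.
    </System>
  </Converse>
</Response>
```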

Attribute: sessionTools

Cloudonix includes three built-in session tools that your agent can use: hangup, redirect and dial. To use them, you MUST first declare them in the sessionTools attribute of <Converse>. The examples on this page list multiple tools separated by spaces (for example, sessionTools="redirect dial").
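
A minimal sketch of declaring all three built-in tools follows. It uses the space-separated form seen in the other examples on this page; the system prompt wording is illustrative only, and how the agent decides when to invoke a tool is not specified here.

```xml
<Response>
  <Gather input="speech" speechEngine="google" actionOnEmptyResult="true" speechTimeout="1.5" speechDetection="stt">
    <!-- All three session tools are declared up front so the agent is allowed to use them -->
    <Converse voice="Google:en-GB-Standard-A" language="en-GB" sessionTools="hangup redirect dial">
      <System>
        You are a helpful hotel receptionist for the Acme Hotel. End the call politely once the caller has no further questions.
      </System>
      <Speech/>
    </Converse>
  </Gather>
</Response>
```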

Attribute: model

Cloudonix provides direct over-the-top access to the following LLM providers: OpenAI and Anthropic. To select a specific LLM model, set the model attribute using the format provider[:model].

| LLM Provider | Prefix for model value |
| --- | --- |
| OpenAI | openai or chatgpt |
| Anthropic | anthropic or claude |

Available Language Models

For OpenAI, you may use any of the following GPT Models.
For Anthropic, you may use any of the following Claude Models.

The model attribute may also be set to the provider name only. In that case, Cloudonix defaults to openai:gpt-4o-mini or anthropic:claude-3-5-haiku-latest, depending on the provider specified. If neither a provider nor a model is specified, Cloudonix defaults to openai:gpt-4o-mini.

Here are some examples of model attribute values for specific LLM models:

| LLM Provider | Model name | model value |
| --- | --- | --- |
| OpenAI | gpt-4o | openai:gpt-4o |
| OpenAI | gpt-4o-mini | openai:gpt-4o-mini |
| OpenAI | gpt-o1 | openai:gpt-o1 |
| OpenAI | gpt-o1-mini | openai:o1-mini |
| OpenAI | gpt-o1-preview | openai:o1-preview |
| Anthropic | claude-3-7-sonnet-latest | anthropic:claude-3-7-sonnet-latest |
| Anthropic | claude-3-5-haiku-latest | anthropic:claude-3-5-haiku-latest |
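
To show the provider[:model] format in context, here is a short sketch that selects an Anthropic model explicitly. The voice, language and prompt are carried over from the earlier examples and are not requirements of the model attribute.

```xml
<Response>
  <!-- model uses the provider:model form described in this section -->
  <Converse voice="Google:en-GB-Standard-A" language="en-GB" model="anthropic:claude-3-5-haiku-latest">
    <System>
      You are a helpful hotel receptionist, providing a customer with helpful information about the Acme Hotel.
    </System>
  </Converse>
</Response>
```

Based on the defaults described in this section, writing model="anthropic" or model="claude" alone should resolve to the same model.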

Attribute: context

By default, Cloudonix sets the context value to auto. Context support enables a chat history function, which ensures that the LLM receives the previous interactions as part of each request. For more information about this feature, we suggest reading OpenAI's documentation on the context window.
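
The following sketch sets context explicitly for a one-shot, stand-alone prompt. It assumes that context="none" turns the chat history function off, which is inferred from the allowed values in the attribute table rather than stated on this page.

```xml
<Response>
  <!-- Assumption: context="none" sends the prompt without previous interactions -->
  <Converse voice="woman" language="en-US" context="none">
    <System>
      Greet the caller and state the opening hours of the Acme Hotel reception desk.
    </System>
  </Converse>
</Response>
```

For a multi-turn voice agent nested within <Gather>, leaving context at its default of auto keeps the earlier turns available to the LLM.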

Attribute: temperature

As defined by OpenAI and adopted by other LLM platforms:

The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more
random, while lower values like 0.2 will make it more focused and deterministic. If set to 0,
the model will use log probability to automatically increase the temperature until certain
thresholds are hit.
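
As a small illustration, the sketch below lowers the temperature from its default of 1 to 0.2 for an agent that should give consistent, factual answers. The value 0.2 is only an example taken from the description above; a suitable setting depends on the use case.

```xml
<Response>
  <Gather input="speech" speechEngine="google" actionOnEmptyResult="true" speechTimeout="1.5" speechDetection="stt">
    <!-- A lower temperature favors focused, repeatable answers -->
    <Converse voice="Google:en-GB-Standard-A" language="en-GB" temperature="0.2">
      <System>
        You are a helpful hotel receptionist, providing a customer with helpful information about the Acme Hotel.
      </System>
      <Speech/>
    </Converse>
  </Gather>
</Response>
```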

Available Nouns

<Description>

<Description> is an optional noun, providing a text description for your LLM tool.

Example

<Response>
  <Gather input="speech" speechEngine="google" actionOnEmptyResult="true" speechTimeout="1.5" speechDetection="stt">
    <Converse voice="Google:en-GB-Standard-A" language="en-GB" sessionTools="redirect dial">
      <Tool name="simpleTool" url="https://example.com/simpleTool">
        <Description>Just a simple tool description</Description>
      </Tool>
      <System>
        Just a simple System prompt.
      </System>
      <Speech/>
    </Converse>
  </Gather>
</Response>

<System>

Description

Pass a System prompt to the selected LLM.

Pay Attention!

You may define several <System> prompts. However, as some LLM providers may not support multiple system prompts in a single API call, multiple prompts may be merged into a single system prompt.
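
The note above can be illustrated with a sketch that splits the instructions across two <System> nouns, one for the persona and one for the house rules. The prompt texts are made up for illustration, and the way the prompts are merged for a given provider is not specified here, so each prompt is best written to make sense on its own.

```xml
<Response>
  <Gather input="speech" speechEngine="google" actionOnEmptyResult="true" speechTimeout="1.5" speechDetection="stt">
    <Converse voice="Google:en-GB-Standard-A" language="en-GB">
      <!-- First system prompt: persona -->
      <System>
        You are a helpful hotel receptionist for the Acme Hotel.
      </System>
      <!-- Second system prompt: rules; may be merged with the first one -->
      <System>
        Keep every answer under two sentences and never quote room prices.
      </System>
      <Speech/>
    </Converse>
  </Gather>
</Response>
```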

Example

<Response>
  <Gather input="speech" speechEngine="google" actionOnEmptyResult="true" speechTimeout="1.5" speechDetection="stt">
    <Converse voice="Google:en-GB-Standard-A" language="en-GB" sessionTools="redirect dial">
      <System>
        You are a helpful voice agent, designed to help the user in any way possible.
      </System>
      <Speech/>
    </Converse>
  </Gather>
</Response>

Attributes

No attributes are available for this noun.

<Tool>

See Tool noun

<User>

Description

Pass a User prompt to the selected LLM.

Example

<Response>
  <Gather input="speech" speechEngine="google" actionOnEmptyResult="true" speechTimeout="1.5" speechDetection="stt">
    <Converse voice="Google:en-GB-Standard-A" language="en-GB" sessionTools="redirect dial">
      <System>
        You are a helpful voice agent, designed to help the user in any way possible.
      </System>
      <User>Hello, my name is Johnny Five - can you help me?</User>
      <Speech/>
    </Converse>
  </Gather>
</Response>

Attributes

No attributes are available for this noun.

<Speech />

Pass the caller's spoken response as a User prompt to the LLM selected by the enclosing <Converse> verb.

Pay Attention!

This noun is used only when <Converse> is nested within a <Gather> verb.
Important: The <Speech /> noun MUST be the last noun in your <Converse> verb block.

Attributes

No attributes are available for this noun.