Web Responders

Updated at July 27th, 2024

Billable Support Feature

If required, Professional Services are available based on an accepted quote and availability of resources. 

 

 


What are Web Responders?

Web Responders (To Web) is an available responder application. It can be invoked for both incoming and outgoing calls.

The process generally looks like this: (1) contact support to create a dial translation in the Core server and then (2) use or host code that will respond and drive the call.

 

Things to Remember

Remember the following concepts when working with Web Responders:

  • In Dial Translations, the Responder Application is called To-Web. The To-Web application takes a single parameter: the URL of the Web Responder application. When the To-Web application is invoked, IVR control passes to the Web Responder application at that URL.
  • Whether the call is being forwarded on-net or off-net, you need to supply the "uid", which is the "owner" of the call. Optionally, you can also supply both "origination" and "destination", which are also passed through the dial plan.
  • For all Web Responder verbs to work, we recommend using the logic "forward to user" rather than "forward to web responder". The call flow should be routed through a user or system user. You can direct inbound calls to a Web Responder application by routing them to a system user and then forwarding them to a dial plan entry that selects the "To Web" application. You can place an outbound call by dispatching an API call that connects the remote end with that Web Responder, like so (the user in this example is "999@domain.com"):
object=call&action=call&callID=xxx&uid=999@domain.com&destination=999@domain.com&origination=6020001111
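As an illustration only, the request parameters shown above could be assembled in code rather than by hand. This sketch builds just the query string; the API endpoint, authentication, and HTTP method are not covered here and depend on your deployment. The function name is hypothetical.

```python
from urllib.parse import urlencode


def build_call_query(call_id: str, uid: str, destination: str, origination: str) -> str:
    """Return the query string for the call request shown above."""
    params = {
        "object": "call",
        "action": "call",
        "callID": call_id,
        "uid": uid,
        "destination": destination,
        "origination": origination,
    }
    # safe="@" leaves user@domain values unescaped, matching the example above.
    return urlencode(params, safe="@")


print(build_call_query("xxx", "999@domain.com", "999@domain.com", "6020001111"))
# object=call&action=call&callID=xxx&uid=999@domain.com&destination=999@domain.com&origination=6020001111
```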
 

Example Scenario

This is an example of a "math" application. The application prompts the caller for two numbers and then says their sum. It demonstrates (a) a multi-step application and (b) storing state in a server-side PHP session. For this application to function, you must also create a file at http://localhost/tts.php and set up an account with a TTS service.
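The multi-step flow described above can be sketched as a small state machine. This is not the original PHP application: it is a minimal Python sketch that assumes a plain dict stands in for the server-side session, that the hypothetical URL "math.php" receives each post-back, and that <Say> is used for the prompts instead of tts.php.

```python
from typing import Optional
from xml.sax.saxutils import escape


def math_step(session: dict, digits: Optional[str]) -> str:
    """Return the XML for the next step and update the per-call session state."""
    if digits is not None and digits.isdigit():
        session.setdefault("numbers", []).append(int(digits))
    numbers = session.get("numbers", [])
    if len(numbers) < 2:
        ordinal = "first" if not numbers else "second"
        prompt = escape(f"Please enter the {ordinal} number")
        # Hypothetical action URL: the gathered digits are posted back here.
        return (
            "<Gather numDigits='3' action='math.php'>"
            f"<Say>{prompt}</Say>"
            "</Gather>"
        )
    # No action attribute, so the application ends after speaking the sum.
    return f"<Say>The sum is {numbers[0] + numbers[1]}</Say>"


session = {}
print(math_step(session, None))   # prompts for the first number
print(math_step(session, "12"))   # prompts for the second number
print(math_step(session, "30"))   # <Say>The sum is 42</Say>
```

Keeping the state in a per-call session means each post-back only needs to carry the newly gathered digits.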

URL Parameters Provided 

When the Core Module posts back to the Web Responder application, the URL may contain the following parameters (regardless of whether the call is inbound or outbound):

URL Parameter Example
NmsAni caller (e.g. "1001")
NmsDnis callee (e.g. "2125551212")
AccountUser the account user
AccountDomain the account domain
AccountLastDial last dialled digits by the account user
Digits received digits
OrigCallID call ID of the Orig leg
TermCallID call ID of the Term leg
ToUser the user input to the responder
ToDomain the domain input to the responder
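As a rough sketch, a Web Responder application might pull these parameters out of the request URL as follows. The helper name and the sample URL are illustrative only; the parameter names are the ones listed in the table above.

```python
from urllib.parse import parse_qs, urlsplit

# Parameter names taken from the table above.
KNOWN_PARAMS = (
    "NmsAni", "NmsDnis", "AccountUser", "AccountDomain", "AccountLastDial",
    "Digits", "OrigCallID", "TermCallID", "ToUser", "ToDomain",
)


def read_responder_params(url: str) -> dict:
    """Return the known Web Responder parameters present in the URL."""
    query = parse_qs(urlsplit(url).query)
    # parse_qs returns a list per key; keep the first value of each.
    return {name: query[name][0] for name in KNOWN_PARAMS if name in query}


params = read_responder_params(
    "http://localhost/handle-account-number.php?NmsAni=1001&NmsDnis=2125551212&Digits=555"
)
print(params)
# {'NmsAni': '1001', 'NmsDnis': '2125551212', 'Digits': '555'}
```

Because any of the parameters may be absent, the helper returns only the ones actually present instead of raising an error.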

Verbs

Verbs usually take an "action" attribute. If the "action" attribute is present, then control is returned to the URL given by the "action" attribute. If not present, then the application ends.

 

Forward

The <Forward> action forwards to the given destination. The <Forward> verb does NOT take an "action" parameter. It is effectively a "goto".

 <Forward>  
  2125551212  
 </Forward>
Forward Attribute Description
ByCaller Specifies using the Caller's Dial Plan (instead of the Callee's) in the Web Responder script. Without this attribute, the inbound call has to route through a user first to use their dial plan, and the destination isn't as expected.


Here are ByCaller instructions:

  1. Add the "ByCaller" attribute to Forward.
root@[redacted]:/var/www/html# cat forward.php
<Forward ByCaller='yes'>+141625[redacted]</Forward>
  2. The user sets a forward to the Web Responder.
  3. The caller dials directly into the Web Responder. The forward will work as expected.
CNmsUaSessionStateMsgInfoTest(20230130154940025425)('ByCaller','exist') - MsgInfo 
<ByCaller> exist    
LookupResponder(sip:ByCaller@Forward)    
- Found mode=(forward_call)    
LookupNextResponder case(1)    
Translate        
from 
    <Forward@>        
by   (ByCaller)(Forward)        
to   (sip:ByCaller@Forward)(sip:ByCaller@Forward)

Gather

The <Gather> action gathers the given number of DTMF digits and posts back to the given action URL, optionally playing the given .wav file.

In this example, it gathers 3 digits and posts back to the relative URL "handle-account-number.php?Digits=555":

<Gather numDigits='3' action='handle-account-number.php'>
    <Play>
        http://www.example.com/what-is-your-account-number.wav
    </Play>
</Gather>
Gather Attribute Description
numDigits This is the number of digits to gather (such as '1') (default=20)
Digits The verb posts back to the given action URL with the gathered digits (such as 123).

Play

The <Play> action plays the given audio file, which must be in WAV format, and posts back (returns control) to the given action URL.

In this example, it plays the WAV file and then posts back to continue.php. To end the call, omit the action parameter:

 <Play action='continue.php'>
  http://www.example.com/hello-world.wav
 </Play>

Supported Verbs 

The following verbs have either been added or been given enhanced functionality starting in v44. No functionality has been removed from existing verbs; this release is a feature enhancement, so your existing verbs will not break.

A Voice Services integration is required for these enhancements to work as expected. Deepgram is the recommended vendor in order to utilize all available voice attributes.

 
 

Response

The <Response> verb is designed to enhance the efficiency and structure of XML responses. This verb simplifies call control by allowing multiple verbs to be encapsulated within a single <Response> element, eliminating the need for multiple requests to perform multiple sequential actions.

Previously, executing multiple verbs required individual requests. With the <Response> verb, developers can streamline call flows by listing multiple verbs in a single response. As the call progresses, the Core Module moves sequentially through the verbs listed within the <Response> element, executing each in the specified order. Call control moves to the first "action" URL it encounters, and any verbs after that point are skipped. If the Core Module reaches the end of the <Response> without encountering an action, it disconnects the call.

Here is an early exit example. The third <Say> won't execute because the call flow is transferred to "next.php", as specified by the action on the second <Say>.

 <Response>  
     <Say>This is Say XML tag 1 encapsulated in response tag</Say>  
     <Say action='next.php'>This is Say XML tag 2 with action attribute encapsulated in response tag</Say>  
     <Say>Call control has been transferred to "next.php" so this line will never execute</Say>  
 </Response>
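To make the early-exit rule concrete, the following sketch walks a <Response> document and reports which verbs would run before control moves elsewhere. It only models the rule as described in this article; it is not how the Core Module is implemented, and it ignores verbs such as <Forward> and <Hangup> that end the flow on their own. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def executed_verbs(response_xml: str):
    """Return (verbs that run, in order; action URL taken, or None if the call ends)."""
    root = ET.fromstring(response_xml)
    ran = []
    for verb in root:
        ran.append(verb.tag)
        action = verb.get("action")
        if action is not None:
            # Control moves to the action URL; later verbs never execute.
            return ran, action
    return ran, None


example = """<Response>
    <Say>This is Say XML tag 1 encapsulated in response tag</Say>
    <Say action='next.php'>This is Say XML tag 2 with action attribute</Say>
    <Say>This line will never execute</Say>
</Response>"""

print(executed_verbs(example))
# (['Say', 'Say'], 'next.php')
```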

Echo

The <Echo> verb allows for real-time audio feedback, enabling immediate playback of user input to facilitate testing. For instance, it might be used to immediately return received audio or input back to the sender. This could be used for testing or feedback purposes within a voice application. If a user says something, the system could use "Echo" to repeat the user's words. If the timeout is not configured, then the "echo" will continue until the user hangs up.

 <Echo timeout="5"></Echo>
Echo Attribute Description
timeout For an "Echo" verb, this attribute controls how many seconds the system will keep echoing the media back to the caller. It defaults to 0, which means unlimited, and the caller will have to drop the call. Timeout is configured in seconds ("5" means 5 seconds).

Say

The <Say> verb instructs the system to convert text into speech during a phone call, so that the caller hears the text as spoken words. It uses TTS to convert text to audio sentences.

This verb relies on the API and the Voice Services integration to convert the sentence to audio for playback.

 <Say>Hello, this is a simple text-to-speech message.</Say>
Say Attributes Description
voice This attribute controls the voice used for a "Say" verb. It accepts either a gender (male, female), a voice name, or a voice ID. Be careful using a voice ID if you also set language. If you choose a voice ID and a language that do not match, then the system will fall back to the default voice for your domain.
language BCP-47 language code, such as en-US, pt-BR, or fr-CA.

Gather

The <Gather> verb already existed, but it can now also collect speech transcripts via voice (along with pressed digits).

On a technical level, this verb instructs the system to collect DTMF, speech, or both. If DTMF is specified and DTMF is collected, then a request is sent back with the digits in the "Digits" parameter. If speech is specified, then it is collected through speech_cmd and a request is sent back with the transcript in "SpeechResult".

 <Gather input='dtmf speech' action='process_action.php'>
   <Say language='en-US' voice='amy'>
     Hi, how can I help you today?
   </Say>
 </Gather>

<Gather> can contain nested verbs like <Say> or <Play> that can execute while the system is gathering speech or DTMF.

It relies on the API and the Voice Services integration to convert the sentence to audio for playback. We recommend that you use Deepgram as your default vendor to utilize all of the available voice attributes. 

Gather Attributes Description
numDigits Max number of DTMF digits expected to collect.
input Specify input type. Add "speech" and/or "dtmf" to collect an audio transcript and/or a sequence of DTMF digits.
hints Provide hints of possible words that are expected to be captured in the transcript. This attribute is especially useful for a directory; providing a list of expected last names will increase accuracy.
timeout DTMF input timeout
digitEndTimeout Sets the maximum time interval between successive DTMF digit inputs
language Language code used on speech collection.
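When a <Gather> accepts both DTMF and speech, the action URL has to check which kind of input came back. The sketch below shows one way to do that. It assumes "SpeechResult" arrives in the query string the same way "Digits" does, and the function name is hypothetical.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qs


def read_gather_result(query_string: str) -> Optional[Tuple[str, str]]:
    """Return ("dtmf", digits) or ("speech", transcript), or None if neither is present."""
    params = parse_qs(query_string)
    if "Digits" in params:
        return "dtmf", params["Digits"][0]
    if "SpeechResult" in params:
        return "speech", params["SpeechResult"][0]
    return None


print(read_gather_result("Digits=2"))
# ('dtmf', '2')
print(read_gather_result("SpeechResult=connect+me+to+sales"))
# ('speech', 'connect me to sales')
```

Returning None when neither parameter is present lets the caller decide whether to re-prompt or end the call.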

Hangup

The <Hangup> verb hangs up the call. You can't nest any verbs within it; it can only be nested within a <Response>.

 <Response>  
     <Hangup/>  
 </Response>
 

Stream

The <Stream> verb bidirectionally streams media in & out via a WebSocket server in near real-time. Developers can both intercept and inject live audio, making it invaluable for applications requiring real-time audio processing, such as AI-driven analysis and interactive voice response systems.

 

Stop a Stream

The application can send a stop event to close the stream through the WebSocket. Upon receiving a "stop" event, the <Stream> will stop transmitting. If it is a Sync stream, then it will exit and proceed to the next Web Responder action.

{
    "event": "stop"
}


There are two ways to start a Stream: asynchronous (listen only) & synchronous bi-directional (transmit & listen).

Asynchronous Stream

When used within a "Start" verb, "Stream" enables asynchronous audio streaming. It's NOT bidirectional; it's only in one direction. This allows the call to proceed while audio is being transmitted. Asynchronous audio is streamed to the WebSocket URL.

In the following example, audio is streamed to wss://example.com/audio while the call continues with the next instruction, preventing call interruption. If no further instructions are provided, the call may be disconnected; therefore, it is advisable to include subsequent Web Responder instructions to continue the call session.

<Response>
    <Start>
        <Stream name="my_stream" url="wss://example.com/audio" />
    </Start>
    <Say>This will execute after connecting the stream</Say>
</Response>
 

Stopping an Asynchronous Stream

If a unique name was used to start the asynchronous stream, then the stream can be stopped at any time by providing its name within a "Stop" verb, like so:

<Response>
    <Stop>
        <Stream name="my_stream"/>
    </Stop>
</Response>

Synchronous Bi-directional Streaming

For applications requiring synchronous bi-directional streaming where you might need to send audio back to the call or interact with the caller based on the streamed audio, you should use the "Stream" verb within the "Connect" verb. This facilitates a two-way audio stream between the call and your application, enabling interactive scenarios.

<Connect>
    <Stream url="wss://example.com/audio"></Stream>
</Connect>
Stream Attributes Description
url The WebSocket URL where the audio stream will be sent. Ensure this URL uses a secure (wss) protocol and can handle WebSocket connections.
track This specifies which audio track to stream. There are 3 possible values:
  • inbound: streams only the audio coming into the Core Module from the external party (e.g., the caller's voice). Use this setting if you are only interested in analyzing or processing what the caller says.
  • outbound: streams only the audio sent from the Core Module to the external party (e.g., the agent's voice). Select this option if your focus is on what the agent or automated system says to the caller.
  • both: streams both the inbound and outbound audio tracks. It is useful for full conversation analysis, such as processing the entire dialog between the caller and the agent or automated system.
name An optional identifier for the stream, useful for distinguishing between multiple streams in your application.

Custom Parameters

Custom parameters can be added to the "Stream" verb to pass additional information to the WebSocket server. These parameters are sent with the "start" WebSocket event and can be used to provide context or session-specific data.

In this example, customerID and callType are passed along with the audio stream, allowing the server to tailor its processing based on these parameters.

<Stream name="word_detection" url="wss://example.com/audio" track="both">
    <Parameter name="customerID" value="12345" />
    <Parameter name="callType" value="support" />
</Stream>
Parameter Attributes Description
name The parameter's name. It acts as the key in the key-value pair sent to the endpoint.
value The parameter's value. It acts as the value in the key-value pair.
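If the custom parameters come from application data, generating the <Stream> element programmatically avoids hand-escaping attribute values. This is a minimal sketch using Python's standard library; the function name and argument order are illustrative, not part of the platform.

```python
import xml.etree.ElementTree as ET


def build_stream(url: str, name: str, track: str, params: dict) -> str:
    """Return a <Stream> element with one <Parameter> child per key-value pair."""
    stream = ET.Element("Stream", {"name": name, "url": url, "track": track})
    for key, value in params.items():
        # ElementTree escapes special characters in attribute values.
        ET.SubElement(stream, "Parameter", {"name": key, "value": str(value)})
    return ET.tostring(stream, encoding="unicode")


print(build_stream(
    "wss://example.com/audio",
    "word_detection",
    "both",
    {"customerID": "12345", "callType": "support"},
))
```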


Life Cycle of a <Stream> Verb

In the life cycle of a "Stream", several types of events are represented through WebSocket messages: Connected, Start, Media, and Stop. Each message sent over the WebSocket is a JSON string. The event property within each JSON object identifies the type of event occurring.

Connected Message

The Connected message is the first to be sent once a WebSocket connection has been established. Its attributes are:

  • event: a string value "connected" indicating the establishment of the connection
  • version: indicates the semantic version of the protocol
 {  
     "event": "connected",  
     "version": "1.0.0"  
 }
 

Start Message

Sent immediately after the Connected message, the Start message contains essential information about the Stream and is dispatched only once. Its attributes are:

  • event: a string value "start"
  • sequenceNumber: tracks the order of message delivery, starting with "1"
  • start: an object containing Stream metadata such as stream_id, call_id, expected tracks, customParameters, and mediaFormat.
 {  
     "event": "start",  
     "sequenceNumber": "1",  
     "start": {  
         "stream_id": "2d9be7d7-4a66-4667-b2e1-cf3fa4e66306",  
         "call_id": "vmj74pd5nmm9jlukvbfg",  
         "tracks": [  
             "inbound",  
             "outbound"  
         ],  
         "customParameters": {  
             "client_id": "123123123",  
             "status": "1"  
         },  
         "mediaFormat": {  
             "encoding": "audio/x-mulaw",  
             "sampleRate": 8000,  
             "channels": 1  
         }  
     }  
 }
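On the receiving side, the Start message is typically parsed once and kept for the rest of the stream. The following sketch reads the fields shown in the example above into a small record. The class and function names are hypothetical, and the field list is limited to what the example message contains.

```python
import json
from dataclasses import dataclass


@dataclass
class StreamStart:
    stream_id: str
    call_id: str
    tracks: list
    custom_parameters: dict
    encoding: str
    sample_rate: int
    channels: int


def parse_start(raw: str) -> StreamStart:
    """Parse a Start message (a JSON string) into a StreamStart record."""
    message = json.loads(raw)
    if message.get("event") != "start":
        raise ValueError("not a start message")
    start = message["start"]
    fmt = start["mediaFormat"]
    return StreamStart(
        stream_id=start["stream_id"],
        call_id=start["call_id"],
        tracks=list(start["tracks"]),
        # Custom parameters are optional, so default to an empty dict.
        custom_parameters=dict(start.get("customParameters", {})),
        encoding=fmt["encoding"],
        sample_rate=int(fmt["sampleRate"]),
        channels=int(fmt["channels"]),
    )
```

Keeping the media format alongside the stream ID lets later Media messages be decoded without re-reading the Start message.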

Media Message

A Media message encapsulates the raw audio data within the stream. Its attributes are:

  • event: a string value "media"
  • sequenceNumber: increments with each new message
  • media: contains the audio payload and additional metadata like track, chunk, and timestamp
  • chunk: signifies the sequence of the audio packets being sent on that track. The first message will start with chunk "1" and increment with each subsequent message. This sequential numbering helps ensure that audio frames are processed in the correct order, maintaining the integrity of the conversation.
  • timestamp: represents the presentation timestamp of each audio chunk relative to the start of the stream, in milliseconds

 {  
     "event": "media",  
     "sequenceNumber": "3",  
     "media": {  
         "track": "outbound",  
         "chunk": "1",  
         "payload": "base64AudioData",  
         "timestamp": "1708388604"  
     }  
 }
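A receiver usually needs to turn each Media message back into raw audio bytes. The sketch below assumes the payload is base64-encoded audio, as the placeholder "base64AudioData" suggests, and that the format is the 8 kHz, single-channel mu-law shown in the Start example. The function name is hypothetical.

```python
import base64
import json


def decode_media(raw: str):
    """Return (track, chunk number, audio bytes) for one Media message."""
    message = json.loads(raw)
    if message.get("event") != "media":
        raise ValueError("not a media message")
    media = message["media"]
    # Assumption: the payload is base64-encoded audio.
    audio = base64.b64decode(media["payload"])
    return media["track"], int(media["chunk"]), audio


# Build a sample message with 160 bytes of placeholder audio.
sample = json.dumps({
    "event": "media",
    "sequenceNumber": "3",
    "media": {
        "track": "outbound",
        "chunk": "1",
        "payload": base64.b64encode(b"\xff" * 160).decode("ascii"),
        "timestamp": "1708388604",
    },
})

track, chunk, audio = decode_media(sample)
# At 8000 samples per second and one byte per mu-law sample, 160 bytes is 20 ms.
print(track, chunk, len(audio) * 1000 / 8000)
```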

Stop Message

A Stop message indicates the Stream's termination or the call's end. Its attributes are:

  • event: a string value "stop"
  • sequenceNumber: increments with each new message
  • stop: contains metadata about the Stream session
  • bytesSent: indicates the total amount of data transmitted during the stream for each enabled track
  • duration: reflects the total duration of the audio stream for each track, typically measured in seconds

 {  
     "event": "stop",  
     "sequenceNumber": "416",  
     "stop": {  
         "tracks": [  
             {  
                 "type": "inbound",  
                 "bytesSent": 66240,  
                 "duration": 8  
             },  
             {  
                 "type": "outbound",  
                 "bytesSent": 99114,  
                 "duration": 10  
             }  
         ],  
         "call_id": "24j83iflgottdokd39vt",  
         "stream_id": "6ccf2088-6755-4861-a162-3c6c30d6ad07"  
     }  
 }
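Putting the four message types together, a WebSocket server can route each incoming JSON string on its event property. The sketch below shows only that routing step and leaves out the WebSocket transport itself. The state dictionary and its keys are illustrative assumptions.

```python
import json


def handle_message(raw: str, state: dict) -> str:
    """Update `state` from one WebSocket message and return the event name."""
    message = json.loads(raw)
    event = message.get("event")
    if event == "connected":
        state["version"] = message["version"]
    elif event == "start":
        state["stream_id"] = message["start"]["stream_id"]
    elif event == "media":
        # Count media messages; a real handler would also process the payload.
        state["chunks"] = state.get("chunks", 0) + 1
    elif event == "stop":
        state["totals"] = {
            t["type"]: t["bytesSent"] for t in message["stop"]["tracks"]
        }
    else:
        raise ValueError(f"unexpected event: {event!r}")
    return event


state = {}
handle_message('{"event": "connected", "version": "1.0.0"}', state)
handle_message(
    '{"event": "stop", "sequenceNumber": "416", "stop": {"tracks": ['
    '{"type": "inbound", "bytesSent": 66240, "duration": 8}, '
    '{"type": "outbound", "bytesSent": 99114, "duration": 10}]}}',
    state,
)
print(state)
# {'version': '1.0.0', 'totals': {'inbound': 66240, 'outbound': 99114}}
```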

Wait

The <Wait> verb can be used to include configurable pauses within a call flow. It instructs the system to pause for a specified duration before proceeding with the next instruction in the sequence. This can be particularly useful for creating delays within voice interactions, such as spacing out verbal prompts to make them easier for the user to understand. If a wait timeout is not configured, then the system will default to waiting 1 second.

<Wait timeout="5"></Wait>
Wait Attribute Description
timeout For a "Wait" verb, this attribute controls for how many seconds the Core Module will wait before executing the next step. It defaults to 1, which means it will wait 1 second.

Example Use Cases

Verb(s) Instructions

echo (without a timeout):
<Echo/>

echo (with a timeout):
<Echo timeout="15"></Echo>

wait (without a timeout):
<Wait/>

wait (with a timeout):
<Wait timeout="5"></Wait>

wait (with a timeout) within a response:
<Response>
    <Say>Hold on, we are connecting your call.</Say>
    <Wait timeout="5"/>
    <Say>Thank you for waiting.</Say>
</Response>

echo & wait together within a response:
<Response>
    <Say>Wait for 5 seconds and test your microphone and speaker by talking to yourself after the tone</Say>
    <Wait timeout='5'></Wait>
    <Echo timeout='15'></Echo>
</Response>

say:
<Say>Hello, this is a simple text-to-speech message.</Say>

gather speech & say in the appropriate language:
<Gather input='dtmf speech' action='process_action.php'>
    <Say language='en-US' voice='amy'>
        Hi, how can I help you today?
    </Say>
</Gather>

gather DTMF digits and send to "ivr.php":
<Gather numDigits="1" action="ivr.php"></Gather>

gather 10 digits and send to "account.php", play an audio file that requests the account number:
<Gather input="dtmf" numDigits="10" action="account.php">
    <Play>https://myhost.com/enter-your-account-number.wav</Play>
</Gather>

gather 1 digit DTMF and use TTS:
<Gather numDigits="1" action="next.php">
    <Say voice="female">Enter one to continue</Say>
</Gather>

play greeting, gather 1 digit DTMF, and say audio:
<Response>
    <Play>Welcome to CompanyName</Play>
    <Gather numDigits="1" action="ivr.php">
        <Say voice="female">Enter 1 or say sales to connect to sales, enter 2 or say support to connect to support</Say>
    </Gather>
</Response>

gather speech, say in a different language:
<Gather input="speech" hints="Thiago Vicente" language="pt-BR" action="ivr.php">
    <Say voice="male" language="pt-BR">Olá, com quem você gostaria de falar?</Say>
</Gather>

gather speech, use a custom format, say audio:
<Gather input="speech" model="nova-2-general" numDigits="1" action="http://myhost.com/app/stashvmail.php">
    <Say voice="female">Hello, thanks for calling, leave a message now</Say>
</Gather>

gather speech, disable smartFormat, enable numerals:
<Gather input="speech" smartFormat="no" numerals="yes">
    <Say voice="female">Hello, Say some numbers now</Say>
</Gather>

say audio in the default language and voice:
<Response>
    <Say>This will play using the default voice and language associated with the Web Responder system user/domain</Say>
</Response>

say audio in a male or female voice:
<Response>
    <Say voice="male">Hello there! How are you?</Say>
    <Say voice="female">Hi! Nice to meet you</Say>
</Response>

say audio in a specified language:
<Response>
    <Say language="en-AU">G'day! How's it going?</Say>
</Response>

say audio in a specified language and gendered voice:
<Response>
    <Say voice="male" language="es-ES">¡Hola! ¿Cómo estás?</Say>
</Response>

say audio in a specified language and voice ID:
<Response>
    <Say voice="pt-BR-Wavenet-C" language="pt-BR">Olá! Como você está?</Say>
</Response>

asynchronous audio stream:
<Response>
    <Start>
        <Stream url="wss://example.com/audio">
        </Stream>
    </Start>
    <Say>This will execute after connecting the stream</Say>
</Response>

stop a stream:
<Response>
    <Stop>
        <Stream name="mystream" />
    </Stop>
</Response>

synchronous bi-directional streaming:
<Connect>
    <Stream url="wss://example.com/audio">
    </Stream>
</Connect>

Web Responders in a Call Trace

Starting in v44, Web Responders appear as a call participant in the call trace.

Here is an example testing scenario:

  1. Contact Support to create a dial translation for the Web Responder using the application "To Web".
  2. Use the "destination" of the above dial translation and set it as a user's forwarding rule.
  3. Call the user. The call trace should show the Web Responder as a call participant ("POST").

Web Responder Enhanced Security

There is a new system property, called WebRespSecret, which defines a global secret key used for signing all Web Responder requests. This property is not set by default, meaning no signature is appended to the request.

There is also a new token, HttpSecret, which facilitates the customization of the signing secret on a per-request basis via the dial-rule parameter. This allows for flexibility in scenarios where different Web Responders need different secrets.


 
