Automated IVR Navigation with Semantic Caching

Semantic Caching for IVR Systems

Does your application have to navigate an Interactive Voice Response (IVR) system at the beginning of every call? You can use semantic caching to get through the IVR. It’s faster and cheaper than a Large Language Model (LLM).

In healthcare, the phone still reigns supreme. It’s not just patients who must first pass the IVR gatekeeper. Healthcare companies also have to navigate an IVR when calling other healthcare companies for matters like referral coordination, insurance verification and claims, medical record requests, and so on.

We work with a lot of Voice AI companies and we’re seeing more and more Voice AI applications in healthcare. Many healthcare Voice AI applications have to navigate an IVR. And modern Voice AI companies are using LLMs for the entire call, including the IVR. Using an LLM to navigate an IVR is like getting an x-ray to check that you have five fingers. It’ll work, but there are simpler means.

It turns out you can use semantic caching to navigate an IVR for tasks like insurance verification, insurance authorization, et cetera.

A Guide to Automated IVR Navigation

When we think of semantic caching for LLM applications, we think of the human submitting the query (i.e., the cache key) and the LLM supplying the cached response (i.e., the cache value). For automating IVR workflows, it’s the opposite: the IVR submits the query to the semantic cache, and the cached response is the data, which originated from the human user.

Simple Automated IVR Navigation Example

Templating is a key ingredient. But before I introduce the templating concept, let’s look at a simple example.


{'role': 'user', 'content': 'Hi, I am Ken'}

{'role': 'assistant', 'content': 'Please say or enter the 10 digit national provider number or nine digit tax ID.'}

{'role': 'user', 'content': 'One two three four five six seven eight nine'}

The cache key is the assistant query. That is, we search the cache for the assistant query, “Please say or enter the 10 digit national provider number or nine digit tax ID.”

The corresponding cache value is the national provider number. Again, this doesn’t work without templating, but it demonstrates the idea. We use semantic search to recognize the IVR’s request. Then, once we recognize the request, we can quickly and cheaply return a value (i.e., "One two three four five six seven eight nine") from the cache. No LLM needed!

The IVR always says the same thing at each stage, so why do we need semantic caching? Can’t we use exact string key-value caching? No, because audio is messy. Audio quality issues and speech-to-text transcription errors make exact string key-value caching an unreliable solution. With semantic caching, you don’t need an exact match. Instead, the cache recognizes the intent, even if a few of the words are garbled.
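
To make this concrete, here is a minimal sketch of a semantic lookup in Python. It uses the open-source sentence-transformers library for embeddings and a cosine-similarity threshold; the model name, threshold, and code structure are illustrative assumptions, not how our production cache is implemented.

import numpy as np
from sentence_transformers import SentenceTransformer

# Cache keys are IVR prompts; cache values are the caller's responses.
cache = {
    "Please say or enter the 10 digit national provider number or nine digit tax ID.":
        "One two three four five six seven eight nine",
}

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative embedding model

def lookup(ivr_prompt, threshold=0.8):
    """Return the cached response for the most similar cache key, if any."""
    query = model.encode(ivr_prompt)
    best_key, best_score = None, 0.0
    for key in cache:
        vec = model.encode(key)
        score = float(np.dot(query, vec) / (np.linalg.norm(query) * np.linalg.norm(vec)))
        if score > best_score:
            best_key, best_score = key, score
    return cache[best_key] if best_score >= threshold else None

# A slightly garbled transcription should still land close to the cached key.
print(lookup("please say or enter the ten digit national provider number or 9 digit tax i d"))

In production you would embed the cache keys once and index them; the per-call linear scan here just keeps the example short.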

Semantic Cache Named Entity Templating

When a user query or LLM response contains a named entity (e.g., a person’s name or an identification number), the Canonical AI semantic cache replaces the named entity with a template. In the user response in the previous example, the billing provider tax ID “One two three four five six seven eight nine” is replaced with the template {{billing provider tax id}}.

In other words, for the cache key “Please say or enter the 10 digit national provider number or nine digit tax ID”, the cache value is {{billing provider tax id}}. Like this:

Cache key: 'Please say or enter the 10 digit national provider number or nine digit tax ID'

Cache value: {{billing provider tax id}}
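
For illustration, here is a toy version of that templating step. A real implementation relies on named entity recognition; the regex below is only a stand-in to show the literal value being swapped for the {{billing provider tax id}} placeholder before the response is stored.

import re

# Spoken digits that make up a nine digit tax ID (illustrative regex, not NER).
DIGIT_WORDS = r"(?:zero|one|two|three|four|five|six|seven|eight|nine)"
TAX_ID_PATTERN = re.compile(rf"(?:\b{DIGIT_WORDS}\b\s*){{9}}", re.IGNORECASE)

def template_response(user_response):
    """Replace a spoken tax ID with its template before caching the response."""
    return TAX_ID_PATTERN.sub("{{billing provider tax id}}", user_response).strip()

print(template_response("One two three four five six seven eight nine"))
# prints: {{billing provider tax id}}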

When the IVR queries the cache (i.e., ‘Please say or enter…’), and a key match is found in the semantic cache, then the semantic cache returns the cached value (i.e., {{billing provider tax id}}).

In the next step, we substitute in the appropriate value for the template so the cache value becomes "One two three four five six seven eight nine".

Semantic Cache Template Value Substitution

Where do we get the appropriate value to substitute into the cached template?

Let’s say the Voice AI is calling on behalf of a clinic. The goal of the Voice AI is to share the clinic’s billing provider tax ID with the IVR. We put a dictionary of the billing provider tax ID (i.e., billing_provider_dictionary = {'billing_provider_tax_id': '123456789'}) as a JSON object in the Voice AI’s system prompt. The dictionary tells our code what value to substitute into the cache template {{billing provider tax id}}.
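
Here is a minimal sketch of that substitution step. The JSON field name and the mapping from templates to fields are assumptions for illustration.

import json

# Per-clinic data, passed in the system prompt as a JSON object.
billing_provider_dictionary = json.loads('{"billing_provider_tax_id": "123456789"}')

# Map cache templates to fields in the dictionary (illustrative).
TEMPLATE_FIELDS = {"{{billing provider tax id}}": "billing_provider_tax_id"}

DIGIT_TO_WORD = {"0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
                 "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine"}

def speak_digits(value):
    """Spell out a numeric string so the Voice AI can read it to the IVR."""
    return " ".join(DIGIT_TO_WORD[d] for d in value)

def fill_template(cache_value, data):
    """Substitute clinic-specific values into a templated cache value."""
    for template, field in TEMPLATE_FIELDS.items():
        if field in data:
            cache_value = cache_value.replace(template, speak_digits(data[field]))
    return cache_value

print(fill_template("{{billing provider tax id}}", billing_provider_dictionary))
# prints: one two three four five six seven eight nine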

Let’s say you have ten clinics. For each clinic, you want the Voice AI to call a number and tell the IVR the billing provider tax ID. Each Voice AI would have the same system prompt except for the JSON object. For each call, the developer updates the JSON object with the clinic’s billing provider tax ID.

Update the system prompt with the JSON object, call the IVR, navigate the IVR with cached responses, repeat.
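
Putting the pieces together, the per-clinic loop might look like the sketch below. It reuses lookup and fill_template from the earlier sketches; the telephony helpers, phone number, and clinic list are stand-ins, not a real Voice AI or telephony API.

import json

# Stub telephony helpers so the sketch runs end to end (placeholders only).
def place_call(number):
    prompts = ["Please say or enter the 10 digit national provider number or nine digit tax ID."]
    return {"number": number, "prompts": iter(prompts)}

def next_ivr_prompt(call):
    return next(call["prompts"], None)

def speak(call, text):
    print("speaking:", text)

clinics = [{"billing_provider_tax_id": "123456789"},
           {"billing_provider_tax_id": "987654321"}]

for clinic in clinics:
    # Update the system prompt with this clinic's JSON object, then place the call.
    system_prompt = "Share the clinic's billing data with the IVR. Data: " + json.dumps(clinic)
    call = place_call("+1-800-555-0199")
    while (prompt := next_ivr_prompt(call)) is not None:
        cached_value = lookup(prompt)                         # semantic lookup from the first sketch
        if cached_value is not None:
            speak(call, fill_template(cached_value, clinic))  # respond from cache, no LLM
        # anything the cache has not seen falls back to the LLM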

Start Navigating IVRs Programmatically

A response from a semantic cache is cheaper and much faster than a call to an LLM. For API calls, our response time is about 200 milliseconds. When you self-host our cache, the response time is about 50 milliseconds. If you’re using an end-to-end Voice AI provider who charges by the minute, you’ll save on provider costs as well.

If you would like to try it out, you can generate an API key on our homepage for a free two-week trial.

If you would like our help getting set up, reach out! We’ll set up a Slack channel to help you get started. We’d love to meet you!

Tom and Adrian
June 2024