Voice AI Agents Are Eating The World
The world of Voice AI is growing fast. Here’s what we're seeing in real-time Voice AI and our take on why it's happening now.
A Powerful New Technology
Voice AI is a broad category. Here I'm writing about interactive Voice AI (i.e., Voice AI Agents, real-time Voice AI), as opposed to asynchronous Voice AI like note takers.
Real-time Voice AI has existed for a while. We’ve all interacted with (and mostly hated) Voice AIs built on the old rule-based systems (e.g., Google Dialogflow). As a user, if you don’t interact with the product in exactly the way the designer expected, you can’t get what you need. It’s frustrating.
LLMs have changed the paradigm for Voice AI, primarily for two reasons.
LLM-based Voice AI agents better understand the intent of the caller and can more often resolve the issue without escalation to a human agent. For example, whenever my 8-year-old son wants to know something about baseball, he asks me, 'Can I talk to ChatGPT on your phone? Alexa doesn't understand what I'm asking.'
With LLM-based Voice AI agents, developers can build a Voice AI agent more quickly, onboard customers sooner, and iterate on the product faster. Developers don't need to write each little rule. Instead, they can describe the general idea for a behavior in the system prompt.
The emergence of AI agencies speaks to the power and potential of interactive Voice AI. Many of the AI agency owners tell the same origin story. They saw a Voice AI orchestration platform on Product Hunt, called the Voice AI agent, hung up, and quit their day job. They now build things like Voice AI receptionists for SMBs using Voice AI orchestration platforms and no/low-code platforms, and they get more business than they can handle by posting how-to videos on YouTube.
The Cambrian Explosion of Voice AI Agents
Existing Call Volume
We’re seeing Voice AI consume pre-existing call volume. For example, we’re seeing lead qualification AIs (Infer.so), customer surveys (Domu), and the after-hours agents, appointment-setting agents, and other agents people are building on Phonely.
Voice AI agents for existing call volume are remarkably effective! It’s so much fun to see the agents achieve the caller’s objective, even in the face of skepticism from the human caller! Here is a common thing I see in Voice AI calls.
LLM-Enabled Voice Applications
We’re also seeing a deluge of new call volume enabled by the Voice AI medium. Examples include role-playing for professional development (Solid Road) and Boardy, an AI that makes introductions to other professionals in its network.
Voice AI is an artist’s medium. I find myself building this type of application for fun on Vapi's powerful and really easy-to-use platform. For example, I built this baseball strategy game to make me feel like a better dad despite not knowing anything about baseball.
Voice User Interfaces
Another fascinating category is the Voice User Interface category. Someday my kids will look back and make fun of our current era for how we furiously thumb-tapped out our thoughts on a phone. It’ll seem as bizarre and arcane as a rotary phone or my dad’s 1990s briefcase. I’ve seen some really neat things, like using the phone’s camera to read your lips so you can use the voice interface in noisy places or in places without privacy. Another cool example is Wispr Flow, which lets you dictate to any app. I use it all the time.
More generally, I think a Voice AI agent can be thought of as being just like a website. If it’s a SaaS website, the Voice AI agent is an interface for users to get something done. If it’s a marketing website, the Voice AI agent is an interface for consumers to learn about your offering.
Enterprise Interest Is Driving Voice AI Growth
The hustlers' AI agencies are a force in Voice AI. But it’s the interest from enterprises in Voice AI that’s generating venture backing.
Enterprises are very interested in Voice AI Agents because a) call centers are critical touch-points with their customers for brand maintenance and upsell opportunities, b) call centers have large budgets, and c) staffing, churn, and performance are pain points. As a result, startups in Voice AI are getting to enterprise contracts sooner than you’d expect.
But the enterprise interest is mostly untapped. Enterprises buy credits upfront and are slowly spending them down. They haven’t unleashed Voice AI on all their call volume, largely, I think, due to the unresolved issues in this emerging field, particularly around reliability.
Growing Pains
The world of LLM-based Voice AI agents is still nascent. There are all sorts of fascinating issues throughout the stack. These include everything from issues deep down in the internet piping to application-level issues like accurately capturing a caller’s email address when they spell it aloud (the obvious solution is, of course, to retrain humanity on the NATO phonetic alphabet).
One of the biggest challenges holding back larger deployments at enterprises is reliability. It’s not too hard to get a Voice AI working 100% of the time in your test environment or 80% of the time in the wild. But humans are nature’s most unpredictable creation. 80% in the wild isn’t good enough for those enterprises to deploy broadly and rapidly spend down their credits.
Analytics For Voice AI Agents
As with any product, you have to monitor and analyze real user data. To do this, most Voice AI agent developers manually listen to a subset of their calls. Or they only find out about issues with their agent when their customers complain.
Adrian and I thought there had to be a better way to understand Voice AI agent performance at scale. That's why we’re building the analytics platform for understanding, improving, and reporting on Voice AI agent performance. We map caller journeys. We provide conversational and audio metrics. Eventually, we will build a system that autonomously detects issues and improves the agents.
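As a rough illustration of what conversational metrics can look like, here is a minimal Python sketch that summarizes a batch of calls. It is not our platform's implementation. The `Call` record, its fields, and the `summarize` function are hypothetical names chosen for this example, and it assumes each call has already been labeled as resolved or escalated.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Call:
    """Hypothetical per-call record; field names are assumptions."""
    call_id: str
    turns: int        # number of caller/agent exchanges
    resolved: bool    # did the caller achieve their objective?
    escalated: bool   # was the call handed to a human agent?


def summarize(calls: List[Call]) -> Dict[str, float]:
    """Compute a few simple aggregate metrics over a batch of calls."""
    total = len(calls)
    if total == 0:
        # Avoid dividing by zero when there is no data yet.
        return {"calls": 0, "resolution_rate": 0.0,
                "escalation_rate": 0.0, "avg_turns": 0.0}
    return {
        "calls": total,
        "resolution_rate": sum(c.resolved for c in calls) / total,
        "escalation_rate": sum(c.escalated for c in calls) / total,
        "avg_turns": sum(c.turns for c in calls) / total,
    }


if __name__ == "__main__":
    sample = [
        Call("a", 4, True, False),
        Call("b", 6, True, False),
        Call("c", 8, True, False),
        Call("d", 10, False, True),
    ]
    print(summarize(sample))
```

Even aggregates this simple replace "listen to a handful of calls" with a number you can track over time. The harder problem is producing the resolved and escalated labels reliably in the first place.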
A Community of Voice AI Builders
The best part of Voice AI is the community. At this point, it feels small enough that it's easy to meet other builders in this space. I’m grateful for all the friends I’ve made here! But it's also growing fast and is going to get much, much bigger. It’s a great place to be. I hope you join us and build an imaginative new Voice-enabled AI application!
And if you’re already building a Voice AI agent, please reach out to us! Adrian and I love the Voice AI space! You can also try out our Analytics for Voice AI platform here!
Tom and Adrian
November 2024