Meet The Canonical AI Founders: We're Building LLM Caching for Faster Generative AI
We’re Adrian Cowham and Tom Shapland, the cofounders of Canonical AI. We’ve worked together for over ten years. Our last company was an agtech company called Tule. Tom was the CEO and Adrian was the CTO. Now, we’re building a caching layer to reduce Large Language Model (LLM) latency and cost.
Our First Rodeo
How did we get to semantic caching? Let’s start with our last company. Tule, a Y Combinator-backed company from the Summer 2014 batch, helped farmers make irrigation decisions. We had a proprietary sensor, which Tom developed during his PhD, for measuring the water use of a field (technically, the latent heat flux density, AKA evapotranspiration). The data set was rich and proprietary, and it described the fundamental workings of an agroecosystem. In other words, the data was the perfect canvas for machine learning.
We would work together to prototype new machine learning-based products. This usually meant Tom getting lost in R (and Python, if absolutely necessary) for days on end, emerging only occasionally to ask Adrian for advice on what else to try. Once we had worked out the model, Adrian would architect and build the software and design the user experience that turned the model into something farmers loved, like our augmented reality plant water stress product. Then Tom and the sales and marketing team would sell the product to farmers.
We learned a lot about the fascinating, non-deterministic world of machine learning and AI from our work at Tule. After Tule was acquired by CropX, we wanted to go on the journey again.
From RAGs To Caches
When we started Canonical AI, we set out to build Retrieval Augmented Generation (RAG) products for specific verticals, starting with reference content publishers. Along the way, Adrian built a semantic cache to address LLM cost, latency, and rate limiting.
An LLM cache reminds us of the technology from Tule. Like ecosystem-scale turbulent flux measurement, caching is a classically difficult domain with an immense opportunity for creativity and building value. This is especially true of conversational AI caching. We found ourselves drawn to the challenge and opportunity of building a caching layer for AI. We stopped working on RAG and went all in on caching.
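To make the idea concrete, here is a minimal sketch of how a semantic cache works. This is not our actual implementation: the embedding function, the LLM call, and the similarity threshold below are placeholders. A production cache would use a real embedding model, an approximate nearest-neighbor index instead of a linear scan, and far more care around threshold selection and conversational context.

```python
import numpy as np

# Hypothetical stand-ins: a real cache would use an embedding model and an LLM API.
def embed(text: str) -> np.ndarray:
    """Placeholder embedding: hash characters into a fixed-size vector."""
    vec = np.zeros(64)
    for i, ch in enumerate(text.lower()):
        vec[(i + ord(ch)) % 64] += 1.0
    return vec

def call_llm(prompt: str) -> str:
    """Placeholder for the slow, expensive LLM call we want to avoid repeating."""
    return f"LLM response to: {prompt}"

class SemanticCache:
    """Return a stored response when a new query is semantically close to one
    we have already answered; otherwise call the LLM and remember the answer."""

    def __init__(self, threshold: float = 0.9):
        self.threshold = threshold   # similarity cutoff (placeholder value)
        self.entries = []            # list of (embedding, response) pairs

    @staticmethod
    def _cosine(a: np.ndarray, b: np.ndarray) -> float:
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

    def query(self, text: str) -> str:
        emb = embed(text)
        # Cache hit: skip the LLM call entirely, saving both latency and cost.
        for cached_emb, cached_response in self.entries:
            if self._cosine(emb, cached_emb) >= self.threshold:
                return cached_response
        # Cache miss: pay for the LLM call once, then store the answer.
        response = call_llm(text)
        self.entries.append((emb, response))
        return response

cache = SemanticCache()
print(cache.query("How often should I irrigate my almonds?"))  # miss: calls the LLM
print(cache.query("How often should I irrigate my almonds?"))  # hit: served from cache
```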
Nobody Has A Crystal Ball For Generative AI
The world of LLMs is evolving quickly. Although it appears that some people have this Generative AI thing all figured out, when you ask around, you realize that no one knows what’s going to work or where this is going. All of us engineers, developers, builders, and founders are on an uncertain journey together. If we knew where it was all going, it would be boring. We’re looking forward to working with you as we all figure it out together.
Guess and Check. Works Every Time.
One last thing. As you might imagine, after years of friendship and working together, we’ve developed quite the repertoire of recurring inside jokes. For example, when Adrian asks Tom a question, and Tom answers, Adrian’s standard response is, “Knew it.” And when Adrian asks Tom how he did something, Tom’s favorite answer is, “Guess and check. Works every time.”
Did you learn what you were hoping to learn when you visited this page? Yes?
Knew it.
Tom and Adrian
April 2024