Reducing LLM Calls with Vector Search Patterns - Raphael De Lio (Redis)
#1 · about 3 minutes
The hidden costs of large LLM context windows
Large context windows in models like GPT-5 seem to eliminate the need for RAG, but paying for a huge prompt on every request makes this approach expensive and hard to scale.
#2 · about 3 minutes
A brief introduction to vectors and vector search
Text is converted into numerical vector embeddings that capture its semantic meaning, allowing computers to efficiently calculate the similarity between different phrases or documents.
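As a minimal illustration (not code from the talk), the sketch below embeds two phrases with the open-source sentence-transformers library and compares them with cosine similarity; the model name and phrases are arbitrary choices.

```python
# Minimal sketch: embed two phrases and compare them with cosine similarity.
# Assumes the sentence-transformers package; the model choice is arbitrary.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

a, b = model.encode(["How do I reset my password?",
                     "I forgot my login credentials"])

# Cosine similarity: close to 1.0 for similar meaning, near 0 for unrelated text.
similarity = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
print(f"similarity = {similarity:.3f}")  # semantically close phrases score high
```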
#3 · about 9 minutes
How to classify text using a vector database
Instead of using a costly LLM for every classification task, you can use a vector database to match new text against pre-embedded reference examples for a specific label.
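A hedged sketch of the idea: an in-memory dictionary stands in for a real vector database, and the labels and reference phrases are made-up examples.

```python
# Sketch: classify text by nearest reference example instead of calling an LLM.
# An in-memory dict stands in for a vector database; labels are examples only.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# Pre-embedded reference examples, one list per label (done once, offline).
references = {
    "billing":  ["I was charged twice", "refund my last invoice"],
    "shipping": ["where is my package", "my order has not arrived"],
    "account":  ["reset my password", "change my email address"],
}
index = {label: model.encode(examples) for label, examples in references.items()}

def cosine(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def classify(text: str) -> tuple[str, float]:
    """Return the label whose reference examples are closest to the text."""
    query = model.encode(text)
    best_label, best_score = None, -1.0
    for label, vectors in index.items():
        score = max(cosine(query, v) for v in vectors)
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score

print(classify("I still haven't received my parcel"))  # -> ('shipping', ...)
```

In production the reference embeddings would live in a vector database such as Redis, and the nearest-neighbour lookup would be a single indexed query rather than a Python loop.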
#4 · about 5 minutes
Using semantic routing for efficient tool calling
By matching user prompts against pre-defined reference phrases for each tool, you can directly trigger the correct function without an initial, expensive LLM call.
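The sketch below shows the routing idea in plain Python; the route names, reference phrases, placeholder tools, and the 0.6 threshold are all illustrative assumptions, not values from the talk.

```python
# Sketch: route a prompt straight to a tool by matching it against reference
# phrases, skipping the LLM call that would normally choose the tool.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

def get_weather(prompt: str) -> str:       # placeholder tool implementations
    return "calling weather API..."

def get_stock_price(prompt: str) -> str:
    return "calling market-data API..."

# Each route: reference phrases (embedded once) plus the function to trigger.
routes = [
    {"name": "weather", "tool": get_weather,
     "refs": model.encode(["what's the weather like", "will it rain tomorrow"])},
    {"name": "stocks", "tool": get_stock_price,
     "refs": model.encode(["price of a share", "how is the stock doing today"])},
]

def cosine(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def route(prompt: str, threshold: float = 0.6):
    """Trigger the best-matching tool, or return None if nothing is close enough."""
    query = model.encode(prompt)
    best = max(routes, key=lambda r: max(cosine(query, v) for v in r["refs"]))
    score = max(cosine(query, v) for v in best["refs"])
    return best["tool"](prompt) if score >= threshold else None

print(route("Is it going to snow in Vienna this weekend?"))
```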
#5 · about 5 minutes
Reducing latency and cost with semantic caching
Semantic caching stores LLM responses and serves them for new, semantically similar prompts, which avoids re-computation and significantly reduces both cost and latency.
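A minimal sketch of the pattern, assuming an in-memory list as the cache, a stubbed-out LLM call, and an arbitrary 0.85 similarity threshold (a real setup would use a vector store with TTLs instead).

```python
# Sketch: a semantic cache that serves a stored answer when a new prompt is
# close enough to one seen before, and only calls the LLM on a miss.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
cache: list[tuple[np.ndarray, str]] = []   # (prompt embedding, cached response)

def cosine(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def call_llm(prompt: str) -> str:
    return f"<expensive LLM answer for: {prompt}>"   # stand-in for a real call

def answer(prompt: str, threshold: float = 0.85) -> str:
    query = model.encode(prompt)
    for embedding, response in cache:
        if cosine(query, embedding) >= threshold:    # semantic hit: no LLM call
            return response
    response = call_llm(prompt)                      # miss: pay for one LLM call
    cache.append((query, response))
    return response

answer("How do I reset my Redis password?")               # miss, calls the LLM
print(answer("How can I reset the password for Redis?"))  # likely a cache hit
```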
#6 · about 7 minutes
Strategies for optimizing vector search accuracy
Improve the accuracy of vector search patterns through techniques like self-improvement, a hybrid approach that falls back to an LLM, and chunking complex prompts into smaller clauses.
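A hedged sketch of the hybrid and self-improvement ideas combined: trust the vector match only above a confidence threshold, otherwise fall back to the LLM and feed its verdict back into the reference set. The threshold, labels, and LLM stub are illustrative assumptions.

```python
# Sketch: hybrid classification. Trust the vector match only above a threshold;
# otherwise fall back to the LLM and add its verdict to the reference set so
# the cheap path improves over time ("self-improvement").
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
references = {"positive": [model.encode("I love this")],
              "negative": [model.encode("this is terrible")]}

def cosine(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def classify_with_llm(text: str) -> str:
    return "positive"          # stand-in for the expensive LLM fallback

def classify(text: str, threshold: float = 0.7) -> str:
    query = model.encode(text)
    label, score = max(
        ((lbl, max(cosine(query, v) for v in vecs))
         for lbl, vecs in references.items()),
        key=lambda pair: pair[1],
    )
    if score >= threshold:
        return label                     # confident vector-only answer
    label = classify_with_llm(text)      # low confidence: ask the LLM once
    references[label].append(query)      # feed the result back as a new reference
    return label
```

Chunking would additionally split a long, multi-clause prompt into smaller clauses and match each one separately before aggregating the results.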
#7 · about 3 minutes
Addressing advanced challenges in semantic caching
Mitigate common caching pitfalls, like misinterpreting negative prompts, by using specialized embedding models and combining semantic routing with caching to avoid caching certain types of queries.
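One way to combine routing with caching, sketched under assumptions: a "do-not-cache" route catches prompts whose answers go stale (the reference phrases and 0.6 threshold below are examples, not values from the talk).

```python
# Sketch: gate the semantic cache with a router so that time-sensitive or
# user-specific prompts are never cached. Reference phrases are examples only.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

do_not_cache_refs = model.encode([
    "what time is it right now",
    "what is my current account balance",
    "latest news today",
])

def cosine(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def is_cacheable(prompt: str, threshold: float = 0.6) -> bool:
    """Return False when the prompt resembles a query whose answer goes stale."""
    query = model.encode(prompt)
    return max(cosine(query, v) for v in do_not_cache_refs) < threshold

print(is_cacheable("How does vector search work?"))       # True: safe to cache
print(is_cacheable("What's my balance at the moment?"))   # False: skip the cache
```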