Semantic Caches: Scaling AI with Smarter Caching (Chapter 15)

About this audio

Semantic caches are transforming how AI systems handle costly reasoning by intelligently reusing prior agent workflows to slash latency and inference costs. In this episode, we unpack Chapter 15 of Keith Bourne’s "Unlocking Data with Generative AI and RAG," exploring the architectures, trade-offs, and practical engineering of semantic caches for production AI.

In this episode:

- What semantic caches are and why they reduce AI inference latency by up to 100x

- Core techniques: vector embeddings, entity masking, and CrossEncoder verification

- Comparing semantic cache variants and fallback strategies for robust performance

- Under-the-hood implementation details using ChromaDB, sentence-transformers, and CrossEncoder

- Real-world use cases across finance, customer support, and enterprise AI assistants

- Key challenges: tuning thresholds, cache eviction, and maintaining precision in production

Key tools and technologies mentioned:

- ChromaDB vector database

- Sentence-transformers embedding models (e.g., all-mpnet-base-v2)

- CrossEncoder models for verification

- Regex-based entity masking

- Adaptive similarity thresholding
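
To make the pieces listed above concrete, here is a minimal sketch of a semantic cache lookup that combines regex-based entity masking with a similarity threshold. It is not the chapter's code. To stay dependency-free it swaps in a toy bag-of-words vector for a sentence-transformers embedding and a plain Python list for ChromaDB, and it leaves out the CrossEncoder verification step. The names `MASKS`, `mask_entities`, `embed`, `SemanticCache`, and the 0.8 threshold are all hypothetical choices for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical masking rules: replace volatile entities with placeholders so
# that queries differing only in those details map to the same cache entry.
MASKS = [
    (re.compile(r"\b\d{4}-\d{2}-\d{2}\b"), "<DATE>"),
    (re.compile(r"\b\d+(?:\.\d+)?\b"), "<NUM>"),
]


def mask_entities(text):
    """Apply each masking rule in order (dates before bare numbers)."""
    for pattern, placeholder in MASKS:
        text = pattern.sub(placeholder, text)
    return text


def embed(text):
    """Toy stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"<\w+>|\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class SemanticCache:
    """In-memory cache keyed by the vector of the masked query."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold  # assumed value; needs tuning in practice
        self.entries = []  # list of (vector, cached workflow) pairs

    def put(self, query, workflow):
        self.entries.append((embed(mask_entities(query)), workflow))

    def get(self, query):
        """Return the best cached workflow, or None on a cache miss."""
        vector = embed(mask_entities(query))
        best_score, best_workflow = 0.0, None
        for cached_vector, workflow in self.entries:
            score = cosine(vector, cached_vector)
            if score > best_score:
                best_score, best_workflow = score, workflow
        return best_workflow if best_score >= self.threshold else None


cache = SemanticCache(threshold=0.8)
cache.put("What was revenue on 2024-01-15?", "revenue-lookup workflow")
hit = cache.get("What was revenue on 2023-12-31?")   # same query shape, new date
miss = cache.get("How do I reset my password?")      # unrelated query
```

In this sketch the cached value is a workflow label rather than a final answer, since masking means the specific date in the new query still has to be filled in when the workflow runs. A miss returns `None`, which is where a fallback to the full agent pipeline would go.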

Timestamps:

00:00 - Introduction and episode overview

02:30 - What are semantic caches and why now?

06:15 - Core architecture: embedding, masking, and verification

10:00 - Semantic cache variants and fallback approaches

13:30 - Implementation walkthrough using Python and ChromaDB

16:00 - Real-world applications and performance metrics

18:30 - Open problems and engineering challenges

19:30 - Final thoughts and book spotlight

Resources:

- "Unlocking Data with Generative AI and RAG" by Keith Bourne - Search for 'Keith Bourne' on Amazon and grab the 2nd edition

- Memriq AI: https://Memriq.ai