Episodes

  • Agent Engineering Unpacked: Breakthrough Discipline or Rebranded Hype?
    Dec 13 2025

    Agent engineering is rapidly emerging as a pivotal discipline in AI development, promising autonomous LLM-powered systems that can perceive, reason, and act in complex, real-world environments. But is this truly a new engineering frontier or just a rebranding of existing ideas? In this episode, we dissect the technology, tooling, real-world deployments, and the hard truths behind the hype.

    In this episode:

    - Explore the origins and "why now" of agent engineering, including key advances like OpenAI's function calling and expanded context windows

    - Break down core architectural patterns combining retrieval, tool use, and memory for reliable agent behavior

    - Compare leading frameworks and SDKs like LangChain, LangGraph, AutoGen, Anthropic Claude, and OpenAI Agents

    - Dive into production case studies from Klarna, Decagon, and TELUS showing impact and ROI

    - Discuss the critical challenges around reliability, security, evaluation, and cost optimization

    - Debate agent engineering vs. traditional ML pipelines and best practices for building scalable, observable agents

    Key tools & technologies mentioned: LangChain, LangGraph, AutoGen, Anthropic Claude SDK, OpenAI Agents SDK, Pinecone, Weaviate, Chroma, FAISS, LangSmith, Arize Phoenix, DeepEval, Giskard
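    The perceive-reason-act loop with tool use that the episode describes can be sketched in a few lines. This is a minimal illustration, not any framework's actual API: the "LLM" is a hypothetical rule-based stub, and the tool, corpus, and step budget are invented for the example.

    ```python
    # Minimal perceive-reason-act agent loop with tool dispatch.
    # The stub_llm function is a stand-in for a real model's decision step;
    # search_docs is a toy retrieval tool over an in-memory corpus.

    def search_docs(query: str) -> str:
        """Toy retrieval tool: looks up a tiny in-memory corpus."""
        corpus = {"context window": "A context window is the token span a model can attend to."}
        for key, text in corpus.items():
            if key in query.lower():
                return text
        return "No documents found."

    TOOLS = {"search_docs": search_docs}

    def stub_llm(observation: str) -> dict:
        """Stand-in for an LLM choosing the next action (assumption, not a real model)."""
        if observation.startswith("user:"):
            return {"action": "search_docs", "input": observation.removeprefix("user:").strip()}
        return {"action": "final", "input": f"Answer based on: {observation}"}

    def run_agent(user_query: str, max_steps: int = 5) -> str:
        observation = f"user: {user_query}"
        for _ in range(max_steps):
            decision = stub_llm(observation)
            if decision["action"] == "final":
                return decision["input"]
            # Act: dispatch to the chosen tool, feed the result back as the next observation.
            observation = TOOLS[decision["action"]](decision["input"])
        return "Step budget exhausted."

    print(run_agent("What is a context window?"))
    ```

    Frameworks like LangGraph and the OpenAI Agents SDK formalize exactly this loop, adding state persistence, retries, and observability around it.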

    Timestamps:

    00:00 - Introduction & episode overview

    02:20 - The hype vs. reality: failure rates and market investments

    05:15 - Why agent engineering matters now: tech enablers & economics

    08:30 - Architecture essentials: retrieval, tool use, memory

    11:45 - Tooling head-to-head: LangChain, LangGraph, AutoGen & SDKs

    15:00 - Under the hood: example agent workflow and orchestration

    17:45 - Real-world impact & production case studies

    20:30 - Challenges & skepticism: reliability, security, cost

    23:00 - Agent engineering vs. traditional ML pipelines debate

    26:00 - Toolbox recommendations & engineering best practices

    28:30 - Closing thoughts & final takeaways

    Resources:

    - "Unlocking Data with Generative AI and RAG" by Keith Bourne - Search for 'Keith Bourne' on Amazon and grab the 2nd edition

    - Memriq AI: https://memriq.ai

    Thanks for tuning into Memriq Inference Digest - Engineering Edition. Stay curious and keep building!

    20 min
  • The NLU Layer Impact: Transitioning from Web Apps to AI Chatbots Deep Dive
    Dec 13 2025

    Discover how the Natural Language Understanding (NLU) layer transforms traditional web apps into intelligent AI chatbots that understand open-ended user input. This episode unpacks the architectural shifts, business implications, and governance challenges leaders face when adopting AI-driven conversational platforms.

    In this episode:

    - Understand the strategic role of the NLU layer as the new ‘brain’ interpreting user intent and orchestrating backend systems dynamically.

    - Explore the shift from deterministic workflows to probabilistic AI chatbots and how hybrid architectures balance flexibility with control.

    - Learn about key AI tools like Large Language Models, Microsoft Azure AI Foundry, OpenAI function-calling, and AI agent frameworks.

    - Discuss governance strategies including confidence thresholds, policy wrappers, and human-in-the-loop controls to maintain trust and compliance.

    - Hear real-world use cases across industries showcasing improved user engagement and ROI from AI chatbot adoption.

    - Review practical leadership advice for monitoring, iterating, and future-proofing AI chatbot architectures.

    Key tools and technologies mentioned:

    - Large Language Models (LLMs)

    - Microsoft Azure AI Foundry

    - OpenAI Function-Calling

    - AI Agent Frameworks like deepset

    - Semantic Cache and Episodic Memory

    - Governance tools: Confidence thresholds, human-in-the-loop
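    The NLU-layer pattern the episode describes — classify intent with a confidence score, then gate the action behind a governance wrapper — can be sketched briefly. The tool schema follows OpenAI's documented function-calling format; the classifier is a toy stand-in, and the tool name, intents, and 0.75 threshold are illustrative assumptions.

    ```python
    # NLU layer sketch: intent + confidence, routed through a policy wrapper
    # with a human-in-the-loop fallback below the confidence threshold.

    # Tool description in OpenAI's function-calling schema shape
    # (names and fields here are invented for the example).
    ORDER_STATUS_TOOL = {
        "type": "function",
        "function": {
            "name": "get_order_status",
            "description": "Look up the status of a customer order.",
            "parameters": {
                "type": "object",
                "properties": {"order_id": {"type": "string"}},
                "required": ["order_id"],
            },
        },
    }

    def toy_intent_classifier(utterance: str) -> tuple[str, float]:
        """Stand-in for the NLU layer: returns (intent, confidence)."""
        if "order" in utterance.lower():
            return "get_order_status", 0.92
        return "unknown", 0.30

    def handle(utterance: str, threshold: float = 0.75) -> str:
        intent, confidence = toy_intent_classifier(utterance)
        if confidence < threshold:
            return "escalate_to_human"  # governance: human-in-the-loop fallback
        return intent

    print(handle("Where is my order?"))  # confident -> route to the tool
    print(handle("asdf qwerty"))         # low confidence -> escalate
    ```

    The key architectural point: the deterministic backend tools stay unchanged; only the routing layer in front of them becomes probabilistic, which is why the governance wrapper matters.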

    Timestamps:

    00:00 - Introduction and episode overview

    02:30 - Why the NLU layer matters for leadership

    05:15 - The big architectural shift: deterministic to AI-driven

    08:00 - Comparing traditional web apps vs AI chatbots

    11:00 - Under the hood: how NLU, function-calling, and orchestration work

    14:00 - Business impact and ROI of AI chatbots

    16:30 - Risks, governance, and human oversight

    18:30 - Real-world applications and industry examples

    20:00 - Final takeaways and leadership advice

    Resources:

    - "Unlocking Data with Generative AI and RAG" by Keith Bourne - Search for 'Keith Bourne' on Amazon and grab the 2nd edition

    - Visit Memriq at https://Memriq.ai for more AI insights and resources

    16 min
  • Advanced RAG with Complete Memory Integration (Chapter 19)
    Dec 12 2025

    Unlock the next level of Retrieval-Augmented Generation with full memory integration in AI agents. The previous three episodes quietly built toward a four-part series on agentic memory — and this finale pulls it ALL together.

    In this episode, we explore how combining episodic, semantic, and procedural memories via the CoALA architecture and LangMem library transforms static retrieval systems into continuously learning, adaptive AI.

    This episode also concludes our book series, which has covered every chapter of the 2nd edition of "Unlocking Data with Generative AI and RAG" by Keith Bourne. If you want to dive even deeper into these topics and try out the extensive code labs, search for 'Keith Bourne' on Amazon and grab the 2nd edition today!

    In this episode:

    - How CoALAAgent unifies multiple memory types for dynamic AI behavior

    - Trade-offs between LangMem’s prompt_memory, gradient, and metaprompt algorithms

    - Architectural patterns for modular and scalable AI agent development

    - Real-world metrics demonstrating continuous procedural strategy learning

    - Challenges around data quality, metric design, and domain agent engineering

    - Practical advice for building safe, adaptive AI agents in production

    Key tools & technologies: CoALAAgent, LangMem library, GPT models, hierarchical memory scopes
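    The unification idea — one agent state holding episodic, semantic, procedural, and working memory, assembled into a single prompt context — can be sketched as a plain data structure. Field contents are invented for the example, and this is not the CoALAAgent or LangMem API; real systems back each store with a database and curation pipeline.

    ```python
    # Sketch of the four CoALA memory types unified in one agent state object,
    # with a method that assembles them into prompt context. All names and
    # contents here are illustrative assumptions.

    from dataclasses import dataclass, field

    @dataclass
    class CoALAMemory:
        working: list[str] = field(default_factory=list)     # current conversation turns
        episodic: list[str] = field(default_factory=list)    # specific past events
        semantic: list[str] = field(default_factory=list)    # distilled facts
        procedural: list[str] = field(default_factory=list)  # learned strategies

        def build_context(self) -> str:
            """Assemble a prompt context section for each non-empty memory type."""
            sections = [
                ("Facts", self.semantic),
                ("Past events", self.episodic),
                ("Strategies", self.procedural),
                ("Conversation", self.working),
            ]
            return "\n".join(
                f"{name}: {'; '.join(items)}" for name, items in sections if items
            )

    mem = CoALAMemory()
    mem.semantic.append("user is a data engineer")
    mem.procedural.append("show code before prose")
    mem.working.append("user: how do I batch embeddings?")
    print(mem.build_context())
    ```

    The point of the integration is visible even at this scale: retrieval alone answers "what is relevant now," while the combined stores also answer "what do I know about this user" and "what has worked before."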


    Timestamps:

    0:00 Intro & guest welcome

    3:30 Why integrating episodic, semantic & procedural memory matters

    7:15 The CoALA architecture and hierarchical learning scopes

    10:00 Comparing procedural learning algorithms in LangMem

    13:30 Behind the scenes: memory integration pipeline

    16:00 Real-world impact & procedural strategy success metrics

    18:30 Challenges in deploying memory-integrated RAG systems

    20:00 Practical engineering tips & closing thoughts


    Resources:

    - "Unlocking Data with Generative AI and RAG" by Keith Bourne - Search for 'Keith Bourne' on Amazon and grab the 2nd edition

    - Memriq AI: https://memriq.ai

    17 min
  • Procedural Memory for RAG: Deep Dive with LangMem (Chapter 18)
    Dec 12 2025

    Unlock the power of procedural memory to transform your Retrieval-Augmented Generation (RAG) agents into autonomous learners. In this episode, we explore how LangMem leverages hierarchical learning scopes to enable AI agents that continuously adapt and improve from their interactions — cutting down manual tuning and boosting real-world performance.

    In this episode:

    - Why procedural memory is a game changer for RAG systems and the challenges it addresses

    - How LangMem integrates with LangChain and OpenAI GPT-4.1-mini to implement procedural memory

    - The architecture patterns behind hierarchical namespaces and momentum-based feedback loops

    - Trade-offs between traditional RAG and LangMem’s procedural memory approach

    - Real-world applications across finance, healthcare, education, and customer service

    - Practical engineering tips, monitoring best practices, and open problems in procedural memory


    Key tools & technologies mentioned:

    - LangMem

    - LangChain

    - Pydantic

    - OpenAI GPT-4.1-mini
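    The momentum-based feedback loop mentioned above can be illustrated with a toy procedural memory that smooths noisy per-interaction rewards into stable strategy scores. The strategy names, momentum coefficient, and scoring scheme are all illustrative assumptions — this is the idea in miniature, not LangMem's actual implementation.

    ```python
    # Toy momentum-based feedback over prompt strategies: each reward nudges a
    # velocity term, so one-off good or bad interactions don't whipsaw the
    # agent's learned behavior.

    class ProceduralMemory:
        def __init__(self, strategies, momentum=0.8):
            self.scores = {s: 0.0 for s in strategies}
            self.velocity = {s: 0.0 for s in strategies}
            self.momentum = momentum

        def feedback(self, strategy: str, reward: float) -> None:
            # Exponentially smoothed update: momentum dampens noisy feedback.
            v = self.momentum * self.velocity[strategy] + (1 - self.momentum) * reward
            self.velocity[strategy] = v
            self.scores[strategy] += v

        def best(self) -> str:
            return max(self.scores, key=self.scores.get)

    mem = ProceduralMemory(["concise_answer", "step_by_step"])
    for _ in range(10):
        mem.feedback("step_by_step", reward=1.0)    # users rate this strategy well
        mem.feedback("concise_answer", reward=0.2)
    print(mem.best())  # the consistently rewarded strategy wins
    ```

    This is the "autonomous learner" property in its simplest form: the agent's choice of strategy is driven by accumulated feedback rather than manual prompt tuning.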


    Timestamps:

    0:00 - Introduction & overview

    2:30 - Why procedural memory matters now

    5:15 - Core concepts & hierarchical learning scopes

    8:45 - LangMem architecture & domain interface

    12:00 - Trade-offs: Traditional RAG vs LangMem

    14:30 - Real-world use cases & impact

    17:00 - Engineering best practices & pitfalls

    19:30 - Open challenges & future outlook


    Resources:

    - "Unlocking Data with Generative AI and RAG" by Keith Bourne - Search for 'Keith Bourne' on Amazon and grab the 2nd edition

    - Memriq AI: https://memriq.ai

    18 min
  • RAG-Based Agentic Memory: Code Perspective (Chapter 17)
    Dec 12 2025

    Unlock how Retrieval-Augmented Generation (RAG) enables AI agents to remember, learn, and personalize over time. In this episode, we explore Chapter 17 of Keith Bourne’s "Unlocking Data with Generative AI and RAG," focusing on implementing agentic memory with the CoALA framework. From episodic and semantic memory distinctions to real-world engineering trade-offs, this discussion is packed with practical insights for AI/ML engineers and infrastructure experts.

    In this episode:

    - Understand the difference between episodic and semantic memory and their roles in AI agents

    - Explore how vector databases like ChromaDB power fast, scalable memory retrieval

    - Dive into the architecture and code walkthrough using CoALA, LangChain, LangGraph, and OpenAI APIs

    - Discuss engineering challenges including validation, latency, and system complexity

    - Hear from author Keith Bourne on the foundational importance of agentic memory

    - Review real-world applications and open problems shaping the future of memory-augmented AI

    Key tools and technologies mentioned:

    - CoALA framework

    - LangChain & LangGraph

    - ChromaDB vector database

    - OpenAI API (embeddings and LLMs)

    - python-dotenv

    - Pydantic models
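    The episodic/semantic split discussed above can be shown with two tiny vector stores. A toy bag-of-words embedding stands in for real embeddings and ChromaDB, and all memory entries are invented for the example — the point is only the distinction: episodic entries record specific past interactions, semantic entries store distilled facts.

    ```python
    # Episodic vs. semantic memory with cosine-similarity recall, using a toy
    # bag-of-words "embedding" in place of a real embedding model + vector DB.

    import math
    from collections import Counter

    def embed(text: str) -> Counter:
        return Counter(text.lower().split())

    def cosine(a: Counter, b: Counter) -> float:
        dot = sum(a[t] * b[t] for t in a)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    class MemoryStore:
        def __init__(self):
            self.entries: list[str] = []

        def add(self, text: str) -> None:
            self.entries.append(text)

        def recall(self, query: str) -> str:
            # Return the single most similar entry (top-1 retrieval).
            return max(self.entries, key=lambda e: cosine(embed(e), embed(query)))

    episodic, semantic = MemoryStore(), MemoryStore()
    episodic.add("2024-06-01: user asked about refund and was unhappy")
    episodic.add("2024-06-03: user upgraded to the pro plan")
    semantic.add("the user prefers short answers")
    semantic.add("the user works in healthcare")

    print(episodic.recall("user unhappy about refund"))              # a specific event
    print(semantic.recall("how should answers be styled for this user"))  # a distilled fact
    ```

    Swapping the toy embedding for a real model and the list for ChromaDB changes the scale and quality of recall, but not the architecture.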


    Timestamps:

    0:00 - Introduction & Episode Overview

    2:30 - The Concept of Agentic Memory: Episodic vs Semantic

    6:00 - Vector Databases and Retrieval-Augmented Generation (RAG)

    9:30 - Coding Agentic Memory: Frameworks and Workflow

    13:00 - Engineering Trade-offs and Validation Challenges

    16:00 - Real-World Applications and Use Cases

    18:30 - Open Problems and Future Directions

    20:00 - Closing Thoughts and Resources


    Resources:

    - "Unlocking Data with Generative AI and RAG" by Keith Bourne - Search for 'Keith Bourne' on Amazon and grab the 2nd edition

    - Visit Memriq AI at https://Memriq.ai for more AI engineering deep dives and resources

    17 min
  • Agentic Memory: Stateful RAG and AI Agents (Chapter 16)
    Dec 12 2025

    Unlock the future of AI agents with agentic memory — a transformative approach that extends Retrieval-Augmented Generation (RAG) by incorporating persistent, evolving memories. In this episode, we explore how stateful intelligence turns stateless LLMs into adaptive, personalized agents capable of learning over time.

    In this episode:

    - Understand the CoALA framework dividing memory into episodic, semantic, procedural, and working types

    - Explore key tools like Mem0, LangMem, Zep, Graphiti, LangChain, and Neo4j for implementing agentic memory

    - Dive into practical architectural patterns, memory curation strategies, and trade-offs for real-world AI systems

    - Hear from Keith Bourne, author of *Unlocking Data with Generative AI and RAG*, sharing insider insights and code lab highlights

    - Discuss latency, accuracy improvements, and engineering challenges in scaling stateful AI agents

    - Review real-world applications across finance, healthcare, education, and customer support


    Key tools & technologies mentioned:

    Mem0, LangMem, Zep, Graphiti, LangChain, Neo4j, Pinecone, Weaviate, Airflow, Temporal


    Timestamps:

    00:00 - Introduction & Episode Overview

    02:15 - What is Agentic Memory and Why It Matters

    06:10 - The CoALA Cognitive Architecture Explained

    09:30 - Comparing Memory Implementations: Mem0, LangMem, Graphiti

    13:00 - Deep Dive: Memory Curation and Background Pipelines

    16:00 - Performance Metrics & Real-World Impact

    18:30 - Challenges & Open Problems in Agentic Memory

    20:00 - Closing Thoughts & Resources


    Resources:

    - "Unlocking Data with Generative AI and RAG" by Keith Bourne - Search for 'Keith Bourne' on Amazon and grab the 2nd edition

    - Visit Memriq.ai for more AI engineering deep dives and resources

    19 min
  • Semantic Caches: Scaling AI with Smarter Caching (Chapter 15)
    Dec 12 2025

    Semantic caches are transforming how AI systems handle costly reasoning by intelligently reusing prior agent workflows to slash latency and inference costs. In this episode, we unpack Chapter 15 of Keith Bourne’s "Unlocking Data with Generative AI and RAG," exploring the architectures, trade-offs, and practical engineering of semantic caches for production AI.

    In this episode:

    - What semantic caches are and why they reduce AI inference latency by up to 100x

    - Core techniques: vector embeddings, entity masking, and CrossEncoder verification

    - Comparing semantic cache variants and fallback strategies for robust performance

    - Under-the-hood implementation details using ChromaDB, sentence-transformers, and CrossEncoder

    - Real-world use cases across finance, customer support, and enterprise AI assistants

    - Key challenges: tuning thresholds, cache eviction, and maintaining precision in production


    Key tools and technologies mentioned:

    - ChromaDB vector database

    - Sentence-transformers embedding models (e.g., all-mpnet-base-v2)

    - CrossEncoder models for verification

    - Regex-based entity masking

    - Adaptive similarity thresholding
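    The core mechanics — mask volatile entities so similar queries collide, then match against cached entries above a similarity threshold — can be sketched without any ML at all. Here Jaccard token overlap stands in for both the embedding similarity and the CrossEncoder verification pass, and the 0.8 threshold and example queries are illustrative assumptions.

    ```python
    # Toy semantic cache: regex entity masking + threshold-gated similarity lookup.
    # A hit returns the cached answer and skips the expensive agent workflow.

    import re

    def mask_entities(query: str) -> str:
        """Replace volatile entities (here: just numbers) so similar queries collide."""
        return re.sub(r"\d+", "<NUM>", query.lower())

    def similarity(a: str, b: str) -> float:
        ta, tb = set(a.split()), set(b.split())
        return len(ta & tb) / len(ta | tb)

    class SemanticCache:
        def __init__(self, threshold: float = 0.8):
            self.store: dict[str, str] = {}
            self.threshold = threshold

        def get(self, query: str) -> str | None:
            masked = mask_entities(query)
            for cached_query, answer in self.store.items():
                if similarity(masked, cached_query) >= self.threshold:
                    return answer      # cache hit: reuse prior work
            return None                # miss: fall back to the full pipeline

        def put(self, query: str, answer: str) -> None:
            self.store[mask_entities(query)] = answer

    cache = SemanticCache()
    cache.put("what is the status of order 123?", "Order <NUM> has shipped.")
    print(cache.get("what is the status of order 456?"))  # hit despite different order ID
    print(cache.get("cancel my subscription"))            # miss
    ```

    The tuning challenges the episode covers live exactly in these two knobs: a masking rule that is too aggressive returns wrong answers, and a threshold that is too loose trades precision for hit rate.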


    Timestamps:

    00:00 - Introduction and episode overview

    02:30 - What are semantic caches and why now?

    06:15 - Core architecture: embedding, masking, and verification

    10:00 - Semantic cache variants and fallback approaches

    13:30 - Implementation walkthrough using Python and ChromaDB

    16:00 - Real-world applications and performance metrics

    18:30 - Open problems and engineering challenges

    19:30 - Final thoughts and book spotlight


    Resources:

    - "Unlocking Data with Generative AI and RAG" by Keith Bourne - Search for 'Keith Bourne' on Amazon and grab the 2nd edition

    - Memriq AI: https://Memriq.ai

    20 min
  • Graph-Based RAG: Hybrid Embeddings & Explainable AI (Chapter 14)
    Dec 12 2025

    Unlock the power of graph-based Retrieval-Augmented Generation (RAG) in this technical deep dive featuring insights from Chapter 14 of Keith Bourne's "Unlocking Data with Generative AI and RAG." Discover how combining knowledge graphs with LLMs using hybrid embeddings and explicit graph traversal can dramatically improve multi-hop reasoning accuracy and explainability.

    In this episode:

    - Explore ontology design and graph ingestion workflows using Protégé, RDF, and Neo4j

    - Understand the advantages of hybrid embeddings over vector-only approaches

    - Learn why Python static dictionaries significantly boost LLM multi-hop reasoning accuracy

    - Discuss architecture trade-offs between ontology-based and cyclical graph RAG systems

    - Review real-world production considerations, scalability challenges, and tooling best practices

    - Hear directly from author Keith Bourne about building explainable and reliable AI pipelines


    Key tools and technologies mentioned:

    - Protégé for ontology creation

    - RDF triples and rdflib for data parsing

    - Neo4j graph database with Cypher queries

    - Sentence-Transformers (all-MiniLM-L6-v2) for embedding generation

    - FAISS for vector similarity search

    - LangChain for orchestration

    - OpenAI chat models

    - python-dotenv for secrets management
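    The episode's surprising result — that a knowledge graph serialized as a plain Python dict helps multi-hop reasoning — is easy to picture once you see the traversal made explicit. The entities and relations below are invented for the example; a production system would load them from Neo4j via Cypher queries.

    ```python
    # A tiny knowledge graph as a static Python dict, with explicit multi-hop
    # traversal. Making each hop explicit is what gives the LLM (or a human)
    # an inspectable, explainable reasoning path.

    GRAPH = {
        "marie_curie": {"field": ["physics", "chemistry"], "spouse": ["pierre_curie"]},
        "pierre_curie": {"field": ["physics"], "award": ["nobel_prize_1903"]},
    }

    def multi_hop(start: str, relations: list[str]) -> list[str]:
        """Follow a chain of relations, e.g. spouse -> award."""
        frontier = [start]
        for rel in relations:
            next_frontier = []
            for node in frontier:
                next_frontier.extend(GRAPH.get(node, {}).get(rel, []))
            frontier = next_frontier
        return frontier

    # "What award did Marie Curie's spouse win?" as an explicit 2-hop traversal:
    print(multi_hop("marie_curie", ["spouse", "award"]))  # ['nobel_prize_1903']
    ```

    Compared to vector-only retrieval, the traversal result comes with its path (spouse, then award) attached — the explainability advantage the episode emphasizes.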


    Timestamps:

    00:00 - Introduction & episode overview

    02:30 - Surprising results: Python dicts vs natural language for KG representation

    05:45 - Why graph-based RAG matters now: tech readiness & industry demand

    08:15 - Architecture walkthrough: from ontology to LLM prompt input

    12:00 - Comparing ontology-based vs cyclical graph RAG approaches

    15:00 - Under the hood: building the pipeline step-by-step

    18:30 - Real-world results, scaling challenges, and practical tips

    21:00 - Closing thoughts and next steps


    Resources:

    - "Unlocking Data with Generative AI and RAG" by Keith Bourne - Search for 'Keith Bourne' on Amazon and grab the 2nd edition

    - Visit Memriq AI at https://Memriq.ai for more AI engineering insights and tools

    22 min