Episodes

  • From Chaos to Reliability with Gremlin CEO Kolton Andrus
    Jul 1 2025

    In this episode, Kolton Andrus, Founder and CEO of Gremlin, dives deep into chaos engineering and reliability testing. Kolton shares his journey from leading reliability efforts at Amazon and Netflix to founding Gremlin, an enterprise reliability platform. He and host José Quaresma discuss what it really takes to build resilient systems, the cultural shift required to prioritize reliability, and how Gremlin is working to reshape accountability in engineering teams. From testing dependencies to aligning incentives, this conversation is packed with real-world insights into scaling systems (and teams) that don't break under pressure.

    Episode page

    ---

    Kolton Andrus is the CEO and founder of Gremlin. Previously, he focused on building and operating reliable systems at Netflix and Amazon. At both companies he operated systems at scale, managed company-wide incidents, and helped build out their respective reliability programs and toolsets.

    Host José Quaresma is the VP of Technical Engagement at Queue-it, working on the frontlines with some of the world’s biggest businesses on their busiest days, from Ticketmaster to Zalando to Home Office U.K. Each week, he’ll be joined by experts across industries, uncovering how major organizations design, build, and deploy systems that perform at scale.

    This podcast is hosted by José Quaresma, researched by Joseph Thwaites and produced by Perseu Mandillo.

    • (00:00) - Intro & Guest: Kolton Andrus
    • (04:20) - Founding Gremlin (2016)
    • (08:47) - Rewarding Invisible Reliability Work
    • (12:27) - Proving Reliability’s Business Value
    • (15:21) - Rethinking the “Chaos Engineering” Label
    • (20:18) - Chaos Testing to Reliability Scores
    • (24:25) - Spreading Reliability Culture Across Teams
    • (28:50) - Safe, Incremental Failure Testing in Prod
    • (33:30) - Load + Fault Testing for Peak Traffic
    • (36:30) - AI’s Opportunities & Risks for Ops
    • (39:30) - Defining Scalability as Elasticity
    • (44:18) - Key Takeaways & Farewell

    © Queue-it, 2025
    45 min
  • The Cost of Scaling for Peak Demand with Head of Engineering Martin Jensen
    Jun 17 2025

    In this episode, Martin Jensen, Head of Engineering, breaks down the true cost of scaling for peak demand. He explains the limits of autoscaling, when pre-scaling makes sense, and how tools like virtual waiting rooms are used to handle sudden spikes in traffic. Martin also discusses system bottlenecks, performance trade-offs, and practical strategies for staying in control during high-demand moments like ticket sales, product drops, and popular registrations.

    Episode page

    ---

    This episode's guest is Martin Nørskov Jensen, an experienced engineering leader and Head of Engineering at Queue-it. With 15+ years in software development and 5+ years in leadership, he builds agile, high-performing teams focused on collaboration, trust, and engineering excellence.

    • (00:00) - Intro
    • (00:58) - Meet Guest Martin Jensen
    • (02:10) - What exactly *is* peak demand?
    • (03:20) - Real-world peak-traffic examples
    • (05:39) - Auto- vs pre-scaling strategies
    • (07:09) - Scaling limits & hidden costs
    • (10:11) - Virtual waiting rooms explained
    • (13:33) - How queues + scaling fit together
    • (18:45) - CDNs, caches & other toolkits
    • (26:08) - Key take-aways & pro tips
    • (29:32) - Outro

    © Queue-it, 2025
    30 min
  • Running High-Traffic Product Drops at Rapha with Tristan Watson
    Jun 3 2025

    In this episode, seasoned platform engineer Tristan Watson shares his learnings from handling peak traffic at Rapha and Booking.com. Tristan reveals the key challenges, trade-offs, and best practices involved in preparing infrastructure for high-traffic product drops and collaborations. Whether you're navigating traffic surges or optimizing for resilience, Tristan’s advice will help you prepare your systems to handle the pressure.

    Episode page

    ---

    This episode's guest is Tristan Watson, who has spent over a decade mastering the art of keeping websites fast, stable, and scalable. With experience leading teams and steering key projects across tech, retail, and finance, he consistently balances technical excellence with business goals. His pragmatic approach and passion for emerging tech like AI make him a sought-after consultant. Off the clock, you’ll find him exploring new tech trends or out on a bike ride. You can find Tristan on LinkedIn.

    • (00:00) - Intro
    • (00:58) - Tristan's journey
    • (02:47) - Differences in scalability
    • (08:08) - Differences in traffic peaks
    • (11:12) - The challenges of an SRE team
    • (16:34) - High stakes make the most memorable moments
    • (19:39) - The Rapha system setup in more detail
    • (26:24) - Iterating - anticipating problems or learning from mistakes
    • (27:57) - The alternatives on the table
    • (29:30) - Uncertainty in the reliability of the current systems
    • (30:59) - The virtual waiting room
    • (33:13) - Experience during the drop
    • (37:03) - The best moments are with great partners
    • (40:00) - Main learnings from Product drop
    • (42:04) - Rapid Fire Questions
    • (46:07) - Outro

    © Queue-it, 2025
    47 min
  • Designing Features For Scale with Head of Product & UX Karen Risvig
    May 20 2025

    What does it take to build product features that hold up under massive traffic while still delivering a great user experience? Karen Risvig, Head of Product & UX at Queue-it, joins Smooth Scaling to share how her team designs for scalability, resilience, and security from day one. From invite-only waiting rooms to real-time visitor analytics, Karen breaks down the product decisions behind some of Queue-it’s most exciting features—and the trade-offs that come with building for scale.

    Episode page

    ---

    This episode's guest is Karen Risvig. Karen Risvig is a dynamic product leader with extensive experience in the SaaS industry. As Head of Product & UX at Queue-it, she leads product strategy, roadmap execution, and a team of product managers and UX specialists. She brings a data-driven approach to decision-making and excels at turning complex challenges into clear, high-impact solutions. Karen is known for fostering cross-functional collaboration and delivering products that align with both user needs and business goals.

    • (00:00) - Intro
    • (01:05) - Career Journey
    • (03:45) - Projects vs. Products: Shifting Perspectives
    • (05:25) - Feature Spotlight: Invite-Only Waiting Rooms
    • (08:49) - Leveraging Customer Insights and Behavioral Data
    • (10:21) - Capturing User Attention
    • (11:48) - Reducing Purchase Friction
    • (13:17) - Early-stage Feature Evaluation and Idea Validation
    • (17:50) - Balancing Problems and Opportunities
    • (19:27) - Core Requirements: Scalability, Resilience, and Security
    • (25:14) - Scalability in Practice: Handling Massive User Interaction
    • (27:29) - Trade-offs in Product Development
    • (29:33) - Lessons Learned at Queue-it
    • (31:52) - Rapid-fire Q&A: Resources and Advice
    • (33:55) - Outro

    © Queue-it, 2025
    35 min
  • Load Testing for Peak Traffic with RadView's Yam Shal-Bar
    May 5 2025

    In this episode, Yam Shal-Bar, CTO at RadView, discusses the evolving world of load testing and how it's used to prepare for peak traffic. He covers the most common system bottlenecks, the importance of iterative testing, and strategies for accurately simulating user journeys. Yam shares insights into common misconceptions around testing, best practices, and trends like AI for test analysis and API-level testing. Whether you're launching a new web app or tuning an existing one, this episode is packed with practical advice for testing systems for resilience and scalability.

    Episode page

    ---

    This episode's guest is Yam Shal-Bar. Yam Shal-Bar is an experienced development leader and software architect with over two decades of expertise in managing distributed teams and delivering enterprise-scale software solutions. As CTO at RadView Software, he drives the company’s technical roadmap and leads the development of core products, including RadView’s performance testing platform and web dashboard. Throughout his career—including leadership roles at British Telecom, Reliance Infocomm, and Vodafone—he has championed Agile methodologies, DevOps practices, and CI/CD pipelines to deliver robust, scalable systems.

    • (00:00) - Intro
    • (00:59) - Yam Shal-Bar and RadView
    • (02:20) - Simulating User Journeys
    • (05:00) - How to approach load testing?
    • (06:25) - What are common finds when load testing?
    • (10:42) - Different perspectives on "capacity"
    • (16:05) - The playbook of load testing
    • (18:50) - What are common bottlenecks in complex systems?
    • (20:32) - Third-party services and load testing
    • (23:33) - What exactly are you load testing?
    • (26:11) - What is changing within the load testing space?
    • (27:39) - API and User Journey testing
    • (30:56) - Rapid-Fire Questions
    • (35:25) - Wrap Up

    © Queue-it, 2025
    36 min
  • Simple is Scalable with Product Architect Mojtaba Saroonghi
    Apr 22 2025

    What makes a system scalable? In this episode, Mojtaba Saroonghi explains why simplicity is the secret to scalability: avoiding complexity minimizes the risk of failure while improving troubleshooting, deployment, and the overall scalability of a system. He walks through how Queue-it has maintained simplicity as it has grown, the allure of complexity, and how architects can incorporate simplicity into their system design and development.

    Episode page

    ---

    This episode's guest is Mojtaba Saroonghi, a Distinguished Product Architect at Queue-it. Moji was one of the company’s first employees, starting his journey as a software developer over 10 years ago. He is highly experienced with AWS services, product and architectural design, managing developer teams, and defining and executing on product vision.

    • (00:00) - Welcome & Introduction
    • (01:08) - Moji’s Background
    • (01:59) - What Makes a System Scalable?
    • (05:19) - Trade-offs of Simplicity vs. Complexity
    • (11:37) - Simplicity in scalability
    • (13:00) - Simplicity and complexity in Queue-it
    • (17:32) - Everyday Complexity in Engineering
    • (19:22) - Quickfire Round
    • (23:40) - Wrap-up

    © Queue-it, 2025
    24 min
  • Design for Failure with Product Architect Martin Larsen
    Apr 22 2025

    No system has 100% reliability. Failures and faults are inevitable. At scale, everything breaks. In this episode, Martin Larsen explains the design-for-failure approach behind Queue-it’s architecture and how it increases the platform’s availability and resilience. Larsen explores the principles behind designing for failure, the trade-offs involved, the mechanisms implemented at Queue-it, and the tangible ways companies can bring this development approach into their processes.

    Episode page

    ---

    This episode's guest is Martin Larsen, a Distinguished Product Architect at Queue-it. Starting as a software developer, Martin was one of the company’s first employees. He played an instrumental role in building the foundations of Queue-it and is heavily involved in activities including the design, architecture, testing, and deployment of the virtual waiting room, as well as defining and executing on product vision.

    © Queue-it, 2025
    28 min