Semi Doped

Author(s): Vikram Sekar and Austin Lyons

About this audio

The business and technology of semiconductors. Alpha for engineers and investors alike.

© 2026 Semi Doped
Episodes
  • OpenClaw Makes AI Agents and CPUs Get Real
    Feb 3 2026

    Austin and Vik discuss the emerging trend of AI agents, particularly focusing on Claude Code and OpenClaw, and the resulting hardware implications.

    Key Takeaways:

    • 2026 is expected to be a pivotal year for AI agents.
    • The rise of agentic AI is moving beyond marketing to practical applications.
    • Claude Code is being used for more than just coding; it aids in research and organization.
    • Integrating AI with tools like Google Drive enhances productivity.
    • Security concerns arise with giving AI agents access to personal data.
    • Local computing options for AI can reduce costs and increase control.
    • AI agents can automate repetitive tasks, freeing up human time for creative work.
    • The demand for CPUs is increasing due to the needs of AI agents.
    • AI can help summarize and organize information but may lack deep insights.
    • The future of AI will involve balancing automation with human oversight.

    Chapters
    00:00 The Rise of Agents in AI
    03:04 Exploring Claude Code and Its Applications
    05:58 Integrating AI with Google Drive and Email
    08:56 Optimizing Workflows with AI Agents
    12:01 The Future of AI Agents and Local Computing
    14:50 Security and Infrastructure Implications of AI Agents

    Deploy your secure OpenClaw instance with DigitalOcean:
    https://www.digitalocean.com/blog/moltbot-on-digitalocean

    Visit the podcast website: https://www.semidoped.fm
    Austin's Substack: https://www.chipstrat.com/
    Vik's Substack: https://www.viksnewsletter.com/

    48 min
  • An Interview with Microsoft's Saurabh Dighe About Maia 200
    Jan 28 2026

    Maia 100 was a pre-GPT accelerator.
    Maia 200 is explicitly post-GPT for large multimodal inference.

    Saurabh Dighe says if Microsoft were chasing peak performance or trying to span training and inference, Maia would look very different. Higher TDPs. Different tradeoffs. Those paths were pruned early to optimize for one thing: inference price-performance. That focus drives the claim of ~30% better performance per dollar versus the latest hardware in Microsoft’s fleet.
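
    For a sense of what a "performance per dollar" claim measures, here is a minimal sketch. The numbers below are hypothetical placeholders, not Microsoft's; only the ~1.3x ratio comes from the episode.

    ```python
    # Hypothetical illustration of price-performance; none of these numbers
    # are Microsoft's. Only the ~1.3x ratio is taken from the episode.

    def perf_per_dollar(tokens_per_second: float, cost_per_hour: float) -> float:
        """Inference throughput delivered per dollar of hourly cost."""
        return tokens_per_second / cost_per_hour

    # Placeholder baseline: an existing accelerator in the fleet.
    baseline = perf_per_dollar(tokens_per_second=10_000, cost_per_hour=10.0)

    # A part claiming ~30% better price-performance can get there by raising
    # throughput, lowering cost, or a mix of both; the metric fixes only the ratio.
    candidate = perf_per_dollar(tokens_per_second=11_000, cost_per_hour=8.5)

    print(f"improvement: {candidate / baseline - 1.0:.0%}")  # ~29%
    ```

    The point Dighe makes is that this ratio, not peak throughput, was the design target.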

    Interesting topics include:
    • What “30% better price-performance” actually means
    • Who Maia 200 is built for
    • Why Microsoft bet on inference when designing Maia back in 2022/2023
    • Large SRAM + high-capacity HBM
    • Massive scale-up, no scale-out
    • On-die NIC integration

    Maia is a portfolio platform: many internal customers, varied inference profiles, one goal. Lower inference cost at planetary scale.

    Chapters:
    (00:00) Introduction
    (01:00) What Maia 200 is and who it’s for
    (02:45) Why custom silicon isn’t just a margin play
    (04:45) Inference as an efficient frontier
    (06:15) Portfolio thinking and heterogeneous infrastructure
    (09:00) Designing for LLMs and reasoning models
    (10:45) Why Maia avoids training workloads
    (12:00) Betting on inference in 2022–2023, before reasoning models
    (14:40) Hyperscaler advantage in custom silicon
    (16:00) Capacity allocation and internal customers
    (17:45) How third-party customers access Maia
    (18:30) Software, compilers, and time-to-value
    (22:30) Measuring success and the Maia 300 roadmap
    (28:30) What “30% better price-performance” actually means
    (32:00) Scale-up vs scale-out architecture
    (35:00) Ethernet and custom transport choices
    (37:30) On-die NIC integration
    (40:30) Memory hierarchy: SRAM, HBM, and locality
    (49:00) Long context and KV cache strategy
    (51:30) Wrap-up

    53 min
  • Can Pre-GPT AI Accelerators Handle Long Context Workloads?
    Jan 26 2026

    OpenAI's partnership with Cerebras and Nvidia's announcement of context memory storage raise a fundamental question: as agentic AI demands long sessions with massive context windows, can SRAM-based accelerators designed before the LLM era keep up—or will they converge with GPUs?

    Key Takeaways
    1. Context is the new bottleneck. As agentic workloads demand long sessions with massive codebases, storing and retrieving KV cache efficiently becomes critical.
    2. There's no one-size-fits-all. Sachin Khatti (OpenAI, ex-Intel) signals a shift toward heterogeneous compute—matching specific accelerators to specific workloads.
    3. Cerebras has 44GB of SRAM per wafer, orders of magnitude more than typical chips, but the question remains: where does the KV cache go for long context? (See the sizing sketch after this list.)
    4. Pre-GPT accelerators may converge toward GPUs. If they need to add HBM or external memory for long context, some of their differentiation erodes.
    5. Post-GPT accelerators (Etched, MatX) are the ones to watch. Designed specifically for transformer inference, they may solve the KV cache problem from first principles.
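
    As a rough back-of-envelope on points 1 and 3: the KV cache of a dense transformer grows linearly with context length, and at long context it can rival or exceed on-chip SRAM. The sketch below assumes Llama-2-70B's published architecture (80 layers, 8 KV heads via grouped-query attention, head dimension 128) and an fp16 cache; these parameters are not figures from the episode.

    ```python
    # Back-of-envelope KV cache sizing for a dense transformer.
    # Assumes a Llama-2-70B-style architecture (80 layers, 8 KV heads via GQA,
    # head dim 128) and an fp16 (2-byte) cache; not figures from the episode.

    def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                       context_tokens: int, bytes_per_value: int = 2) -> int:
        """Bytes needed to hold keys and values for one sequence."""
        per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value  # K and V
        return per_token * context_tokens

    for context in (8_192, 128_000, 1_000_000):
        gib = kv_cache_bytes(80, 8, 128, context) / 2**30
        print(f"{context:>9,} tokens -> {gib:7.1f} GiB of KV cache")

    # Approximate output:
    #     8,192 tokens ->     2.5 GiB of KV cache
    #   128,000 tokens ->    39.1 GiB of KV cache
    # 1,000,000 tokens ->   305.2 GiB of KV cache
    ```

    Even a single 128K-token session lands in the same ballpark as the 44GB of wafer SRAM mentioned above, which is why the question of where the KV cache gets offloaded keeps coming up.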

    Chapters
    - 00:00 — Intro
    - 01:20 — What is context memory storage?
    - 03:30 — When Claude runs out of context
    - 06:00 — Tokens, attention, and the KV cache explained
    - 09:07 — The AI memory hierarchy: HBM → DRAM → SSD → network storage
    - 12:53 — Nvidia's G1/G2/G3 tiers and the missing G0 (SRAM)
    - 14:35 — Bluefield DPUs and GPU Direct Storage
    - 15:53 — Token economics: cache hits vs misses
    - 20:03 — OpenAI + Cerebras: 750 megawatts for faster Codex
    - 21:29 — Why Cerebras built a wafer-scale engine
    - 25:07 — 44GB SRAM and running Llama 70B on four wafers
    - 25:55 — Sachin Khatti on heterogeneous compute strategy
    - 31:43 — The big question: where does Cerebras store KV cache?
    - 34:11 — If SRAM offloads to HBM, does it lose its edge?
    - 35:40 — Pre-GPT vs Post-GPT accelerators
    - 36:51 — Etched raises $500M at $5B valuation
    - 38:48 — Wrap up

    38 min