Episodes

  • The AI Morning Read February 4, 2026 - From Stick Figures to NeurIPS-Ready: How PaperBanana Turns Ideas into Diagrams
    Feb 4 2026

    In today's podcast we deep dive into PaperBanana, a groundbreaking agentic framework designed to automate the labor-intensive process of generating publication-ready academic illustrations for AI scientists. This innovative system orchestrates a collaborative team of specialized agents—including a Retriever, Planner, Stylist, Visualizer, and Critic—to transform raw text and data into professional diagrams and statistical plots. Powered by state-of-the-art vision-language models and the Nano-Banana-Pro image generator, the framework is rigorously evaluated on "PaperBananaBench," a benchmark comprising 292 test cases curated from NeurIPS 2025 publications. Experimental results demonstrate that PaperBanana consistently outperforms leading baselines in faithfulness, conciseness, and aesthetics, effectively bridging the critical gap between text-based reasoning and visual communication. Ultimately, this tool aims to accelerate the autonomous research lifecycle by allowing researchers to focus on core discoveries rather than the manual struggle of graphic design.

    15 min
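    The agent team described above (Retriever, Planner, Stylist, Visualizer, Critic) can be pictured as a simple pipeline with a final accept/reject gate. This is a minimal sketch with stubbed agents and invented names; the real system calls vision-language models and an image generator at each stage.

```python
# Sketch of a retrieve-plan-style-render-critique loop. All agent
# behaviors are stubs; names and data shapes are illustrative only.

def retriever(query):
    # Would search related papers for reference figures; stubbed here.
    return [f"reference figure for {query}"]

def planner(query, references):
    # Would draft a diagram layout from the text and references.
    return {"topic": query, "elements": ["input", "model", "output"], "refs": references}

def stylist(plan):
    # Would pick fonts, colors, and spacing conventions.
    plan["style"] = "clean, publication-ready"
    return plan

def visualizer(plan):
    # Would call an image generator; stubbed as a text description.
    return f"diagram of {plan['topic']} with {len(plan['elements'])} elements ({plan['style']})"

def critic(image):
    # Would score faithfulness, conciseness, and aesthetics; accepts everything here.
    return "diagram" in image

def generate_figure(query):
    plan = stylist(planner(query, retriever(query)))
    image = visualizer(plan)
    return image if critic(image) else None

result = generate_figure("transformer architecture")
```

    The critic's veto is the interesting design point: a rejected draft would loop back to the planner rather than ship a low-quality figure.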
  • The AI Morning Read February 3, 2026 - Remembering Smarter, Not Harder: How AI Is Learning Without Forgetting
    Feb 3 2026

    17 min
  • The AI Morning Read February 2, 2026 - From Half-Baked Idea to Published Paper: Can AI Finally Write the Research You Never Finished?
    Feb 2 2026

    In today's podcast we deep dive into Idea2Story, a novel pre-computation-driven framework that automates the transformation of underspecified research concepts into complete, submission-ready scientific narratives. Breaking away from traditional runtime-centric agents, this system shifts the cognitive load to an offline phase where it processes thousands of peer-reviewed papers to build a structured knowledge graph of reusable methodological units. By retrieving validated research patterns rather than generating them from scratch, the framework avoids the common pitfalls of AI hallucinations and expensive computational costs associated with reading literature on the fly. The pipeline further refines these drafts using an anchored multi-agent review system, which provides objective, data-backed feedback to ensure the generated stories are both coherent and novel. Ultimately, this architecture addresses critical context window bottlenecks, offering a practical foundation for reliable autonomous scientific discovery.

    15 min
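    The core idea above, looking up validated methodological units from a precomputed knowledge structure instead of generating them at runtime, can be sketched as a dictionary lookup that refuses to answer when no validated pattern exists. The graph contents and field names below are invented for illustration.

```python
# Tiny sketch: retrieval over a precomputed map of methodological
# units, rather than reading literature on the fly. Entries link each
# method to the papers that validated it and to compatible methods.

KNOWLEDGE_GRAPH = {
    "contrastive learning": {"validated_by": ["paperA", "paperB"], "pairs_with": ["data augmentation"]},
    "data augmentation": {"validated_by": ["paperC"], "pairs_with": ["contrastive learning"]},
    "knowledge distillation": {"validated_by": ["paperD"], "pairs_with": []},
}

def retrieve_pattern(concept):
    # Look up a validated methodological unit rather than generating one.
    unit = KNOWLEDGE_GRAPH.get(concept)
    if unit is None:
        return None  # no validated pattern: refuse rather than hallucinate
    return {"method": concept, "evidence": unit["validated_by"], "companions": unit["pairs_with"]}

pattern = retrieve_pattern("contrastive learning")
missing = retrieve_pattern("quantum telepathy")
```

    Returning `None` for unknown concepts is the anti-hallucination property the episode highlights: the system only asserts patterns it can back with retrieved evidence.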
  • The AI Morning Read January 30, 2026 - Who Are You Pretending to Be? Persona Prompting, Bias, and the Masks We Give AI
    Jan 30 2026

    In today's podcast we deep dive into persona prompting, examining how assigning specific identities to Large Language Models profoundly alters their reasoning capabilities, safety mechanisms, and even moral judgments. We explore startling new evidence showing that while personas can unlock "emergent synergy" and role specialization in multi-agent teams, they also induce human-like "motivated reasoning" where models bias their evaluation of scientific evidence to align with an assigned political identity. Researchers have discovered that seemingly minor prompt variations—such as using names or interview formats rather than explicit labels—can mitigate stereotyping, whereas assigning traits like "low agreeableness" makes models significantly more vulnerable to adversarial "bullying" tactics. We also analyze the "moral susceptibility" of major model families, revealing that while systems like Claude remain robust, others radically shift their answers on the Moral Foundations Questionnaire based solely on who they are pretending to be. Ultimately, we discuss the critical trade-off revealed by this technology: while persona prompting can simulate complex social behaviors and improve classification in sensitive tasks, it often surfaces deep-rooted biases and degrades the quality of logical explanations.

    16 min
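    Mechanically, persona prompting is just conditioning the model with an identity before the task, which is why small wording changes (a name versus an explicit label) can shift behavior so much. A minimal sketch of building such a prompt, with a hypothetical message format:

```python
# Sketch of persona-conditioned prompt construction. The message
# structure is a common chat-API convention, not any specific vendor's.

def build_messages(persona, question):
    # Prepending an identity to the system message is the whole technique.
    system = f"You are {persona}. Answer from that perspective."
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]

msgs = build_messages("a skeptical climate scientist",
                      "Is this study's methodology sound?")
```

    The research discussed above suggests the choice of `persona` string is not cosmetic: it can change reasoning quality, safety behavior, and bias, so it deserves the same scrutiny as any other model input.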
  • The AI Morning Read January 29, 2026 - One Model, One Hundred Minds: Inside Kimi K2.5 and the Age of Agent Swarms
    Jan 29 2026

    In today's podcast we deep dive into Kimi K2.5, a new open-source multimodal model from Moonshot AI that introduces a "self-directed agent swarm" capability to coordinate up to 100 sub-agents for parallel task execution. We will explore its native multimodal architecture, which enables unique features like "coding with vision," where the model generates functional code directly from UI designs or video inputs. Our discussion highlights how this Mixture-of-Experts model has outperformed top-tier competitors like Claude Opus 4.5 on the "Humanity's Last Exam" benchmark with a score of 50.2%. We also break down its production efficiency, noting its use of native INT4 quantization for double the inference speed and an API cost that can be significantly lower than comparable proprietary models. Finally, we address the skepticism surrounding its real-world application, analyzing whether its benchmark dominance translates to reliable production workflows given the current lack of public case studies.

    14 min
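    The "agent swarm" pattern, one coordinator fanning a task list out to many parallel sub-agents and collecting results, can be sketched with a thread pool. The sub-agent is a stub; in the real system each worker would be a model instance with its own context.

```python
# Sketch of fan-out/fan-in sub-agent coordination. Worker behavior is
# stubbed; max_agents mirrors the "up to 100 sub-agents" idea above.

from concurrent.futures import ThreadPoolExecutor

def sub_agent(task):
    # Stand-in for a model call handling one subtask.
    return f"done: {task}"

def swarm(tasks, max_agents=100):
    # Dispatch tasks to parallel sub-agents; map preserves input order.
    with ThreadPoolExecutor(max_workers=min(max_agents, len(tasks))) as pool:
        return list(pool.map(sub_agent, tasks))

results = swarm([f"subtask-{i}" for i in range(8)])
```

    The hard parts the sketch omits, task decomposition, inter-agent communication, and merging partial results, are exactly where benchmark performance and production reliability can diverge.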
  • The AI Morning Read January 28, 2026 - Your AI, Your Rules: Moltbot and the Rise of Personal Agent Operating Systems
    Jan 28 2026

    In today's podcast we deep dive into Moltbot, formerly known as Clawdbot, a viral open-source personal AI assistant that has captured the developer community's attention by allowing users to run a proactive agent entirely on their own local infrastructure. Unlike traditional chatbots, Moltbot integrates directly with messaging platforms like WhatsApp and Telegram to execute autonomous tasks—from managing calendars to controlling browsers—without requiring users to switch interfaces. This "headless" agent operates via a local gateway that ensures data sovereignty, featuring a modular "skill" ecosystem where the community builds extensions for everything from document processing to complex multi-agent coordination. However, experts warn that its powerful permissions create significant security vulnerabilities, such as potential file deletion or credential exposure, especially given findings of missing rate limits and the use of eval() in browser tools. Despite these risks and the technical hurdles of deployment, Moltbot represents a paradigm shift toward "personal operating systems" for AI, where agents are teammates that proactively monitor systems and execute workflows rather than just passively answering questions.

    14 min
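    The modular "skill" ecosystem and the permission risks flagged above suggest a registry where each skill declares what it is allowed to touch, and dispatch is a lookup rather than `eval()` of untrusted strings. This is a hypothetical sketch, not Moltbot's actual API.

```python
# Sketch of a skill registry with explicit permission checks, avoiding
# eval() of untrusted input (one of the flagged risks). Names invented.

SKILLS = {}

def skill(name, permissions):
    # Register a handler under a name with a declared permission set.
    def register(fn):
        SKILLS[name] = {"fn": fn, "permissions": set(permissions)}
        return fn
    return register

@skill("calendar.add", permissions={"calendar:write"})
def add_event(title):
    return f"event created: {title}"

def dispatch(name, granted, *args):
    # Look the skill up by name; never evaluate caller-supplied code.
    entry = SKILLS.get(name)
    if entry is None:
        return "unknown skill"
    if not entry["permissions"] <= granted:
        return "permission denied"
    return entry["fn"](*args)

ok = dispatch("calendar.add", {"calendar:write"}, "standup")
denied = dispatch("calendar.add", set(), "standup")
```

    Making permissions data rather than code is what lets a gateway enforce them uniformly across community-built skills.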
  • The AI Morning Read January 27, 2026 - Heavy Thinking, Long Memory: Inside the 560B Model Teaching AI to Reason at Scale
    Jan 27 2026

    In today's podcast we deep dive into LongCat-Flash-Thinking-2601, a massive 560-billion-parameter open-source Mixture-of-Experts model designed to push the boundaries of agentic reasoning and complex tool use. This model achieves state-of-the-art performance on difficult benchmarks like BrowseComp and τ²-Bench by utilizing a unified training framework that combines domain-parallel expert training with fusion. Its creators employed a unique approach involving "environment scaling" across over 20 domains and deliberately injected real-world noise into the training process to ensure the model remains robust in imperfect environments. To tackle the hardest problems, the model features a "Heavy Thinking" mode that scales test-time computation by expanding both the depth and width of its reasoning through parallel exploration. Finally, we explore the experimental "Zig-Zag Attention" design that allows this system to efficiently handle ultra-long contexts of up to 1 million tokens, cementing its status as a leading tool for long-horizon agentic workflows.

    15 min
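    "Heavy Thinking" as described above scales along two axes: width (several reasoning paths explored in parallel) and depth (each path refined over more steps), with the best candidate kept. A toy sketch with stubbed refinement and scoring, standing in for model calls:

```python
# Sketch of width-by-depth test-time scaling. refine() and score() are
# stubs; in the real system both would be model invocations.

def refine(path, steps):
    # Depth: iteratively extend a reasoning path.
    for i in range(steps):
        path = path + [f"step{i}"]
    return path

def score(path):
    # Stub scorer: prefer more elaborated paths.
    return len(path)

def heavy_thinking(problem, width=4, depth=3):
    # Width: explore several independent attempts, then keep the best.
    candidates = [refine([f"attempt{w}: {problem}"], depth) for w in range(width)]
    return max(candidates, key=score)

best = heavy_thinking("prove the lemma", width=4, depth=3)
```

    The cost is multiplicative (roughly width × depth model calls), which is why this mode is reserved for the hardest problems.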
  • The AI Morning Read January 26, 2026 - Why AI Is Too Power-Hungry—and How XVM™ Fixes It
    Jan 26 2026

    In today's podcast we deep dive into Permion's XVM™ Energy Aware AI, a revolutionary architectural approach that argues durable energy savings must begin at the Instruction Set Architecture (ISA) and model of computation rather than just model training. We will explore how the XVM™ combats the high energy costs of data movement and memory traffic by redesigning tokens to serve as intelligent bridges between neural perception and symbolic reasoning. By treating tokenization as a core energy design decision, this system routes specific tasks to exact symbolic modules or specialized kernels, effectively reducing the reliance on expensive, dense neural processing. The discussion highlights how the XVM™ ISA makes sparsity, low-precision types, and data-oriented computing first-class citizens, ensuring that efficiency gains are realized in hardware rather than remaining theoretical. Ultimately, we examine how this full-stack co-design—from "tokens to transistors"—optimizes Size, Weight, and Power (SWaP) to overcome the impedance mismatch between modern AI workloads and traditional computer architecture.

    13 min
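    The routing idea above, sending tasks that have an exact symbolic solution to a cheap specialized module instead of dense neural compute, can be sketched as a dispatcher. The routing rule and module names here are invented for illustration; the real design works at the ISA level, not in Python.

```python
# Sketch of neural/symbolic routing for energy savings. Arithmetic-shaped
# inputs go to an exact symbolic kernel; everything else falls back to
# the (expensive) neural path, stubbed here as a string.

def symbolic_arith(expr):
    # Exact, cheap path: evaluate simple two-operand arithmetic.
    a, op, b = expr.split()
    a, b = int(a), int(b)
    return a + b if op == "+" else a * b

def neural_fallback(text):
    # Stand-in for the dense-model path.
    return f"<neural answer for: {text}>"

def route(token_stream):
    # Treating input classification as a first-class design decision:
    # only well-formed "INT OP INT" streams take the symbolic path.
    parts = token_stream.split()
    if len(parts) == 3 and parts[0].isdigit() and parts[2].isdigit() and parts[1] in ("+", "*"):
        return symbolic_arith(token_stream)
    return neural_fallback(token_stream)

exact = route("6 * 7")
fuzzy = route("summarize this paragraph")
```

    The claimed energy win comes from how often the cheap branch fires: every input the router proves symbolic never touches the dense model at all.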