Episodes

  • S2E4: Data Augmentation
    Sep 2 2025

    Data augmentation addresses a persistent problem in training deep neural networks for computer vision: data scarcity. It artificially generates new, plausible training samples by applying transformations to existing data, giving models the volume and variety they need to learn effectively. Beyond increasing data quantity, augmentation acts as a regularization technique: it combats overfitting by pushing models to learn abstract, robust features instead of memorizing training specifics, which improves generalization. The episode covers simple geometric and color alterations as well as advanced methods such as generative adversarial networks (GANs) and learned augmentation policies, and looks at how these techniques are used in autonomous driving, medical imaging, and retail analytics to build more reliable and accurate AI systems.
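
As a rough illustration of the simple geometric and color alterations mentioned above, here is a minimal NumPy sketch. The function name `augment`, the flip probability, and the brightness range are assumptions chosen for the example; they are not taken from the episode.

```python
import numpy as np

def augment(image, rng):
    """Return a randomly flipped, brightness-scaled copy of an H x W x C uint8 image."""
    out = image.astype(np.float32)
    # Geometric alteration: mirror left-to-right with probability 0.5 (assumed value).
    if rng.random() < 0.5:
        out = out[:, ::-1, :]
    # Color alteration: scale brightness by a factor in [0.8, 1.2] (assumed range).
    factor = rng.uniform(0.8, 1.2)
    out = np.clip(out * factor, 0, 255)
    return out.astype(np.uint8)

rng = np.random.default_rng(seed=0)
# Random pixels stand in for a real training image.
image = rng.integers(0, 256, size=(4, 6, 3), dtype=np.uint8)
# Each call yields a new variant of the same source image.
samples = [augment(image, rng) for _ in range(5)]
```

Each variant keeps the shape and label of the source image, which is why such transformations can enlarge a dataset without new annotation work.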

    30 min
  • S2E3: Datasets
    Aug 25 2025

    This episode looks at the foundational datasets that taught computers to "see". We trace the evolution of computer vision through four landmark datasets: PASCAL VOC, which standardized object detection and established common benchmarks; ImageNet, whose unprecedented scale ignited the deep learning revolution and popularized transfer learning; COCO (Common Objects in Context), which pushed the field towards complex scene understanding with rich annotations such as instance segmentation and keypoint detection; and Cityscapes, a key benchmark for pixel-level semantic understanding of dense urban scenes for autonomous driving. These curated image collections are not just passive data but instruments of scientific progress: they define challenges, measure advancement, and drive the innovations behind self-driving cars, augmented reality, and medical diagnostics.

    22 min
  • S2E2: Annotation tools
    Aug 19 2025

    This episode delves into the foundational role of data annotation in teaching machines to "see" and understand the visual world, a critical step for nearly all supervised machine learning projects in computer vision. We explore how meticulously labeled datasets, known as ground truth, serve as the "answer key" that determines the accuracy and reliability of AI models. The discussion then compares three prominent computer vision annotation tools: LabelImg, presented as the ideal tool for learning due to its simplicity for basic bounding box tasks; CVAT, described as the professional platform for annotation, renowned for its robust support for complex data types like video and 3D LiDAR, collaborative features, and self-hosting capabilities suitable for large-scale, specialized teams; and Roboflow, an integrated ecosystem for deployment that streamlines the entire machine learning lifecycle from annotation and data augmentation to one-click model training and deployment, emphasizing speed and convenience for businesses focused on rapid iteration. Finally, we illustrate the real-world impact of these tools through diverse applications, from autonomous vehicles and retail shelf monitoring to medical image diagnostics, highlighting how the choice of tool aligns with specific project needs and industry demands.

    20 min
  • S2E1: Computer Vision Libraries
    Aug 13 2025

    In this episode, we delve into the fascinating world of computer vision, the field that empowers machines to interpret and understand visual information, bridging the gap between raw pixel data and high-level human understanding. We explore its two fundamental approaches: the classical, algorithm-driven method and the modern, data-driven deep learning method. Our journey begins with OpenCV, the venerable, high-performance, and open-source library that serves as the foundational toolkit for classical computer vision and is crucial for image preprocessing and real-time tasks. We then pivot to the deep learning revolution, introducing tensors as the universal language of data and Convolutional Neural Networks (CNNs) as the architecture that automatically learns features directly from data. We compare the two deep learning powerhouses: PyTorch, known for its flexibility, eager execution, and dominance in research, and TensorFlow, a comprehensive, end-to-end platform designed for scalability and production-readiness with its user-friendly Keras API. Crucially, we uncover how these powerful tools are not mutually exclusive but often used in synergy within complete computer vision pipelines, with OpenCV handling efficient data acquisition and post-processing, while PyTorch or TensorFlow manage complex deep learning inference. Finally, we bring these concepts to life by exploring their transformative real-world applications, from smartphone face unlock and social media filters to the sophisticated perception systems in autonomous vehicles and the innovative automation seen in retail and manufacturing.
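
To make the hand-off between the two kinds of libraries concrete, here is a small sketch of a typical preprocessing step. It assumes the common conventions that OpenCV returns images as H x W x C `uint8` NumPy arrays and that convolutional layers in PyTorch expect N x C x H x W floating-point input. The function name `to_model_input` is hypothetical, and a zero-filled array stands in for a decoded frame.

```python
import numpy as np

def to_model_input(image_hwc):
    """Convert an H x W x C uint8 image to a 1 x C x H x W float32 array in [0, 1]."""
    scaled = image_hwc.astype(np.float32) / 255.0   # pixel values to [0, 1]
    chw = np.transpose(scaled, (2, 0, 1))           # channels-last to channels-first
    return chw[np.newaxis, ...]                     # add a batch dimension

# Placeholder for a frame that an image library would normally decode.
image = np.zeros((480, 640, 3), dtype=np.uint8)
batch = to_model_input(image)
print(batch.shape)  # (1, 3, 480, 640)
```

In a real pipeline the resulting array would be wrapped in a framework tensor before inference; that step is omitted here to keep the sketch dependency-free.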


    See: https://tinyurl.com/SM-S2E1

    33 min
  • S1Bonus: SciFi to Reality
    Aug 5 2025

    Step into a world where machines truly see, bridging the gap between cinematic fantasy and scientific reality. This episode begins with the captivating gaze of Ava from Ex Machina, exploring the profound allure of a "seeing machine" that leverages visual data to manipulate and evoke sympathy, representing the ultimate fantasy of computer vision. We then deconstruct the technology, revealing how real-world algorithms enable machines to interpret and understand the visual world by translating pixels into coherent concepts and identifying statistically significant patterns. Discover how the "algorithmic brain" of modern computer vision, particularly through Convolutional Neural Networks (CNNs), learns to perform tasks by analyzing vast quantities of data and recognizing patterns, a process fundamentally different from traditional programming. From this foundation, we explore the pervasive applications of computer vision in your daily life and across major industries: from unlocking smartphones and enabling augmented reality filters to acting as the "eyes" of self-driving cars for collision avoidance and lane detection, augmenting human expertise in medical imaging for cancer detection, and powering the seamless experience of cashier-less retail stores. Finally, we confront the profound ethical and technical challenges arising from granting machines the power to see, including their vulnerability to adversarial attacks, the critical issue of algorithmic bias stemming from training data, and urgent questions surrounding privacy in an age of pervasive surveillance.


    See also: https://tinyurl.com/SM-S1-Bonus

    24 min
  • S1E8: Computer Vision Challenges
    Aug 2 2025

    This episode delves into the critical challenges hindering the widespread and reliable deployment of computer vision (CV) systems in the real world. We explore occlusion, where objects are partially or completely hidden, making it difficult for models to "see" and interpret scenes accurately. The concept of generalization is examined, highlighting how models often fail to perform reliably on new, unseen data due to "domain shift," such as changes in weather, lighting, or geographical location from their training environment. A significant focus is placed on bias, revealing how inherent prejudices in training data can lead to systematically unfair outcomes in CV applications, particularly in facial recognition technology, and the serious societal implications that arise. Finally, we discuss the practical hurdles of real-world deployment, including computational constraints, data and concept drift, and environmental variability, emphasizing that a successful CV product is a complex, evolving system requiring continuous management and maintenance. Understanding these interconnected challenges is crucial for building robust, ethical, and trustworthy AI.


    See also:

    https://tinyurl.com/SM-S1E8-01

    https://tinyurl.com/SM-S1E8-02

    1 h 2 min
  • S1E7: Segmentation
    Jul 26 2025

    This episode covers image segmentation, a foundational computer vision task that teaches machines to understand the visual world at the pixel level, moving beyond simple classification or bounding boxes. We explore the key distinction within this field: semantic segmentation assigns a class label to every pixel to capture broad regions like "road" or "sky", while instance segmentation goes a step further by identifying and outlining each individual object within a class, such as "car 1" versus "car 2". We look at two canonical deep learning architectures behind these capabilities: U-Net, known for its U-shaped encoder-decoder design and skip connections that enable precise boundary localization, particularly in medical imaging where training data is limited; and Mask R-CNN, a framework that extends object detection to generate a pixel-level mask for every instance through a two-stage "detect-then-segment" approach and innovations like RoIAlign. Finally, we see how the two tasks converge in panoptic segmentation for comprehensive scene understanding, enabling applications from autonomous vehicles and medical diagnostics to automated retail and robotics.
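
The difference between semantic and instance segmentation can be shown on a toy label map. The sketch below is illustrative only: the 4 x 6 grid, the instance ids, and the class ids are made-up values, not data from the episode.

```python
import numpy as np

# Hypothetical instance map: 0 = background, 1 and 2 = two separate cars.
instance_map = np.array([
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 2, 2],
    [0, 1, 1, 0, 2, 2],
    [0, 0, 0, 0, 0, 0],
])

# Hypothetical lookup from instance id to class id (0 = background, 1 = car).
instance_to_class = np.array([0, 1, 1])

# Semantic segmentation keeps only the class of each pixel:
# both cars collapse into the same "car" label.
semantic_map = instance_to_class[instance_map]

# Instance segmentation keeps one boolean mask per object ("car 1" vs "car 2").
instance_masks = {i: instance_map == i for i in (1, 2)}
```

Panoptic segmentation, as described above, corresponds to keeping both pieces of information for every pixel: its class and, for countable objects, its instance id.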


    See:

    https://tinyurl.com/SM-S1E7-1

    https://tinyurl.com/SM-S1E7-2

    24 min
  • S1E5: Object Detection
    Jul 18 2025

    Dive into the fascinating world of computer vision with a deep exploration of object detection models, the technology that teaches machines to "see" and understand the world around them. This episode breaks down the core concepts, from the fundamental task of distinguishing multiple objects and pinpointing their locations within an image, to the sophisticated architectures that power this capability. We'll uncover the "Great Divide" in object detection, contrasting the accuracy-focused two-stage detectors like Faster R-CNN, which meticulously propose and then classify regions of interest, with the speed-prioritizing one-stage detectors such as YOLO (You Only Look Once) and SSD (Single Shot MultiBox Detector), which process images in a single, unified pass. Discover the crucial trade-offs between speed and accuracy, and how these models handle challenges like detecting small objects. Finally, we'll journey through compelling real-world applications, revealing how these powerful models are transforming everything from autonomous driving and medical imaging to retail automation, sports analytics, and smart city surveillance, enabling groundbreaking advancements in AI's ability to perceive our physical world.
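
The episode description does not name a specific metric, but a common way to judge how well a detector has pinpointed an object is intersection over union (IoU) between a predicted box and a reference box. The sketch below assumes boxes in `(x_min, y_min, x_max, y_max)` format; the function name `box_iou` is chosen for the example.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Width and height of the overlapping region (zero if the boxes do not overlap).
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter

    # Guard against degenerate boxes with zero area.
    return inter / union if union > 0 else 0.0

predicted = (0.0, 0.0, 2.0, 2.0)
reference = (1.0, 1.0, 3.0, 3.0)
overlap = box_iou(predicted, reference)  # 1 / 7: intersection area 1, union area 7
```

An IoU of 1.0 means the boxes coincide exactly and 0.0 means they do not overlap at all.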


    See also:

    https://tinyurl.com/SM-S1E5-1

    https://tinyurl.com/SM-S1E5-2

    16 min