
Reward Models | Data Brew | Episode 40

About this episode

In this episode, Brandon Cui, Research Scientist at MosaicML and Databricks, dives into cutting-edge advancements in AI model optimization, focusing on Reward Models and Reinforcement Learning from Human Feedback (RLHF).

Highlights include:
- How synthetic data and RLHF enable fine-tuning models to generate preferred outcomes.
- Techniques like Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO) for enhancing response quality.
- The role of reward models in improving coding, math, reasoning, and other NLP tasks.
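To make the preference-optimization idea above concrete, here is a minimal sketch of the standard DPO loss for a single preference pair. The function name and the numeric log-probabilities are illustrative, not from the episode; the formula is the published DPO objective, which trains the policy to widen the log-probability gap between a preferred and a rejected response relative to a frozen reference model.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair (illustrative sketch).

    logp_* are the policy's total log-probabilities of the chosen and
    rejected responses; ref_logp_* come from the frozen reference model.
    beta scales the implicit KL penalty against the reference.
    """
    chosen_ratio = logp_chosen - ref_logp_chosen
    rejected_ratio = logp_rejected - ref_logp_rejected
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(margin)): shrinks as the policy prefers the chosen response
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Made-up log-probs where the policy already prefers the chosen response,
# so the loss falls below log(2) (the value at zero margin).
loss = dpo_loss(-10.0, -14.0, -11.0, -13.0, beta=0.5)
```

Unlike PPO-style RLHF, this objective needs no separately trained reward model at optimization time: the preference data and the reference model define the reward implicitly.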

Connect with Brandon Cui:
https://www.linkedin.com/in/bcui19/
