Reward Models | Data Brew | Episode 40

About this audio

In this episode, Brandon Cui, Research Scientist at MosaicML and Databricks, dives into cutting-edge advancements in AI model optimization, focusing on Reward Models and Reinforcement Learning from Human Feedback (RLHF).

Highlights include:
- How synthetic data and RLHF enable fine-tuning models to generate preferred outcomes.
- Techniques like Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO) for enhancing response quality.
- The role of reward models in improving coding, math, reasoning, and other NLP tasks.
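As a rough illustration of the DPO technique listed above (a sketch based on the published DPO objective, not code from the episode), the per-pair loss compares how much the policy prefers the chosen response over the rejected one, relative to a frozen reference model:

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for a single preference pair.

    Each argument is the summed log-probability of a full response
    under the trainable policy (logp_*) or the frozen reference
    model (ref_logp_*); beta scales the implicit reward.
    """
    # Implicit reward margins relative to the reference model
    chosen_margin = logp_chosen - ref_logp_chosen
    rejected_margin = logp_rejected - ref_logp_rejected
    # Logistic loss that pushes the chosen margin above the rejected one
    logits = beta * (chosen_margin - rejected_margin)
    return -math.log(1.0 / (1.0 + math.exp(-logits)))  # -log sigmoid(logits)

# Loss is low when the policy already prefers the chosen response,
# and high when it prefers the rejected one.
print(dpo_loss(-10.0, -30.0, -20.0, -20.0))
print(dpo_loss(-30.0, -10.0, -20.0, -20.0))
```

In a real fine-tuning loop this loss would be computed over batches of token-level log-probabilities from the policy and reference networks and backpropagated through the policy only.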

Connect with Brandon Cui:
https://www.linkedin.com/in/bcui19/
