(FM-AMZN) Proposer-Agent-Evaluator (PAE): Autonomous Skill Discovery For Foundation Model Internet Agents

About this audio

Discover the Proposer-Agent-Evaluator (PAE) system, developed by Amazon Science, which lets foundation model agents autonomously discover and practice skills in the wild. The approach addresses a major scalability bottleneck: manually specifying an agent's vast skill repertoire through human-annotated instructions. PAE has three components. A context-aware task proposer generates instructions based on website information, an agent policy attempts these tasks, and an autonomous evaluator built on a vision-language model (VLM) provides the reward signals used to refine the policy via reinforcement learning (RL).
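
To make the propose, attempt, evaluate, update cycle concrete, here is a minimal toy sketch in Python. It is not the paper's implementation. The action names, the task template, the `example-shop.test` site string, and the rule-based `evaluate` function are all hypothetical stand-ins for the proposer, the web agent, and the VLM evaluator. The description only says the policy is refined "via RL", so a plain REINFORCE-style update is assumed here purely for illustration.

```python
import math
import random

# Hypothetical two-action "website": only the first action completes a task.
ACTIONS = ["click_search", "click_random_link"]


def propose_tasks(site_info, n):
    """Stand-in for the context-aware task proposer (a model in the source)."""
    return [f"Find item #{i} on {site_info}" for i in range(n)]


def evaluate(task, action):
    """Stand-in for the VLM-based evaluator: returns a binary success reward."""
    return 1.0 if ACTIONS[action] == "click_search" else 0.0


class ToyPolicy:
    """Softmax policy over ACTIONS; ignores the task text for simplicity."""

    def __init__(self):
        self.logits = [0.0 for _ in ACTIONS]

    def probs(self):
        m = max(self.logits)
        exps = [math.exp(x - m) for x in self.logits]
        total = sum(exps)
        return [e / total for e in exps]

    def act(self, task, rng):
        # Sample an action index according to the current probabilities.
        return rng.choices(range(len(ACTIONS)), weights=self.probs())[0]

    def update(self, action, reward, lr=0.5):
        # REINFORCE: d log pi(a) / d logit_k = 1[k == a] - p_k
        p = self.probs()
        for k in range(len(self.logits)):
            indicator = 1.0 if k == action else 0.0
            self.logits[k] += lr * reward * (indicator - p[k])


def train(site_info="example-shop.test", rounds=200, seed=0):
    rng = random.Random(seed)
    policy = ToyPolicy()
    for _ in range(rounds):
        for task in propose_tasks(site_info, n=4):   # proposer
            action = policy.act(task, rng)           # agent attempts the task
            reward = evaluate(task, action)          # evaluator scores it
            policy.update(action, reward)            # RL refinement
    return policy


if __name__ == "__main__":
    trained = train()
    print({a: round(p, 3) for a, p in zip(ACTIONS, trained.probs())})
```

No human-written instructions or labels appear in the loop: the tasks come from the proposer and the rewards from the evaluator, which is the property the description highlights. In the real system each stand-in would be a learned model operating on web pages and screenshots.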

PAE targets challenging vision-based web navigation. On real-world benchmarks such as WebVoyager and WebArena, it improves zero-shot generalization to unseen tasks and websites by around 50% in relative terms. It enables agents to carry out diverse goal-directed tasks, from finding directions to buying specific items online, without human supervision. Current PAE models may still lag behind state-of-the-art proprietary models in complex reasoning, and their performance on dynamic live websites can vary. Nevertheless, this work by Amazon Science paves the way for more capable open-source foundation model agents.

Paper link: https://assets.amazon.science/74/38/965b25dc4a98b48186022a8588d3/proposer-agent-evaluator-pae-autonomous-skill-discovery-for-foundation-model-internet-agents.pdf
