Adversarial Poetry | Jailbreaking AI With Poems

Échec de l'ajout au panier.

Veuillez réessayer plus tard

Échec de l'ajout à la liste d'envies.

Veuillez réessayer plus tard

Échec de la suppression de la liste d’envies.

Veuillez réessayer plus tard

Échec du suivi du balado

Ne plus suivre le balado a échoué

Adversarial Poetry | Jailbreaking AI With Poems

Écouter gratuitement

Voir les détails du balado

À propos de cet audio

Today we're looking at "adversarial poetry", a method to jailbreak AI.

The paper: https://arxiv.org/html/2511.15304v1

The academic paper and news report detail the discovery of adversarial poetry, a "universal single-turn jailbreak" mechanism that exploits a fundamental flaw in Large Language Model (LLM) safety alignment. This technique involves reformulating harmful requests into poetic verse, which successfully circumvents refusal mechanisms across 25 frontier models, including offerings from Google, OpenAI, and Anthropic.

The research found that using poetic framing led to a sharp increase in Attack Success Rate (ASR), jumping from an average of 8.08% to 43.07% for systematically transformed prompts. This stylistic manipulation was effective across numerous hazard categories, such as cyber-offense enablement and harmful manipulation, indicating that safety filters are narrowly optimized for conventional, prosaic language. Furthermore, the study revealed an inverse correlation between model size and robustness in some families, with certain smaller models displaying unexpected resilience against the poetic prompt attacks.

Additional sources:

https://www.planksip.org/platos-critique-of-poets-and-artists

#adversarialpoetry #artificialintelligence #ai #jailbreak #technews #technology

___

What do you think?

PS, make sure to follow my:

Main channel: https://www.youtube.com/@swetlanaAI

Music channel: https://www.youtube.com/@Swetlana-AI-Music

Hosted on Acast. See acast.com/privacy for more information.

Pas encore de commentaire