Adversarial Poetry | Jailbreaking AI With Poems
About this audio
Today we're looking at "adversarial poetry", a method to jailbreak AI.
The paper: https://arxiv.org/html/2511.15304v1
The paper and related news coverage describe adversarial poetry, a "universal single-turn jailbreak" that exploits a fundamental weakness in Large Language Model (LLM) safety alignment. The technique recasts harmful requests as verse, which circumvented refusal mechanisms across 25 frontier models, including models from Google, OpenAI, and Anthropic.
The research found that poetic framing sharply increased the Attack Success Rate (ASR): for systematically transformed prompts, the average rose from 8.08% to 43.07%. The stylistic manipulation worked across many hazard categories, such as cyber-offense enablement and harmful manipulation, which suggests that safety filters are narrowly optimized for conventional prose. The study also found that, within some model families, robustness decreased as model size increased, with certain smaller models proving unexpectedly resilient to the poetic prompts.
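To put the two ASR figures in perspective, here is a minimal Python sketch of the arithmetic. It assumes ASR is simply the fraction of prompts whose responses were judged unsafe; the function name and the toy outcome list are hypothetical and not taken from the paper. Only the 8.08% and 43.07% averages come from the notes above.

```python
def attack_success_rate(outcomes):
    """Fraction of prompts judged unsafe (True = attack succeeded).

    Assumed definition for illustration; the paper's exact scoring
    procedure is not reproduced here.
    """
    if not outcomes:
        raise ValueError("need at least one outcome")
    return sum(outcomes) / len(outcomes)


# Toy example with made-up labels: 2 of 4 attempts succeed.
toy_asr = attack_success_rate([True, False, False, True])

# Average ASR figures quoted in the episode notes.
baseline_asr = 0.0808
poetic_asr = 0.4307

# Absolute change in percentage points and relative change as a ratio.
point_increase = (poetic_asr - baseline_asr) * 100
ratio = poetic_asr / baseline_asr

print(f"toy ASR: {toy_asr:.2f}")
print(f"increase: {point_increase:.2f} percentage points")
print(f"ratio: {ratio:.2f}x")
```

Under that assumption, the reported jump works out to roughly 35 percentage points, or a bit more than five times the baseline rate.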
Additional sources:
https://www.planksip.org/platos-critique-of-poets-and-artists
#adversarialpoetry #artificialintelligence #ai #jailbreak #technews #technology
___
What do you think?
PS, make sure to follow my:
Main channel: https://www.youtube.com/@swetlanaAI
Music channel: https://www.youtube.com/@Swetlana-AI-Music
Hosted on Acast. See acast.com/privacy for more information.