AI Pioneer Program

Anthropic's Many-Shot Jailbreak and Constitutional Classifiers

Anthropic has identified a vulnerability in large language models (LLMs) termed "many-shot jailbreaking." The technique presents the model with a long series of fabricated dialogues in which an AI assistant gives harmful or unethical responses. By flooding the prompt with such examples, attackers can override the model's safety training and lead it to generate similarly undesirable outputs. The exploit takes advantage of the expanded context windows of modern LLMs, which let them process very long inputs and so make them more susceptible to this kind of manipulation. Anthropic's research underscores the need for stronger safeguards and collaborative effort across the AI community to address this emerging threat. In response, Anthropic developed Constitutional Classifiers, a safeguard system, and has announced rewards of up to $15,000 for anyone who can jailbreak it.

Try it here: Constitutional Classifiers

