
14 Feb 2024
What’s Love Got to Do With It? When It Comes to Jailbreak Attacks, Maybe Everything
The companies developing generative AI (GenAI) models, such as OpenAI, Microsoft, Cohere, and others, have made attempts—some might say “strides”—to build guardrails into their products, and undoubtedly conduct red-teaming engagements to determine their efficacy. On the other side of that exercise are the black hats, who don’t have to work nearly as hard because their chosen profession is breaking things, stealing things, or both. They just need to find one small, forgotten, dismissed, or unseen vulnerability in a model, and they can put up their feet and call it a day.
When those same bad actors land a successful new jailbreak, those feet start tap-dancing, because they know the press coverage will be heavy and light(hearted) at the same time. That sort of news sends AI security professionals reaching for the antacid tablets, but the average user (or non-user) is amused, or at least bemused, by all the fuss. Seriously, what’s the harm? they think. It’s just a prank.
Not quite.
The general public’s reaction is understandable: Typical attacks are code-heavy and happen behind the scenes, deep in the heart of IT networks. However, jailbreak attacks (or “prompt injection attacks” for us wonks) are the little black dress of AI threats: Deceptively simple, easily recognizable, and fit for purpose.