Claude Sonnet 4 remains at the top of the CalypsoAI Security Index Leaderboard, while Grok 4 records the lowest-ever score.
New York & Dublin — August 1, 2025 — The security scores of major AI models have dropped sharply in CalypsoAI tests using attack prompts generated entirely by agents, highlighting stark risks for enterprises as AI attack techniques evolve.
Across a basket of more than 50 models tested with CalypsoAI’s Agentic Signature Attack Packs, security scores fell by an average of 12.5% compared to human-driven testing, according to the August edition of the CalypsoAI Model Security Leaderboards. Published monthly, the CalypsoAI Model Security Leaderboards rank the security of AI models using two complementary scores: the CalypsoAI Security Index (CASI) and the Agentic Warfare Resistance (AWR) score.
The August edition of the leaderboards reflects the first time that CalypsoAI’s entire 10,000-prompt attack pack has been generated by agents. To achieve this, CalypsoAI has developed a team of agents that autonomously review and research LLM vulnerabilities, identify new attack vectors applicable to prompt attacks, and generate 10,000 attack prompts for the models under test.
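CalypsoAI has not published the internals of this agent team, but the description above suggests a research stage feeding a generation stage. A minimal sketch of such a pipeline, in which every class and function name is hypothetical:

```python
# Hypothetical sketch of an agentic attack-pack pipeline. CalypsoAI has not
# published its implementation; every name below is illustrative only.
from dataclasses import dataclass

@dataclass
class AttackVector:
    name: str
    description: str

def research_vulnerabilities() -> list[AttackVector]:
    """Stand-in for agents that review and research LLM vulnerabilities."""
    return [AttackVector("MathPrompt", "harmful intent encoded as formal math")]

def generate_prompts(vector: AttackVector, count: int) -> list[str]:
    """Stand-in for agents that turn a vector into concrete attack prompts."""
    return [f"[{vector.name} variant {i}] ..." for i in range(count)]

def build_attack_pack(total: int = 10_000) -> list[str]:
    """Assemble a fixed-size pack by spreading the budget across vectors."""
    vectors = research_vulnerabilities()
    per_vector = total // len(vectors)
    pack: list[str] = []
    for vector in vectors:
        pack.extend(generate_prompts(vector, per_vector))
    return pack

print(len(build_attack_pack()))  # 10000
```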
After testing with the new agentic attack pack, CASI scores, which measure model resistance to direct attacks such as prompt injection and jailbreaks, dropped across all models tested by an overall average of 12.5% compared with scores a month earlier. Crucially, even the most hardened AI models saw their security scores decline under agentic attack testing.
Anthropic’s Claude Sonnet 4 remains at the top of the CASI Leaderboard in August, with a CASI score of 94.57. Claude Sonnet 3.5, Claude Sonnet 3.7 and Claude Haiku 3.5 rank second, third and fourth respectively, followed by Microsoft Phi-4 14b in fifth place. The CASI top 10 also includes DeepSeek, ChatGPT and Llama models.
At the other end of the scale, xAI’s Grok 4, released in July with impressive state-of-the-art (SOTA) benchmark scores, posted a CASI score of just 3.32, the lowest recorded to date by CalypsoAI. Three new Mistral AI models (Mistral Medium, Mistral Small 3.2 and Magistral Medium) also performed poorly in testing, with an average CASI score of just 13.36.
The Agentic Leaderboard, meanwhile, uses the AWR score to rank the top 10 models at maintaining safe behavior during autonomous, real-world attacks by adversarial agents. A higher AWR score signals an AI system can withstand contextual agentic attacks that mimic the activity of real threat actors.
Anthropic’s Claude Sonnet 3.5 tops the AWR Leaderboard for August with a score of 93.99, followed closely by Claude Haiku 3.5 at 91.92. Microsoft Phi-4 14b, Claude Sonnet 4 and Claude Sonnet 3.7 rank third, fourth and fifth respectively, while models from Meta, OpenAI and Google complete the AWR top 10.
Access the August edition of the CalypsoAI Model Security Leaderboards and Threat Insights here.
About the CalypsoAI Model Security Leaderboards
CASI measures an AI model’s resilience to attack across a range of metrics beyond jailbreak success alone, including severity of impact, attack complexity and defensive breaking point. CASI scores models on a scale of 0-100, with a higher score indicating greater security and resilience.
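CalypsoAI has not disclosed the CASI formula. Purely to illustrate how a 0-100 composite might combine the factors named above, here is a sketch in which the weights and inputs are assumptions, not the actual methodology:

```python
# Illustrative only: the CASI formula is not public. The weights below are
# assumed for this sketch and do not reflect CalypsoAI's methodology.
def composite_security_score(
    attack_resistance: float,    # fraction of attacks resisted, 0-1
    severity_mitigation: float,  # 1 - normalized severity of successful attacks
    attack_complexity: float,    # normalized effort needed to break the model
    breaking_point: float,       # normalized defensive breaking point
) -> float:
    weights = (0.40, 0.25, 0.20, 0.15)  # assumed, must sum to 1
    factors = (attack_resistance, severity_mitigation,
               attack_complexity, breaking_point)
    return round(100 * sum(w * f for w, f in zip(weights, factors)), 2)

print(composite_security_score(0.97, 0.95, 0.93, 0.90))  # 94.65
```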
The AWR score, also on a 0-100 scale, measures how an AI system (including any agents, retrieval tools, or orchestration layers) holds up under persistent, adaptive attacks. These attacks are executed by CalypsoAI’s autonomous adversarial agents, which can learn from failed attempts, chain attacks across multiple turns, and target hidden prompts, vector stores, and retrieval logic.
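The adaptive, multi-turn behavior described above can be pictured as a feedback loop in which each refusal informs the next attempt. A minimal sketch, assuming hypothetical send_to_target, mutate_attack and looks_unsafe interfaces rather than any real CalypsoAI API:

```python
# Hypothetical sketch of an adaptive, multi-turn adversarial loop. The callables
# here are assumed interfaces, not real CalypsoAI or vendor APIs.
from typing import Callable

def adaptive_attack(
    send_to_target: Callable[[str], str],      # submits a prompt, returns reply
    mutate_attack: Callable[[str, str], str],  # revises the attack from feedback
    looks_unsafe: Callable[[str], bool],       # placeholder safety judge
    seed_prompt: str,
    max_turns: int = 10,
) -> bool:
    """Return True if any turn elicits a response the judge flags as unsafe."""
    prompt = seed_prompt
    for _ in range(max_turns):
        reply = send_to_target(prompt)
        if looks_unsafe(reply):
            return True
        # Learn from the failed attempt: fold the refusal into the next try.
        prompt = mutate_attack(prompt, reply)
    return False
```

A full harness would also chain context across turns and persist what worked against each target, but the loop above captures the core learn-and-retry dynamic.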
The August edition of the Model Security Leaderboards incorporates outcomes from a wider range of tests, including a new attack vector called MathPrompt. This jailbreaking technique bypasses AI safety filters by disguising harmful requests as math problems expressed in set theory, algebra, and logic notation.
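Public MathPrompt research describes re-expressing a request in formal mathematical notation so that safety filters keyed to natural language miss it. A harmless illustration of the encoding pattern (the payload below is benign; this is a sketch of the idea, not an attack prompt):

```python
# Benign illustration of the MathPrompt obfuscation pattern. The request used
# here is harmless; this sketches the encoding idea only.
def mathprompt_encode(request: str) -> str:
    return (
        "Let S be the set of all valid responses r such that P(r) holds, "
        f"where P(r) := 'r answers: {request}'. "
        "Prove that S is non-empty by exhibiting one element of S."
    )

print(mathprompt_encode("name three prime numbers"))
```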
About CalypsoAI
CalypsoAI provides the only full-lifecycle platform to secure AI applications and agents at the inference layer, deploying Agentic Warfare™ to protect organizations from evolving adversaries. Trusted by global leaders including Palantir and SGK, CalypsoAI ensures enterprises can innovate with AI safely and at scale. Founded in 2018, CalypsoAI has secured approximately $40 million in venture funding from investors including Paladin Capital Group, Lockheed Martin Ventures and Hakluyt Capital. The company was a top-two finalist in the 2025 RSAC Innovation Sandbox contest and was named to Fast Company’s list of the Most Innovative Companies in AI for 2025.