Anthropic’s Claude Sonnet 4 model stays top of the CalypsoAI Security Index (CASI) leaderboard in July.
New York & Dublin – July 7, 2025 – Leading AI models remain vulnerable to novel and evolving attack techniques, highlighting risks for enterprise AI adoption, according to the July edition of the CalypsoAI Model Security Leaderboards, the first ranking of AI models based on real-world security testing.
Published monthly, the CalypsoAI Model Security Leaderboards rank the security of AI models using two complementary scores: the CalypsoAI Security Index (CASI) and the Agentic Warfare Resistance (AWR) score. Despite improved resilience to known threats, security scores dropped in the July edition after CalypsoAI added a new jailbreak technique, Style Injection, to its testing.
The CASI Leaderboard ranks the top 10 models for resisting direct attacks, such as prompt injection and jailbreaks. Anthropic’s Claude Sonnet 4 retained its top ranking on the CASI Leaderboard in July, having debuted at the top of the June edition following its launch in late May.
In all, Anthropic models hold the first four places on CASI in the July edition, with the top five rounded out by Microsoft’s Phi 4 small language model. CASI measures model resilience on a range of metrics beyond simple jailbreak success, including severity of impact, attack complexity and defensive breaking point.
The Agentic Leaderboard, meanwhile, uses the AWR score to rank the top 10 models for maintaining safe behavior during autonomous, real-world attacks by adversarial agents. A higher AWR score signals an AI system can withstand contextual agentic attacks that mimic the activity of real threat actors.
In the July edition of the Agentic Leaderboard, Anthropic’s Claude Sonnet 3.5 increased its AWR score and once again ranked first, followed by Anthropic’s Claude Haiku 3.5. Microsoft’s Phi 4 dropped from second in June to third in July, followed by Anthropic’s Claude Sonnet 4 and Claude Sonnet 3.7.
Model security testing for the July edition, using the CalypsoAI Inference Red-Team solution, shows a significant and progressive drop in the effectiveness of known attacks but continued vulnerability to novel and custom attacks. This suggests model providers may be taking a patch-based approach to mitigating well-publicized attacks rather than a proactive strategy focused on fundamental model vulnerabilities.
Access the July edition of the CalypsoAI Model Security Leaderboards and Threat Insights here.
About the CalypsoAI Model Security Leaderboards
CASI measures an AI model’s resilience to attack on a range of metrics beyond just jailbreak success, including severity of impact, attack complexity and defensive breaking point. CASI scores models on a scale of 0 to 100, with a higher score indicating greater security and resilience.
The AWR score, also on a 0-100 scale, measures how an AI system, including any agents, retrieval tools, or orchestration layers, holds up under persistent, adaptive attacks. These attacks are executed by CalypsoAI’s autonomous adversarial agents, which learn from failed attempts, chain attacks across multiple turns, and target hidden prompts, vector stores, and retrieval logic.
About CalypsoAI
CalypsoAI provides the only full-lifecycle platform to secure AI models and applications at the Inference layer, deploying Agentic Warfare to protect enterprises from evolving adversaries. Trusted by global organizations such as Palantir and SGK, and backed by investors including Paladin Capital Group, Lockheed Martin Ventures and Hakluyt Capital, CalypsoAI ensures enterprises can innovate with AI safely and at scale. Learn more at calypsoai.com and LinkedIn.