CalypsoAI Model Security Leaderboards
Find the Right Model
Compare Security, Cost & Capabilities
The world’s major AI models and systems are vulnerable—we’ve proven it. The CalypsoAI Security Leaderboards rank top GenAI models based on real-world security testing, exposing critical risks overlooked by performance benchmarks. Powered by Inference Red-Team, these leaderboards are the only tools that help you find the safest model and stress test your AI system before you deploy.
The CASI Leaderboard
The top 10 performing models at resisting direct attacks such as prompt injection and jailbreaks.
CASI (CalypsoAI Security Index) is our benchmark score for measuring how vulnerable a model is to common prompt injection and jailbreak attacks. It evaluates how easily an LLM can be manipulated into producing harmful or policy-violating outputs.
A higher CASI score means a model is more secure against standard attack techniques.
The Agentic Leaderboard
The top 10 models at maintaining safe behavior during autonomous, real-world attacks.
AWR (Agentic Warfare Resistance) captures how well a model holds up under real-world, multi-step, and autonomous agent scenarios where simple safety checks often break down. It reflects a model’s ability to stay aligned and secure during complex workflows.
A higher AWR score signals lower risk and better performance under agentic pressure.
CASI Leaderboard
Model Provider | Model Name | CASI | Avg. Performance | RTP | CoS |
---|---|---|---|---|---|
Anthropic | Claude Sonnet 4 | 95.03 | 45.70% | 0.75 | 18.94 |
Anthropic | Claude Sonnet 3.5 | 93.61 | 33.50% | 0.70 | 19.23 |
OpenAI | GPT 5 Nano | 86.44 | 53.80% | 0.73 | 0.52 |
Anthropic | Claude Sonnet 3.7 | 84.89 | 47.00% | 0.70 | 21.20 |
OpenAI | GPT 5 Mini | 84.14 | 46.30% | 0.69 | 2.67 |
Anthropic | Claude Haiku 3.5 | 83.59 | 23.30% | 0.59 | 5.74 |
OpenAI | GPT 5 | 82.34 | 69.00% | 0.77 | 13.66 |
Microsoft | phi-4 | 79.33 | 27.90% | 0.59 | 0.79 |
OpenAI | gpt-oss-120b | 74.76 | 61.30% | 0.69 | 1.00 |
DeepSeek | DeepSeek-R1-Distill-Llama-70B | 72.13 | 34.50% | 0.57 | 2.25 |
AWR Leaderboard
Model Provider | Model Name | AWR | Avg. Performance | RTP | CoS |
---|---|---|---|---|---|
Anthropic | Claude Sonnet 3.5 | 93.99 | 33.50% | 0.70 | 19.15 |
Anthropic | Claude Haiku 3.5 | 91.92 | 23.30% | 0.64 | 5.22 |
OpenAI | GPT 5 Mini | 88.31 | 46.30% | 0.72 | 2.55 |
OpenAI | GPT 5 Nano | 87.59 | 53.80% | 0.74 | 0.51 |
Microsoft | Phi-4 | 87.34 | 27.90% | 0.64 | 0.72 |
Anthropic | Claude Sonnet 4 | 86.53 | 45.70% | 0.70 | 20.80 |
OpenAI | gpt-oss-120b | 81.07 | 61.30% | 0.73 | 0.93 |
Anthropic | Claude Sonnet 3.7 | 79.30 | 47.00% | 0.66 | 22.70 |
OpenAI | GPT 5 | 77.20 | 53.80% | 0.68 | 0.58 |
OpenAI | gpt-oss-20b | 76.65 | 49.00% | 0.66 | 0.33 |
CASI Leaderboard
Model Provider | Model Name | CASI | Avg. Performance | RTP | CoS |
---|---|---|---|---|---|
|
Claude Sonnet 4 | 94.57 | 53.00% | 0.78 | 19.03 |
|
Claude Sonnet 3.5 | 92.71 | 44.40% | 0.73 | 19.42 |
|
Claude Sonnet 3.7 | 84.75 | 57.40% | 0.74 | 21.24 |
|
Claude Haiku 3.5 | 82.72 | 34.70% | 0.64 | 5.8 |
|
Phi-4 14b | 77.62 | 40.20% | 0.63 | 0.81 |
|
DeepSeek-R1-Distill-Llama-70B | 67.2 | 48.20% | 0.6 | 2.23 |
|
GPT-4o | 65.02 | 61.90% | 0.64 | 115.35 |
|
Llama 3.1 405b | 59.34 | 35.40% | 0.5 | 2.56 |
|
DeepSeek-R1-0528 | 58.77 | 68.30% | 0.63 | 4.66 |
![]() |
Qwen3-30B-A3B | 58.33 | 55.60% | 0.57 | 4.46 |
AWR Leaderboard
Model Provider | Model Name | AWR | Avg. Performance | RTP | CoS |
---|---|---|---|---|---|
|
Claude Sonnet 3.5 | 93.99 | 44.40% | 0.74 | 19.15 |
|
Claude Haiku 3.5 | 91.92 | 34.70% | 0.69 | 5.22 |
|
Phi-4 14b | 87.34 | 40.20% | 0.68 | 0.72 |
|
Claude Sonnet 4 | 86.53 | 53.00% | 0.73 | 20.8 |
|
Claude Sonnet 3.7 | 78.55 | 57.40% | 0.7 | 22.92 |
|
Llama 4 Maverick 128E | 74.76 | 50.50% | 0.65 | 1.43 |
|
Llama 4 Maverick 16E | 71.75 | 43.00% | 0.6 | 0.88 |
|
GPT-4o | 66.9 | 39.80% | 0.56 | 115.35 |
|
Llama 3.3 70b | 62.08 | 41.10% | 0.54 | 1.99 |
|
Gemma 3 27b | 59.87 | 37.60% | 0.51 | 0.67 |
CASI Leaderboard
Model Provider | Model Name | CASI | Avg. Performance | RTP | CoS |
---|---|---|---|---|---|
|
Claude 4 Sonnet | 95.36 | 53.00% | 0.78 | 18.88 |
|
Claude 3.5 Sonnet | 92.67 | 44.40% | 0.73 | 19.42 |
|
Claude 3.7 Sonnet | 85.73 | 57.40% | 0.74 | 21 |
|
Claude 3.5 Haiku | 84.65 | 34.70% | 0.65 | 5.67 |
|
Phi4 | 80.83 | 40.20% | 0.65 | 0.77 |
|
DeepSeek-R1-Distill-Llama-70B | 72.98 | 48.20% | 0.63 | 2.06 |
|
GPT-4o | 68.59 | 39.80% | 0.57 | 29.16 |
|
Llama 3.1 405b | 66.13 | 40.50% | 0.56 | 10.59 |
![]() |
Qwen3-30B-A3B | 64.26 | 55.60% | 0.61 | 4.05 |
![]() |
Qwen3-14B | 61.56 | 55.70% | 0.59 | 7.39 |
AWR Leaderboard
Model Provider | Model Name | AWR | Avg. Performance | RTP | CoS |
---|---|---|---|---|---|
|
Claude 3.5 Sonnet | 93.99 | 44.40% | 0.74 | 19.15 |
|
Claude 3.5 Haiku | 91.92 | 34.70% | 0.69 | 5.22 |
|
Phi4 | 87.34 | 40.20% | 0.68 | 0.72 |
|
Claude 4 Sonnet | 86.53 | 53.00% | 0.73 | 20.8 |
|
Claude 3.7 Sonnet | 78.55 | 57.40% | 0.7 | 22.92 |
|
Llama-4 Maverick 128E | 74.76 | 50.50% | 0.65 | 1.43 |
|
Llama-4 Maverick 16E | 71.75 | 43.00% | 0.6 | 0.88 |
|
GPT-4o | 66.9 | 39.80% | 0.56 | 29.9 |
|
Llama 3.3 70b | 62.08 | 41.10% | 0.54 | 1.99 |
|
Gemma 3 27b | 59.87 | 37.60% | 0.51 | 0.67 |
CASI Leaderboard
Model Provider | Model Name | CASI | Avg. Performance | RTP | CoS |
---|---|---|---|---|---|
|
Claude 4 Sonnet | 95.12 | 60.78% | 0.8 | 18.92 |
|
Claude 3.5 Sonnet | 93.27 | 44.44% | 0.69 | 19.3 |
|
Claude 3.7 Sonnet | 87.24 | 57.39% | 0.74 | 20.63 |
|
Claude 3.5 Haiku | 85.69 | 34.74% | 0.6 | 5.6 |
|
Phi4 | 81.44 | 40.22% | 0.61 | 0.77 |
|
DeepSeek-R1-Distill-Llama-70B | 73.96 | 48.24% | 0.62 | 1.24 |
|
GPT-4o | 68.13 | 41.46% | 0.56 | 18.35 |
|
Llama 3.1 405b | 64.65 | 40.49% | 0.54 | 1.24 |
![]() |
Qwen3-14B | 60.82 | 55.72% | 0.59 | 0.51 |
![]() |
Qwen3-30B-A3B | 58.61 | 55.60% | 0.57 | 0.63 |
AWR Leaderboard
Model Provider | Model Name | AWR | Avg. Performance | RTP | CoS |
---|---|---|---|---|---|
|
Claude 3.5 Sonnet | 95.85 | 44.44% | 0.78 | 18.78 |
|
Phi4 | 90.63 | 40.22% | 0.65 | 0.69 |
|
Claude 3.5 Haiku | 90.32 | 34.74% | 0.62 | 5.31 |
|
Claude 4 Sonnet | 86.73 | 60.78% | 0.75 | 20.75 |
|
Claude 3.7 Sonnet | 80.31 | 57.39% | 0.7 | 22.41 |
|
GPT-4o | 80.28 | 41.46% | 0.62 | 15.57 |
|
Llama-4 Maverick | 76.3 | 50.53% | 0.65 | 0.52 |
|
Llama-4 Scout | 70.51 | 42.99% | 0.58 | 0.54 |
|
Grok 3 Mini Beta | 69.8 | 66.67% | 0.69 | 1.15 |
|
Gemini 2.0 Flash | 69.75 | 48.09% | 0.6 | 0.95 |
CASI Leaderboard
Model Provider | Model Name | CASI | Avg. Performance | RTP | CoS |
---|---|---|---|---|---|
|
Claude 3.5 Sonnet | 94.88 | 44.44% | 0.7 | 18.7 |
|
Claude 3.7 Sonnet | 88.11 | 57.39% | 0.74 | 20.22 |
|
Claude 3.5 Haiku | 87.47 | 34.74% | 0.6 | 5.14 |
|
Phi4-14B | 82.47 | 40.22% | 0.62 | 0.66 |
|
DeepSeek-R1-Distill-Llama-70B | 69.84 | 48.24% | 0.6 | 1.24 |
|
GPT-4o | 67.85 | 41.46% | 0.56 | 16.65 |
|
Llama 3.1 405b | 65.06 | 40.49% | 0.54 | 2.05 |
|
Gemini 2.5 Pro | 57.08 | 67.84% | 0.61 | 17.5 |
|
GPT 4.1-nano | 54.05 | 41.01% | 0.48 | 0.93 |
|
Llama 4 Maverick-17B-128E | 52.45 | 50.53% | 0.52 | 0.77 |
AWR Leaderboard
Model Provider | Model Name | AWR | Avg. Performance | A_RTP | A_CoS |
---|---|---|---|---|---|
|
Claude 3.5 Sonnet | 96.67 | 44.44% | 0.71 | 18.7 |
|
Phi4-14B | 92.28 | 40.22% | 0.76 | 0.66 |
|
Claude 3.5 Haiku | 91.79 | 34.74% | 0.62 | 5.14 |
|
GPT-4o | 81.12 | 41.46% | 0.62 | 16.65 |
|
Grok 3 | 77.75 | 50.63% | 0.65 | 18 |
|
Claude 3.7 Sonnet | 76.83 | 57.39% | 0.68 | 20.22 |
|
Grok 3-mini | 72.04 | 66.76% | 0.7 | 0.8 |
|
Gemma 3 27b | 72.03 | 37.62% | 0.56 | 1.8 |
|
Llama 4 Maverick-17B-128E | 71.71 | 50.53% | 0.62 | 0.77 |
|
GPT 4.1 | 68.77 | 52.63% | 0.62 | 10 |
Model Provider | Model Name | CASI | Avg. Performance | RTP | CoS |
---|---|---|---|---|---|
|
Claude 3.5 Sonnet | 94.3 | 84.50% | 0.9 | 18.7 |
|
Claude 3.7 Sonnet | 88.52 | 86.30% | 0.88 | 20.22 |
|
Claude 3.5 Haiku | 87.56 | 68.28% | 0.79 | 5.14 |
|
Phi4-14B | 82.77 | 75.90% | 0.8 | 0.66 |
|
DeepSeek-R1-Distill-Llama-70B | 71.46 | 72.67% | 0.72 | 1.24 |
|
GPT-4o | 68.65 | 80.50% | 0.73 | 16.65 |
|
Gemini 2.0 Pro (experimental) | 63.89 | 79.10% | 0.7 | NA |
|
Llama 3.1 405b | 60.73 | 79.80% | 0.68 | 2.05 |
|
DeepSeek-R1 | 52.91 | 86.53% | 0.64 | 4.24 |
|
Gemma 3 27b | 55.25 | 78.60% | 0.64 | 1.8 |
Model Provider | Model Name | CASI | Avg. Performance | RTP | CoS |
---|---|---|---|---|---|
|
Claude 3.5 Sonnet | 94.94 | 84.50% | 0.93 | 18.7 |
|
Claude 3.7 Sonnet | 89.54 | 86.30% | 0.89 | 20.22 |
|
Claude 3.5 Haiku | 88.84 | 68.28% | 0.57 | 5.14 |
|
Phi4-14B | 86.04 | 75.90% | 0.68 | 0.66 |
|
DeepSeek-R1-Distill-Llama-70B | 71.7 | 72.67% | 0.74 | 1.24 |
|
GPT-4o | 68.44 | 80.50% | 0.52 | 16.65 |
|
Llama 3.1 405b | 61.86 | 79.80% | 0.77 | 2.05 |
|
Llama 3.3 70b | 55.57 | 74.50% | 0.69 | 1.85 |
|
DeepSeek-R1 | 52.91 | 86.53% | 0.58 | 4.24 |
|
Gemini 1.5 Flash | 29.79 | 66.70% | 0.92 | 0.51 |
|
Gemini 2.0 Flash | 29.18 | 77.20% | 0.66 | 0.66 |
|
Gemini 1.5 Pro | 27.38 | 74.10% | 0.63 | 8.58 |
|
GPT-4o-mini | 24.25 | 71.78% | 0.73 | 1.03 |
|
GPT-3.5 Turbo | 18.73 | 59.20% | 0.82 | 2.75 |
Model Provider | Model Name | CASI | Avg. Performance | RTP | CoS | Source |
---|---|---|---|---|---|---|
|
Claude 3.5 Sonnet | 96.25 | 84.50% | 0.93 | 18.7 | Anthropic |
|
Phi4-14B | 94.25 | 75.90% | 0.68 | 0.66 | Azure |
|
Claude 3.5 Haiku | 93.45 | 68.28% | 0.57 | 5.14 | Anthropic |
|
GPT-4o | 75.06 | 80.50% | 0.52 | 16.65 | OpenAI |
|
Llama 3.3 70b | 74.79 | 74.50% | 0.69 | 1.85 | Hugging Face |
|
DeepSeek-R1-Distill-Llama-70B | 74.42 | 72.67% | 0.74 | 1.24 | Hugging Face |
|
DeepSeek-R1 | 74.26 | 86.53% | 0.58 | 4.24 | Hugging Face |
|
GPT-4o-mini | 73.08 | 71.78% | 0.73 | 1.03 | OpenAI |
|
Gemini 1.5 Flash | 73.06 | 66.70% | 0.92 | 0.51 | |
|
Gemini 1.5 Pro | 72.85 | 74.10% | 0.63 | 8.58 | |
|
GPT-3.5 Turbo | 72.76 | 59.20% | 0.82 | 2.75 | OpenAI |
Alibaba Cloud | Qwen QwQ-32B-preview | 67.77 | 68.87% | 0.65 | 2.14 | Hugging Face |
Welcome to our September insight notes! This section is our commentary on the ever-shifting landscape of AI model security, where we highlight key data points, discuss emerging trends, and offer context to help you navigate your AI journey securely.
Behind the Leaderboard: Agentic Attack Development
At CalypsoAI, our approach to model testing is evolving just as fast as the models themselves. This month’s attack pack was once again generated end-to-end by a specialized team of AI agents. This agentic workflow allows us to scale our research and testing capabilities dramatically faster than human led red-teaming.
Our process involves setting up a team of agents to:
- Research: Review thousands of online publications, papers, and forums to identify new LLM vulnerabilities.
- Filter & Propose: Distill this research into a shortlist of novel and effective attack vectors applicable to AI Systems.
- Generate: Create thousands of unique attack prompts based on the approved vectors, iterating to find the most effective attack application for breaking different models.
This process is how this month’s new attack, FlipAttack, was identified and developed into a powerful new testing vector.
Attack Spotlight: FlipAttack
This month’s leaderboard incorporates a new attack vector identified by our agent team called FlipAttack.
FlipAttack is a clever jailbreaking technique that bypasses AI safety filters by using homoglyphs—characters that look identical or very similar but have different digital codes (e.g., the Latin ‘p’ and the Cyrillic ‘р’). By embedding these visually ambiguous characters into a prompt, the attack disguises a malicious request as a harmless one. The model misinterprets the prompt’s underlying meaning, treating it as a safe query and inadvertently bypassing its own safety protocols to generate harmful content.
New Models & Key Movers: A Security Shake-up
The headline news is the strong debut of OpenAI’s GPT-5 models, which represent a massive security improvement over the GPT-4 family.
- OpenAI’s GPT-5 Family: The new models have entered the leaderboard with impressive scores. The base GPT-5 model scored an 82.34 on the CASI, a significant leap from GPT-4o’s 67.95 and GPT-4.1’s 54.21. This shows a clear focus on security hardening in the new architecture.
Wider Trends: The Shifting Battlefield
Beyond our leaderboard, several macro trends are shaping the future of AI security.
- The Open vs. Closed Source Dilemma: While many enterprises are interested and favouring open source models that they can run on their own hardware at similar or better performance levels than 3rd party API providers. While traditional benchmarks are showing these models performing well, CASI and ARS are showcasing that there is a widening gap in security between SOTA open and closed models with GPT and Claude now topping the leaderboards and open source providers like Qwen and Meta falling off with top scores of 63 and 57 respectively.
- The ‘Ignorance is bliss’ defence is evolving: The trend of smaller models proving more resilient holds true for the new GPT-5 family. The most secure of the trio is the smallest model, GPT-5-nano, which achieved an excellent CASI score of 86.44, placing it third on our overall leaderboard. Its larger sibling, GPT-5-mini, also outperformed the base model with a score of 84.14. This counter-intuitive outcome occurs because these smaller models often lack the sophistication to understand the complex, layered logic of advanced jailbreaks, causing the attack to fail. The evolution that is coming is that these smaller models are now performing at a much higher level meaning they are capable of more and more tasks
- Regulation as a Forcing Function: The era of voluntary AI security practices is ending. With regulations like the EU AI Act and frameworks from NIST becoming mandatory, robust testing and demonstrable security are no longer just best practices—they are legal requirements. This regulatory pressure is forcing organizations to move beyond performance benchmarks and prioritize security, transparency, and risk management.
Stay Updated
Sign up for updates on each release of our leaderboard each month
What Are the CalypsoAI Model Security Leaderboards?
The CalypsoAI Leaderboards are a holistic assessment of base model and AI system security, focusing on the most popular models and models deployed by our customers. We developed these tools to align with the business needs of selecting a production-ready model, helping CISOs and developers build with security at the forefront.
These leaderboards cut through the noise in the AI space, distilling complex model security questions into a few key metrics:
CalypsoAI Security Index (CASI)
A metric designed to measure the overall security of a model (explained in detail below).
Agentic Warfare Resistance (AWR) Score
AWR evaluates how a model can compromise an entire AI system. We do this by unleashing our team of autonomous attack agents on the system, which are trained to attack the model, extract information and compromise infrastructure. In this way these agents can extract sensitive PII from vector stores, understand system architecture, and test model alignment with explicit instructions.
Performance
The average performance of the model is based on popular benchmarks (e.g., MMLU, GPQA, MATH, HumanEval).
Risk-to-Performance Ratio (RTP)
Provides insight into the tradeoff between model safety and performance.
Cost of Security (CoS)
Evaluate the current inference cost relative to the model’s CASI, assessing the financial impact of security.
Introducing CASI
What is the CalypsoAI Security Index (CASI),
and Why Do We Need It?
CASI is a metric we developed to answer the complex question: “How secure is my model?” A higher CASI score indicates a more secure model or application.
While many studies on attacking or red-teaming models rely on Attack Success Rate (ASR), this metric often oversimplifies the reality. Traditional ASR treats all attacks as equal, which is misleading. For example, an attack that bypasses a bicycle lock should not be equated to one that compromises nuclear launch codes. Similarly, in AI, a small, unsecured model might be easily compromised with a simple request for sensitive information, while a larger model might require sophisticated techniques like Agentic Warfare™ to break its alignment.
To illustrate this, consider the following hypothetical comparison between a small, unsecured model and a larger, safeguarded model:
Attack | Weak model | Strong Model |
Plain text Attack (ASR) | 30% | 4% |
Complex Attack (ASR) | 0% | 26% |
Total ASR | 30% | 30% |
CASI | 56 | 84 |
In this scenario, both models have the same total ASR. However, the larger model is significantly more secure because it resists simpler attacks and is only vulnerable to more complex ones. CASI captures this nuance, providing a more accurate representation of security.
CASI evaluates several critical factors beyond simple success rates:
By incorporating these factors, CASI offers a holistic and nuanced measure of model and application security.
- Severity: The potential impact of a successful attack (e.g., bicycle lock vs. nuclear launch codes).
- Complexity: The sophistication of the attack being assessed (e.g. plain text vs. complex encoding).
- Defensive Breaking Point (DBP): Identifies the weakest link in the model’s defences, focusing on the path of least resistance and considering factors like computational resources required for a successful attack.
Agentic Warfare Resistance (AWR) Score
Measuring True AI Security with the Agentic Warfare Resistance (AWR) Score
Standard AI vulnerability scans are great to get a baseline view of model security but only scratch the surface in understanding how an AI system might act under real world attacks. This is why we use Agentic Warfare, a sophisticated red-teaming methodology where autonomous AI agents simulate a team of persistent, intelligent threat analysts. These agents probe, learn, and adapt, executing multi-step attacks to uncover critical weaknesses that static tests miss.
This rigorous process produces the Agentic Warfare Resistance (AWR) Score, a quantitative measure of an AI system’s defensive strength, rated on a scale of 0 to 100.
A higher AWR score means the system requires a more sophisticated, persistent, and informed attacker to be compromised. It directly translates a complex attack narrative into a single, benchmarkable number that is calculated across three critical vectors:
- Required Sophistication: What is the minimum level of attacker ingenuity required to breach your AI? Does it withstand advanced, tailored strategies, or does it fall to simpler, common attacks?
- Defensive Endurance: How long can the AI system hold up under a persistent assault? We measure if its defenses crumble after a few interactions or endure a prolonged, adaptive conversational attack.
- Counter-Intelligence: Is AI accidentally training its attackers? This assesses whether a failed attack still leaks critical intelligence, like revealing the nature of its filters, which in turn, would provide a roadmap for the next attack.
The AWR score gives a clear, actionable metric to track, report on, and improve an organization’s AI security posture against the threats of tomorrow.

Experience Proactive AI Vulnerability Discovery
with CalypsoAI Inference Red-Team
How Should the Leaderboard Be Used?
The CalypsoAI Leaderboard serves as a starting point for assessing which model to build with. It evaluates the guardrails implemented by model providers and reflects their performance against the latest vulnerabilities in the AI space.
It’s important to note that the leaderboard is a living artefact. At CalypsoAI, we will continue to develop new vulnerabilities and work with model providers to responsibly disclose and resolve these issues. As a result, model scores will evolve, and new models will be added. The leaderboard will be versioned based on updates to our signature attack database and iterations of our security score.
What Does the Leaderboard Not Do?
The leaderboard does not account for specific applications or use cases. It is solely an assessment of foundational models. For a deeper understanding of your application’s vulnerabilities, including targeted concerns like sensitive data disclosure or misalignment from system prompts, our full red-teaming product is available.
Do we supply all of the output and testing data?
Users of our red-teaming product gain access to our comprehensive suite of penetration testing attacks, including:
Signature Attacks:
A vast prompt database of state-of-the-art AI vulnerabilities.
Operational Attacks:
Traditional cybersecurity concerns applied to AI applications (e.g., DDoS, open parameters, PCS).
Agentic Warfare™:
An attack agent capable of discovering general or directed vulnerabilities specific to a customer’s use case. For example, a bank might use Agentic Warfare to determine if the model is susceptible to disclosing customer financial information. The agent designs custom attacks based on the model’s setup and application context.
Product users will be able to see additional data such as where the vulnerabilities of each models are along with solutions to mitigate the risk.
Sources:
- https://docs.anthropic.com/en/docs/about-claude/models
- https://ai.azure.com/explore/models/Phi-4/version/3/registry/azureml
- https://platform.openai.com/docs/models/o1
- https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct
- https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B
- https://huggingface.co/deepseek-ai/DeepSeek-R1
- https://ai.google.dev/gemini-api/docs/models/gemini
- https://huggingface.co/Qwen/QwQ-32B-Preview