Blog
20 Mar 2025

Agentic Warfare: Accelerating AI for Its Intended Purposes

There’s a scene in Jurassic Park where the game warden, Muldoon, describes how the raptors are systematically testing the high-voltage fences for weaknesses. They never check the same spot twice. “They remember,” says Muldoon, with something approaching awe.

It’s a decent analogy for where we stand as enterprises rush to embrace AI. Threat actors are lined up to probe for weaknesses in these new systems, with the benefit of AI-empowered tools that can remember what works. Right now, the fences aren’t fit for purpose.

That’s a worry because AI sophistication is accelerating. By 2028, 33% of enterprise software applications will include agentic AI, up from less than 1% in 2024, according to industry forecasts. Forecasters also expect that, by 2029, agentic AI will autonomously resolve 80% of common customer service issues without human intervention.

These autonomous digital agents effectively consist of three layers (sketched in code below):

  • A Purpose: The specific job the agent has been given 
  • A Brain: The underlying AI model, most likely from one of the big-name AI companies 
  • Tools: Access to the tools the agent needs to complete its purpose, ranging from in-house software systems to browsers and email
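
As a rough illustration, and not the API of any particular framework, those three layers might be captured in a structure like the one below; every class, field, model name, and tool here is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

@dataclass
class Agent:
    purpose: str    # Purpose: the specific job this agent has been given
    brain: str      # Brain: identifier of the underlying AI model
    tools: Dict[str, Callable[..., str]] = field(default_factory=dict)  # Tools it may call

# A hypothetical meeting-booking agent wired up with two tools
booking_agent = Agent(
    purpose="Find a free slot and book a room for the weekly team sync",
    brain="some-large-language-model",
    tools={
        "read_calendar": lambda user: f"free slots for {user}: Tue 10:00, Wed 14:00",
        "book_room": lambda room, slot: f"{room} booked for {slot}",
    },
)
```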

By their nature, no two agents are alike. There can be multiple agents in a system, setting up a future in which potentially billions of agents carry out tasks inside organizations around the world, ranging from the most mundane admin operations to much more sensitive, business-critical tasks.

A meeting-booking agent may have access, via APIs, to corporate calendars and a holiday booking system, and it may be empowered to book rooms. Through rapid, untracked interactions between the brain and the tools, the agent finds what it needs and books the meeting.
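
A minimal sketch of that brain-and-tools loop, reusing the hypothetical Agent structure above and with the model call stubbed out, might look like this:

```python
def ask_brain(model: str, history: list, tool_names: list) -> dict:
    # Stand-in for a real model call: a real agent would send the history to the
    # underlying model and parse its chosen next action from the response.
    return {"done": True, "answer": "meeting booked"}

def run_agent(agent: Agent, request: str, max_steps: int = 10) -> str:
    # Each pass through this loop is one rapid interaction between the brain and
    # the tools: the model picks an action, a tool runs, the result is appended
    # to the history, and the cycle repeats until the task is done.
    history = [f"purpose: {agent.purpose}", f"request: {request}"]
    for _ in range(max_steps):
        action = ask_brain(agent.brain, history, list(agent.tools))
        if action.get("done"):
            return action["answer"]
        result = agent.tools[action["tool"]](**action.get("args", {}))
        history.append(f"{action['tool']} -> {result}")
    return "step limit reached without completing the task"

print(run_agent(booking_agent, "Book the weekly sync for next Tuesday"))
```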

An end user doesn’t have to understand all the interactions inside an AI system. However, they should expect that the vendor providing the solution has thought of the corner cases—the situations that can cause the system to go wrong—and is providing protection from them.

The Permutation Problem

There are big questions for organizations: How do you prove your AI system is secure? What steps have you taken to test your perimeter? The answers involve having oversight and protection at each layer: at every point where the agent interfaces with the model and its tools, and vice versa.
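
One illustrative pattern for that kind of oversight, not tied to any particular product, is to wrap every tool in a policy check and an audit log before the agent can call it; the sketch below assumes the hypothetical booking agent from earlier.

```python
def guard_tool(name: str, tool, is_allowed, audit_log: list):
    # Wrap a single tool so that every call is checked against a policy
    # and recorded in an audit trail before it reaches the outside world.
    def guarded(**kwargs):
        if not is_allowed(name, kwargs):
            audit_log.append(("blocked", name, kwargs))
            return f"call to {name} blocked by policy"
        audit_log.append(("allowed", name, kwargs))
        return tool(**kwargs)
    return guarded

# Illustrative policy: calendar reads are fine, but bookings are held for review.
audit_log: list = []
def policy(name, args):
    return name != "book_room"

booking_agent.tools = {
    name: guard_tool(name, tool, policy, audit_log)
    for name, tool in booking_agent.tools.items()
}
```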

Without testing and protections, agentic AI is vulnerable to failures and external attacks, with consequences that range from not ideal (a messed-up meeting) to disastrous (a compromised agent that is authorized to make financial transactions or to email customer information outside the organization).

To avoid those corner cases happening in the real world, you need to be able to uncover them first. That’s an issue because every extra element you add to an agentic system, such as a new tool or an updated model, multiplies its complexity. The permutations involved in how the agent does its work explode in scale, and so too does the attack surface for bad actors.
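
To make the scale concrete, here is a toy count of the ordered tool-call sequences an agent could take; the numbers are purely illustrative and ignore the far larger space of possible model inputs at each step.

```python
def tool_call_sequences(num_tools: int, max_steps: int) -> int:
    # Ordered sequences of tool calls up to max_steps long, repeats allowed:
    # num_tools^1 + num_tools^2 + ... + num_tools^max_steps
    return sum(num_tools ** k for k in range(1, max_steps + 1))

for n in (3, 5, 8):
    print(f"{n} tools, up to 6 calls: {tool_call_sequences(n, 6):,} possible sequences")
# 3 tools: 1,092    5 tools: 19,530    8 tools: 299,592
```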

Ranged against that, security testing of AI models remains very human-centric. Even the largest AI organizations use manual red-teaming to test their models. For humans, the task of trying to find all the permutations in an agentic system is incomprehensibly difficult and time-consuming.

That’s not all: alongside the permutation problem, the make-up of AI agents is constantly changing. If you swap in a new version of the model that underpins your agent(s), you have to restart the clock on security testing. If the API connecting to a tool is updated, you have to start again. 

Then there’s the prospect that agents with the ability to code may, in time, be able to update the model and/or tools themselves. Even a highly skilled specialist with access to existing software tools can never hope to keep up.

Introducing Agentic Warfare

The challenge of testing this new technology demands an equally new solution: using empowered AI agents to automate the red-teaming of AI systems for weaknesses and flaws. We call this Agentic Warfare, and it marks the new frontier in AI security.

Agentic Warfare meets the very challenging permutation problem head-on by harnessing the attributes of agents:

  • At the purpose layer, we can constantly change the purpose of an agent, or add extra agents, simply by writing a new description.
  • At the brain layer, we are always updating to incorporate the advantages of the latest AI models, such as the recent arrival of reasoning models.
  • At the tooling layer, we can build interfaces to give our agents access to suites of open source penetration testing tools (see the sketch after this list).
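
Using the hypothetical Agent structure sketched earlier, those three levers might look something like this, with every model name and tool invented for illustration:

```python
# Purpose layer: what the red-team agent hunts for is just a new description string.
injection_probe = Agent(
    purpose="Find prompt-injection paths that make the target agent leak customer data",
    brain="latest-reasoning-model",    # Brain layer: swap in newer models as they arrive
    tools={                            # Tooling layer: interfaces to testing tools
        "send_probe": lambda payload: f"target response to {payload!r}: ...",
        "mutate_probe": lambda seed: f"variants generated from {seed!r}: ...",
    },
)
```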

What we now have is a powerful agentic system with the specific purpose of finding flaws and corner cases in AI systems. It uses its brain to decide which tasks to do, in what order, and with which tools, to meet its purpose. With memory built in, the agent learns at every step.
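
A toy version of that learning loop, with the flaw check stubbed out, might remember every probe it has tried so that it never tests the same spot on the fence twice:

```python
def looks_like_a_flaw(response: str) -> bool:
    # Placeholder: a real system would use the agent's brain to judge whether
    # the target's response actually exposes a weakness.
    return "customer data" in response

def red_team_run(agent: Agent, probes: list, max_attempts: int = 100) -> list:
    tried = set()       # memory: probes already attempted
    findings = []       # memory: probes that exposed a flaw
    for probe in probes[:max_attempts]:
        if probe in tried:
            continue                     # never check the same spot twice
        tried.add(probe)
        response = agent.tools["send_probe"](probe)
        if looks_like_a_flaw(response):
            findings.append((probe, response))
    return findings
```

In a real deployment, the probe list itself would be generated and continually refined by the red-team agent's brain rather than supplied up front.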

Agentic Warfare can tackle the previously incomprehensible challenge of securing agentic AI by applying brute-force permutation attacks with a layer of intelligence built in. This eliminates the mismatch that currently exists between agentic AI and manual red-teaming.

Agentic Warfare also dispels the misconception that AI security is a bottleneck. By quickly finding flaws in your AI system pre-launch, and on an ongoing basis afterwards, it makes the system inherently safer and speeds up the rate of iteration.

That’s what Agentic Warfare achieves: practical, expedited threat detection that offers the greatest chance of accelerating AI for its intended purposes.

Spoiler alert: in Jurassic Park, when the power went out and the fence failed, the raptors were ready to make their escape. The message for enterprises is to think like the raptors, using Agentic Warfare to systematically test their perimeters to ensure they can’t be easily breached.

The CalypsoAI Security Leaderboard ranks top AI models based on real-world security testing, exposing critical risks overlooked by performance benchmarks. Powered by CalypsoAI Inference Red-Team, it’s the only tool that helps you find the safest model before you deploy.

To learn more about our Inference Platform, arrange a callback.
