Blog
07 May 2025

Securing the Agentic Era: From Hype to High Stakes

As agentic hype solidifies into reality, organizations around the world are constructing use cases for AI agents to work within their walls. By connecting foundation models to tools and data sources, and giving them tasks to complete autonomously, enterprises hope to cut costs and operate more efficiently. 

In the first instance, CISOs are aiming to identify the agentic equivalents of robotic lawnmowers and floor cleaners, which offer a clear benefit with a relatively low level of risk. Customer service is an obvious early use case, serving as a test-bed for AI-driven solutions from companies such as Intercom, Zendesk, and ServiceNow, as well as in-house systems. 

Gartner forecasts that, by 2029, agentic AI will autonomously resolve 80% of common customer service issues without human intervention, leading to a 30% reduction in operational costs. With numbers like that, it is only a matter of time until agents are tackling other business tasks at a significant volume, ultimately being embedded in core product functions. 

Meanwhile, a pair of revised American policies on the use of AI in federal agencies offer a useful glimpse into the AI future. Memorandum M-25-21 from the White House Office of Management and Budget (OMB), published on April 7, states: “Agencies should focus on improving mission effectiveness through the use of AI by building upon their existing capabilities to drive responsible AI innovation, strengthen their AI and AI-enabling talent, and improve their ability to develop and procure AI.” In enterprises, the objectives are surely the same. 

The memo requires all federal agencies to retain or designate a Chief AI Officer with “significant expertise in AI” within 60 days and to publish a strategy for advancing AI within 180 days. It also lists categories in which using AI as the primary basis for a decision or action is considered “high impact”. The list includes, for example:

  • Safety-critical functions of infrastructure or government facilities, including emergency services, food safety and traffic control systems; 
  • In healthcare, medically-relevant functions of medical devices; patient diagnosis or treatment; the control of health-insurance costs and underwriting;
  • Control of access to, or the security of, government facilities;
  • In law enforcement, producing risk assessments about individuals; identification of suspects; forecast of crime; tracking people and vehicles; application of biometric identification; social media monitoring; detection of weapons or violent activity; determinations related to detention, sentencing, parole or bail;
  • Preparation or adjudication of risk assessments on foreign nationals seeking access to the US, including related to immigration, asylum, detention, or travel approval;
  • Ability to apply for, or adjudication of, requests for critical federal services, processes, and benefits, including loans and access to public housing; 
  • Determining the terms or conditions of federal employment, including pay or promotion, performance management, hiring or termination, or disciplinary action.

Caution: Agents at Work

The non-exhaustive list suggests the White House sees these, and other high-consequence activities, as realistic use cases for AI. While the memo does not refer specifically to AI agents, the bulk of the use cases – such as making decisions about asylum or access to housing, loans or benefits – would clearly involve models being connected to other tools and databases, likely containing sensitive and personally identifiable information. 

That’s an issue, as all the leading AI models, which act as the ‘brain’ of AI agents, have been proven to have vulnerabilities – something that, at the enterprise level, must be seriously considered when leveraging AI for business activity. The latest GenAI Security Incident & Exploit Round-Up from OWASP, for instance, highlights bad actors jailbreaking models and using AI for phishing and voice-cloning scams. Add in interconnectivity and interoperability and you get new attack vectors that raise the security stakes considerably. 

Compounding the issue, the threat level is directly correlated with the number of tools linked to the agent(s). If an agent has database access, every database in the organization is potentially exposed; if it has email and browser access, a malicious actor could wire the email system up to an agent that relentlessly attempts to phish its way into the organization. (Unlike human attackers, agents never tire and can respond in real time, raising the bar for believability.)

Now consider the combination factor: an agent that has both database access and email access could be weaponized by a bad actor to exfiltrate data or carry out an infiltration attack, injecting new data into the system. In addition, there is the risk that insiders may misuse the agent – accidentally or maliciously – causing even more damage, since they are already inside the organization's defenses.
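As a concrete illustration, the sketch below flags risky combinations of tool permissions before an agent is deployed. The tool names, the HIGH_RISK_COMBINATIONS table and the assess_agent_permissions helper are illustrative assumptions, not a description of any particular product or policy engine.

```python
# Hypothetical sketch: flag agents whose granted tools combine into a
# high-risk capability (e.g. data access plus an outbound channel).
# The tool names and risk pairs below are illustrative assumptions.

HIGH_RISK_COMBINATIONS = {
    frozenset({"database", "email"}): "data exfiltration via outbound email",
    frozenset({"database", "browser"}): "data exfiltration via web requests",
    frozenset({"email", "browser"}): "automated phishing at scale",
}


def assess_agent_permissions(agent_name: str, tools: set[str]) -> list[str]:
    """Return a warning for every risky tool combination the agent holds."""
    warnings = []
    for combo, risk in HIGH_RISK_COMBINATIONS.items():
        if combo <= tools:  # every tool in the risky pair has been granted
            warnings.append(f"{agent_name}: {' + '.join(sorted(combo))} -> {risk}")
    return warnings


for finding in assess_agent_permissions("support-agent", {"database", "email", "calendar"}):
    print("WARNING:", finding)
```

A review like this is cheap to run at design time, before the agent ever touches production data.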

As models are increasingly commodified, it’s clear that security, rather than performance, is the key differentiator when assessing workplace AI systems and agents. A high-performing system that is vulnerable to attack or exploitation has no place in an enterprise or federal environment.

For its part, the memo directs federal agencies to deploy “trustworthy AI” and implement minimum risk management practices for high-impact AI. If the high-impact AI is not performing at an appropriate level, agencies must have a plan to discontinue its use until actions are taken to achieve compliance. Enterprises looking at agent use should have similar plans. 

A New Security Response

However, effective risk management must start before an incident rather than focus on its aftermath. Understanding the agent threat is the first step to dealing with it. Proper measurement of the threat level of any given agent is essential – firstly, to decide if any action is required; and, secondly, to make an informed decision about appropriate action.

If the security response is overcooked, it can stifle innovation; undercooked and it will fail to effectively reduce the threat. However, traditional tools such as manual red teaming and blunt metrics like Attack Success Rate – which treats all attacks as equal – are inadequate to deal with threats posed by AI, given its ability to learn and adapt. 
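To see why a flat Attack Success Rate can mislead, consider a minimal sketch in which two hypothetical systems fail the same number of attacks but look very different once attack severity is taken into account. The categories and severity weights are illustrative assumptions, not an established benchmark.

```python
# Minimal sketch: flat Attack Success Rate (ASR) vs. a severity-weighted score.
# Attack categories and severity weights are illustrative assumptions.

SEVERITY = {"prompt_leak": 0.2, "policy_bypass": 0.5, "data_exfiltration": 1.0}


def asr(results: dict[str, bool]) -> float:
    """Fraction of attacks that succeeded, all weighted equally."""
    return sum(results.values()) / len(results)


def weighted_score(results: dict[str, bool]) -> float:
    """Share of total attack severity accounted for by successful attacks."""
    total = sum(SEVERITY[attack] for attack in results)
    hit = sum(SEVERITY[attack] for attack, ok in results.items() if ok)
    return hit / total


system_a = {"prompt_leak": True, "policy_bypass": False, "data_exfiltration": False}
system_b = {"prompt_leak": False, "policy_bypass": False, "data_exfiltration": True}

print(asr(system_a), asr(system_b))                        # 0.33 vs 0.33 -- identical ASR
print(weighted_score(system_a), weighted_score(system_b))  # ~0.12 vs ~0.59 -- very different risk
```

Under a flat ASR the two systems look equally risky; a severity-weighted view shows one leaking only low-impact information while the other is exposed to data exfiltration.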

A complex new threat vector demands a new security response. In these circumstances, the best way for organizations to effectively scale up their security is to properly understand the type and complexity of the attack, and employ appropriate measures. 

This is best achieved by fighting fire with fire, using intelligent, customizable AI agents to simulate adversarial interactions and automate red-teaming of AI systems before launch. This approach, which we call Agentic Warfare, equips enterprises with an army of security agents – allowing security leaders to quantify the threat level from AI and remediate accordingly.
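A minimal sketch of that automated red-teaming pattern is below: an attacker agent generates adversarial prompts across attack categories, the target system responds, and a judge records which attempts land. The red_team function and its generate_attack / judge interfaces are hypothetical placeholders, not CalypsoAI's implementation.

```python
# Minimal sketch of automated red-teaming: an attacker agent probes a target
# AI system across several attack categories and records the outcomes.
# Every function name and interface here is a hypothetical placeholder.

from dataclasses import dataclass
from typing import Callable


@dataclass
class Finding:
    category: str
    prompt: str
    response: str
    succeeded: bool


def red_team(
    target: Callable[[str], str],                # system under test
    generate_attack: Callable[[str, int], str],  # attacker agent
    judge: Callable[[str, str], bool],           # did the attempt succeed?
    categories: list[str],
    attempts_per_category: int = 5,
) -> list[Finding]:
    findings = []
    for category in categories:
        for attempt in range(attempts_per_category):
            prompt = generate_attack(category, attempt)
            response = target(prompt)
            findings.append(Finding(category, prompt, response, judge(prompt, response)))
    return findings


# Usage with stub callables standing in for real model calls:
results = red_team(
    target=lambda prompt: "I can't help with that.",
    generate_attack=lambda category, attempt: f"[{category} probe #{attempt}]",
    judge=lambda prompt, response: "can't" not in response,
    categories=["prompt_leak", "policy_bypass", "data_exfiltration"],
)
print(sum(f.succeeded for f in results), "of", len(results), "attempts succeeded")
```

In practice the attacker and judge would themselves be model-driven, adapting each probe based on the target's previous responses.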

As well as providing the closest equivalent of real-world activity, automated red-teaming closes the time window that traditionally favors attackers, providing enterprises with the most effective security coverage in the quickest time possible. At CalypsoAI, we use Agentic Warfare to develop our unique security scoring metrics, the CalypsoAI Security Index (CASI) and Agentic Warfare Resistance (AWR), which assess the security of AI models and inform the rankings on our Model Security Leaderboards.

Of course, security is never ‘one-and-done’, and agents are no different. Continuous monitoring is crucial to keeping threat exposure low and maintaining an appropriate security posture. As agentic AI gains ground, a proactive security approach offers the best chance of safely embracing agents and accelerating AI toward its intended purposes.
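As a closing illustration, here is a minimal sketch of that continuous-monitoring loop, assuming a hypothetical run_red_team_suite function that returns the severity-weighted score from the earlier sketch and a hypothetical notify_security_team hook.

```python
# Minimal sketch of continuous monitoring: periodically re-run the red-team
# suite and alert if the severity-weighted score drifts above the baseline.
# run_red_team_suite() and notify_security_team() are hypothetical stubs.

import time

BASELINE_SCORE = 0.10  # weighted score accepted at launch (assumption)
TOLERANCE = 0.05       # allowed drift before alerting


def run_red_team_suite() -> float:
    """Stub: re-run the attack suite and return the current weighted score."""
    return 0.12


def notify_security_team(message: str) -> None:
    print("ALERT:", message)


while True:
    score = run_red_team_suite()
    if score > BASELINE_SCORE + TOLERANCE:
        notify_security_team(f"Weighted attack score rose to {score:.0%}")
    time.sleep(24 * 60 * 60)  # re-check daily
```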

To learn more about our Inference Platform, arrange a callback.
