A New Attack Surface
AI Security refers to the strategic implementation of robust measures and policies to protect your organization's AI systems, data, and operations from unauthorized access, tampering, malicious attacks, and other digital threats. It is critical for any organization deploying AI tools to include security in its AI use strategy because every new component brought into a digital system, including AI-powered applications, increases the available ‘attack surface’ for threat actors to infiltrate or cause harm to the system. Securing AI-driven additions to your digital infrastructure, whether those are statistical or large language models (LLMs), such as ChatGPT and others, is vital for maintaining the integrity, privacy, and reliability of your system, as well as for fostering customer trust and safeguarding your organization's reputation. As a growing number of organizations use AI tools and solutions for decision-making, operations, and sensitive customer interactions, a strong AI security strategy is crucial to ensure these tools remain trustworthy and are securely protected against external threats or internal vulnerabilities.AI Model Risks
LLMs span the spectrum from being considered clever toys to powerful tools. The reality, however, is that they are the latest addition to an organization’s critical information infrastructure and must be treated as a vulnerable threat surface. Much of the early discussion of LLM risks centered on identifying and solving for threats based on human behavior, such as data loss and inappropriate use. The more insidious threats, however, are stealth attacks to the models and/or their datasets. These attacks are not only growing in terms of the damage they can inflict, but in scope, scale, and nuance, and are becoming increasingly difficult to detect. Here, we outline three types of attacks that pose the most significant threats to enterprises deploying LLMs and other generative AI (GenAI) models.Jailbreaks/Prompt Injection
Jailbreak or prompt injection techniques attempt to ‘trick’ an LLM into providing information identified as dangerous, illegal, immoral, or unethical, or otherwise antithetical to standard social norms. Approximately two hours after ChatGPT-4 was released in March 2023, successful jailbreak attempts directed the system to provide instructions for an alarming collection of antisocial activities. Since then, numerous threat actors have developed carefully worded, highly detailed prompts – including role play, predictive text, reverse psychology, and other techniques – to get LLMs to bypass internal content filters and controls that regulate their responses. The danger of a successful jailbreak attack on a GenAI system, such as an LLM, is that it breaches the safeguards that prevent the model from executing bad commands, such as instructions to ignore protective measures or take destructive actions. Once that boundary between acceptable and unacceptable use disappears, the model has no further safeguards in place to stop it from following the new instructions.Data Poisoning
A poisoning attack can have as its target the model, the model’s training data, or your organization’s unsecured data source. The goal of a poisoning attack is to skew the results or predictions your model produces; the outcome is that your organization relies on the flawed results and makes bad, potentially damaging decisions, disseminates faulty or incorrect information, or takes other ill-advised actions based on the model output.- Data Poisoning attacks target the model’s training dataset. The threat actor alters or manipulates the data or adds malicious, incorrect, biased, or otherwise inappropriate data that skews the model output.
- Model Poisoning attacks target the model itself. The threat actor alters the model or its parameters to ensure faulty output.
- Backdoor attacks require a two-step approach. The threat actor first manipulates the model’s dataset by adding malicious data to create a hidden vulnerability that does not affect the model in any way until it is triggered. Activating the vulnerability is the second step in this attack; it allows the hacker to cause damage to your organization on their own schedule.
Adversarial AI
Adversarial attacks occur after models have been deployed and are in use. These attacks vary in approach, but are all difficult to detect, and very dangerous.- Model Inversion attacks review a model's output to uncover sensitive information about the model itself or the dataset it was trained on, which can lead to privacy breaches.
- Membership Inference attacks involve the threat actor trying to deduce whether specific data points, like information about a particular individual, were part of the training dataset. If successful, these attacks cause a significant invasion of privacy.
- Model-Stealing attacks involve scrutinizing the output of a trained model to steal or copy its intellectual property with the goal of cloning the original model, typically for commercial gain.
- Watermarking attacks change the parameters of a trained model to embed a hidden pattern that can be used to ‘prove’ who owns the model. This can lead to significant financial loss, as well as loss of competitive advantage.
- Model Inference attacks review a model’s output to find sensitive info about the training data or the model's parameters, which can lead to privacy breaches.