Having a solid, fact-based understanding of where your organization is on the AI security preparedness spectrum is the first step toward creating a secure artificial intelligence (AI) system. The growing use of generative AI (GenAI) models, such as ChatGPT and others, is expanding the AI footprint in many companies and adding complexity to their overall security infrastructure.
These models are making decisions, providing services, and executing numerous tasks using natural language processing (NLP), computer vision (CV), and speech recognition across operations as diverse as finance, engineering, manufacturing, legal, and human resources. Securing these AI systems, which now frequently include multiple unimodal (single-modality) models and/or multimodal models spanning text, images, code, audio, video, and tabular and numeric data, requires a comprehensive approach. A security framework for such a system should include the following considerations:
Data Security
Ensuring that data from different modalities is securely protected from unauthorized access, tampering, leakage, or interception while being collected, stored, transmitted, or distributed is key to managing risk. Implementing policy-based access controls (PBAC) that restrict model usage to authorized personnel strengthens the system’s ability to prevent abuse or misuse of intellectual property (IP) and other valuable organizational assets. Educating employees and other users about the role they play in mitigating, or amplifying, risk is also important to ensure data does not “accidentally” leak outside the organization. Perhaps the most common source of unintended data leakage is content in a prompt to the model. An employee requesting the model’s help drafting an email that already includes detailed account information might not realize they have just sent that information outside the organization’s perimeter, in violation of company policy and possibly governmental regulations.
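As an illustration of the prompt-scanning idea described above, the sketch below checks outbound prompts against a small set of sensitive-data patterns before they reach a model. The pattern names and regular expressions are simplified assumptions; a production data-loss-prevention scanner would rely on a far richer, continuously maintained ruleset.

```python
import re

# Illustrative patterns only; a real DLP scanner would use a much
# richer, continuously maintained ruleset.
SENSITIVE_PATTERNS = {
    "account_number": re.compile(r"\b\d{10,12}\b"),
    "email_address": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}

def scan_prompt(prompt: str) -> list:
    """Return the names of sensitive-data patterns found in a prompt."""
    return [name for name, pattern in SENSITIVE_PATTERNS.items()
            if pattern.search(prompt)]

def is_blocked(prompt: str) -> bool:
    """Block a prompt from leaving the organization if anything matched."""
    return bool(scan_prompt(prompt))
```

Running the check before the prompt ever leaves the network is what prevents the “drafting an email with account details” scenario above.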
Model Security
Protecting the IP contained within the models themselves, such as proprietary data used to fine-tune models or ground retrieval-augmented generation (RAG) systems, becomes critical when deploying multiple large language models (LLMs) or multimodal models across an enterprise. Model encryption provides a strong safeguard against theft of the IP the model relies on. Automated change-tracking creates a clear record of model versioning and validation, which provides an audit trail if malicious code is detected and ensures that only trusted models are deployed into production.
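One lightweight way to implement the versioning-and-validation idea above is to fingerprint each model artifact and check it against a registry of validated digests before deployment. This is a minimal sketch; the registry structure and function names are assumptions, not a specific product’s API.

```python
import hashlib

def fingerprint(model_path: str) -> str:
    """Compute the SHA-256 digest of a model artifact on disk."""
    digest = hashlib.sha256()
    with open(model_path, "rb") as f:
        # Read in chunks so large weight files never load fully into memory.
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

def safe_to_deploy(model_path: str, registry: dict) -> bool:
    """Allow deployment only if the artifact matches its validated digest."""
    return fingerprint(model_path) == registry.get(model_path)
```

Any tampering with the artifact, even a single byte, changes the digest and fails the pre-deployment check.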
Access Control and Authentication
The importance of controlling which employees or other users can interact with these systems cannot be overstated. Strict PBAC mechanisms can limit who can access the models, their data, and their output. Robust authentication protocols, such as multi-factor authentication (MFA) or biometrics, combined with rigorous authorization mechanisms, further ensure that only authorized users and systems can interact with the models.
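A minimal sketch of the PBAC idea described above, assuming a simple in-memory policy table keyed by role; the roles, model names, and actions are purely illustrative.

```python
# Illustrative policy table; real deployments would load policies from a
# central store and pair them with MFA-backed identity verification.
POLICY = {
    "analyst":  {"models": {"finance-llm"}, "actions": {"query"}},
    "ml_admin": {"models": {"finance-llm", "hr-llm"}, "actions": {"query", "deploy"}},
}

def is_authorized(role: str, model: str, action: str) -> bool:
    """Grant access only when the role's policy covers both model and action."""
    rule = POLICY.get(role)
    return bool(rule) and model in rule["models"] and action in rule["actions"]
```

The default-deny shape matters: an unknown role or an unlisted model/action pair is rejected rather than allowed through.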
Infrastructure Security
When properly implemented and maintained, strong firewalls and other network security measures protect against unauthorized access to the model servers and infrastructure. Similarly, containers that are hardened, routinely inspected, and patched to address vulnerabilities provide another layer of security. Where APIs are used, ensure proper authentication and rate-limiting processes are actively deployed to prevent abuse.
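The rate-limiting step mentioned above is often implemented with a token bucket. The sketch below is a single-process version, assuming one bucket per API client.

```python
import time

class TokenBucket:
    """Single-process token-bucket rate limiter (illustrative sketch)."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate              # tokens replenished per second
        self.capacity = capacity      # maximum burst size
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        """Consume one token if available; otherwise reject the request."""
        now = time.monotonic()
        # Refill based on elapsed time, never exceeding the burst capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

In practice the bucket state would live in a shared store such as Redis so the limit holds across API replicas.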
Model Explainability and Auditing
When a security incident occurs, the responding teams must look in all directions at the same time:
- Into the past to see what happened, as well as where, when, and how often
- At the real-time situation to identify the active vulnerabilities and shut them down
- Into the future to ensure they repair and reinforce the perimeter and mitigate the damage
Detailed, robust logs and audit trails allow instant historical and real-time review of both user behavior and model performance, providing unassailable transparency when identifying activities and determining attribution.
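All three views depend on each interaction being captured at the moment it happens. A minimal sketch of such an append-only audit record follows; the field names are illustrative assumptions.

```python
import datetime
import json

def audit_record(user: str, model: str, action: str, outcome: str) -> str:
    """Serialize one interaction as a timestamped JSON record for an
    append-only audit log."""
    entry = {
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "user": user,
        "model": model,
        "action": action,
        "outcome": outcome,
    }
    return json.dumps(entry)
```

Emitting one such line per interaction yields the who/what/when/how-often trail that historical review and attribution require.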
Adversarial Attack Protection
As models evolve, so do the capabilities of threat actors determined to infiltrate them. One way to thwart them is routine evaluation and testing of model resilience against adversarial attacks, which can involve multiple modalities. Another is to scan prompts for manipulative language or techniques routinely used in prompt injection or “jailbreak” attacks, in which the threat actor tries to override or circumvent security guardrails; these techniques include role-playing, world-building, and reverse psychology. Continuously monitoring model performance and behavior enables signs of adversarial attacks to be identified and contained before damage is done.
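A toy version of the prompt-scanning approach described above, matching against a short list of known jailbreak phrasings. The marker list is illustrative; production scanners rely on trained classifiers and continuously updated signatures rather than substring matching.

```python
# Illustrative markers only; real scanners use classifiers, not substrings.
INJECTION_MARKERS = (
    "ignore previous instructions",
    "ignore all prior instructions",
    "pretend you have no restrictions",
    "act as an unrestricted ai",
)

def looks_like_injection(prompt: str) -> bool:
    """Flag prompts containing known jailbreak phrasings."""
    lowered = prompt.lower()
    return any(marker in lowered for marker in INJECTION_MARKERS)
```

Flagged prompts can be blocked outright or routed for human review before they ever reach the model.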
Data Privacy
Protecting personal or otherwise sensitive information is only becoming more important over time, and not just at the governmental or regulatory level, although sweeping measures are being drafted and passed in the UK, EU, U.S., and Singapore, for example, as well as at the state level in California. Many of these rules require affirmative consent from individuals, and organizations are coming under pressure to implement internal data privacy policies as well. Deploying data anonymization techniques, such as differential privacy, to protect sensitive or personal information in the training data can be helpful. Where federated learning is part of model training, privacy-preserving techniques should be used to ensure individual user data remains private.
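To make the differential-privacy suggestion concrete, the sketch below releases a noisy count using the Laplace mechanism: a count query has sensitivity 1, so adding Laplace noise with scale 1/ε yields ε-differential privacy. The function name and query are illustrative, not drawn from any particular library.

```python
import math
import random

def dp_count(values, threshold: float, epsilon: float) -> float:
    """Differentially private count of values above a threshold.

    A count query has sensitivity 1, so Laplace noise with scale
    1/epsilon yields epsilon-differential privacy.
    """
    true_count = sum(1 for v in values if v > threshold)
    # Inverse-CDF sampling of Laplace(0, 1/epsilon) noise.
    u = random.random() - 0.5
    noise = -(1.0 / epsilon) * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_count + noise
```

Smaller ε means more noise and stronger privacy; the analyst trades accuracy for protection of any single individual’s contribution.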
Security Assessments and Testing
Regularly performed penetration testing can identify and address vulnerabilities, and comprehensive security assessments conducted on an established cadence can pinpoint weaknesses or other potential problems with system architecture, deployment, and configurations.
As with any other networked system, it is critical to ensure all security patches for all components of the AI infrastructure are kept current and that regular, proactive audits are conducted on the attack surface, including software frameworks, libraries, and all dependencies.
Incident Response Plan
Having a well-defined and routinely rehearsed Incident Response Plan in place can mean the difference between a data breach or other security incident being a problem and being a catastrophe. A comprehensive plan includes guidance for detecting, responding to, and recovering from expected and unexpected security incidents rapidly and effectively.
Policies and Governance
A set of detailed, thorough policies governing the selection, deployment, use, and maintenance of GenAI models is only beneficial if the policies are shared, understood, and enforced. Customizable, automated security solutions can reinforce acceptable human behavior by ensuring that prompts containing content that violates organizational, industry, or regulatory rules and standards are blocked before they reach the model.
Security Training and Awareness
Personnel and stakeholders involved in model development and operations (DevOps) must be trained on security best practices and awareness so they can build in defenses against security breaches and social engineering attacks. All employees who use GenAI models must also be trained on their proper use, including relevant policies and regulations, and educated about detecting misuse, abuse, or malicious attacks.
Deploying multiple models and models with multiple modalities greatly expands an organization’s attack surface, necessitating a proactive and multi-disciplinary approach that addresses the unique challenges of incorporating diverse data sources. A holistic security plan that includes both offensive and defensive mechanisms is key, and layering seamless automation and organization-specific controls into the mix provides additional protection.
While the measures needed will depend on the models deployed, the business use, and the regulatory environment in which the organization operates, CalypsoAI’s GenAI security solution, Moderator, provides a full suite of comprehensive, model-agnostic protections across the enterprise. Moderator is the only solution on the market that can provide a secure, resilient environment where other AI security defenses fall short. As a scalable and “weightless” layer in the security infrastructure, Moderator enables full observability across all models in use without introducing latency.
Administrators can see who is doing what, how often, and on which models because Moderator records all details of each interaction. This feature provides administrators with both wide and deep user and system insights, including comparative and analytic data and full auditability and attribution around activity, content, and resource allocation. Automated, customizable scanners review every user prompt and every model response for source code, prompt injections, sensitive and personal data, toxicity, bias, vulnerabilities, and exploitable, malicious, or otherwise suspicious content; adverse content is blocked from leaving or entering the system.
All Moderator actions occur within your secure environment: Moderator does not harvest telemetry or any other data about your organization’s model engagements. Together, Moderator’s features provide a first-of-its-kind tool that can ensure the peace of mind decision-makers must have when deploying LLMs across the enterprise.