We Support
Think you can outsmart AI? Announcing ‘Behind The Mask’ – Our all-new cybercrime role-playing game | Play Now
Think you can outsmart AI? Announcing ‘Behind The Mask’ – Our all-new cybercrime role-playing game | Play Now
CalypsoAI is the #1 solution to securing your system from data infiltration via large language models, thus preventing costly data breaches, and protecting your organization’s intellectual property.
The risks an organization faces from internal threat actors using “jailbreak” or prompt injection techniques to “trick” an LLM into providing information your organization has identified as contrary to your values or practices can include unauthorized access to sensitive or confidential data, among other scenarios. CalypsoAI is a proven solution for blocking prompt-driven techniques, such as role-playing, reverse psychology, virtual environment rule-setting, and hypothetical engagements, that attempt to override standard or admin-established boundaries for malign purposes.
An employee wants to bypass LLM rules that prohibit highly inflammatory messages from being sent in a prompt. By creating a virtual environment in which existing rules do not apply, the user is able to get the information past the filters, which releases the information into the LLM’s body of knowledge, and into the chat history it maintains on that user, and the organization.
In direct violation of organization rules, a user has “tricked” the LLM into allowing them to send controversial content that violates social norms and company values, sharing it with an unauthorized third party. The information is, therefore, at risk of further dissemination due to leaks or hacks to the third party, as well as at risk of becoming part of the dataset used to train/retrain subsequent iterations of the LLM. The information could also be included in the LLM’s knowledge base and, therefore, be accessible to all users, damaging the organization’s reputation by association.
CalypsoAI scans prompts for patterns and categories of techniques, such as role-playing, reverse psychology, virtual environment rule-setting, and hypothetical engagements, that attempt to override standard or admin-established boundaries for malign purposes. All details of the interaction are recorded, providing full auditability and attribution.