These docs are for v1.1. Click to read the latest docs for v1.2.

Prompt Injection

Severity: Medium
Test name: Prompt Injection

Prompt injections occur when filters are bypassed or the LLM is manipulated using precisely constructed prompts. These prompts are designed in such a way that they cause the model to disregard earlier instructions or execute actions that weren't intended. Exploiting these vulnerabilities can result in unintended outcomes, such as exposing data, gaining unauthorized entry, or causing security infringements.


Data Leakage, Legal Issues, Token leaks.


The issue can be found in the Server Response.

Remedy suggestions
  • Privilege Control: Keep the LLM's privileges at the minimum necessary for its intended use. Make sure the LLM can't change a user's settings without clear permission.
  • Enhanced Input Validation: Use strong validation and cleaning methods for input. This will help sift out potentially harmful prompts from sources that aren't trustworthy.
  • Segregation and Control of External Content Interaction: Separate content that's not trusted from user prompts. Monitor how the LLM interacts with outside content, especially plugins that might trigger irreversible actions or expose personal data.
  • Manage Trust: Define clear limits on who the LLM trusts – this includes external sources and add-on features. Treat the LLM as if it's not completely trustworthy, and keep ultimate control with the user for all important decisions.
  • CWE-20