Test name: Prompt Injection
Prompt injections happen when filters are evaded or the Language Model (LLM) is manipulated using meticulously crafted prompts. These prompts are strategically designed to make the model ignore previous instructions or carry out unintended actions. Exploiting these vulnerabilities can lead to unintended consequences, including data exposure, unauthorized access, or security breaches.
Data Leakage, Legal Issues, Token leaks.
The issue can be found in the server response.
- Privilege Control: Keep the LLM's privileges at the minimum necessary for its intended use. Make sure the LLM can't change a user's settings without clear permission.
- Enhanced Input Validation: Use strong validation and cleaning methods for input. This will help sift out potentially harmful prompts from sources that aren't trustworthy.
- Segregation and Control of External Content Interaction: Separate content that's not trusted from user prompts. Monitor how the LLM interacts with outside content, especially plugins that might trigger irreversible actions or expose personal data.
- Manage Trust: Define clear limits on who the LLM trusts – this includes external sources and add-on features. Treat the LLM as if it's not completely trustworthy, and keep ultimate control with the user for all important decisions.