Adversarial Prompt Testing

Design tests to find prompt vulnerabilities and weaknesses

advanced
230 points
55 minutes
security

Instructions

Create an adversarial testing suite for a content moderation bot: Test categories: 1. Jailbreak attempts 2. Role confusion 3. Context overflow 4. Encoded instructions 5. Social engineering For each: - 3 attack examples - Expected safe response - How bot should defend - Detection methodology Provide: - Complete test suite - Threat model - Mitigation strategies - Security scorecard

Hints

AI-Powered Hints

Smart

Get personalized hints based on your current progress. Start with gentle hints and progress to more detailed ones.

Tags

security
adversarial
testing

Access Level

PRO

Upgrade to access this challenge