Adversarial Prompt Testing
Design tests to find prompt vulnerabilities and weaknesses
advanced
230 points
55 minutes
security
Instructions
Create an adversarial testing suite for a content moderation bot: Test categories: 1. Jailbreak attempts 2. Role confusion 3. Context overflow 4. Encoded instructions 5. Social engineering For each: - 3 attack examples - Expected safe response - How bot should defend - Detection methodology Provide: - Complete test suite - Threat model - Mitigation strategies - Security scorecard
Hints
AI-Powered Hints
Smart
Get personalized hints based on your current progress. Start with gentle hints and progress to more detailed ones.
Tags
security
adversarial
testing
Access Level
PRO
Upgrade to access this challenge