SECTION 11
Professional Methodology
A structured approach to AI red teaming that ensures comprehensive coverage, reproducible results, and actionable findings.
11.1 Engagement Lifecycle
1. Reconnaissance
Model identification, feature enumeration, tool discovery, integration mapping, attack surface analysis.
2. Vulnerability Identification
Systematic testing across all categories: injection, jailbreaks, data extraction, tool abuse, RAG poisoning.
3. Exploitation
Attack chain development, impact maximization, bypass refinement, evidence collection.
4. Reporting
Finding documentation, risk scoring, remediation guidance, executive summary.
11.2 Reconnaissance Phase
recon_checklist.txt
/* RECONNAISSANCE CHECKLIST */

□ MODEL IDENTIFICATION
  □ Identify provider (OpenAI, Anthropic, Google, etc.)
  □ Determine model version and family
  □ Test context window size limits
  □ Probe for fine-tuning indicators
  □ Check for custom system prompts

□ FEATURE ENUMERATION
  □ List all available tools/functions
  □ Map data sources (RAG, APIs, databases)
  □ Document input/output formats
  □ Identify file upload capabilities
  □ Check for MCP server integrations

□ SYSTEM PROMPT ANALYSIS
  □ Attempt direct extraction techniques
  □ Probe for role definitions
  □ Identify restriction keywords
  □ Map allowed vs blocked behaviors
  □ Document response patterns
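The "Test context window size limits" item can be approached as a binary search over input length rather than trial and error. The sketch below is one possible way to do it and is not tied to any provider: `find_max_accepted` and the `accepts` callback are hypothetical names, and the search assumes that acceptance is monotonic (if an input of n tokens is accepted, every shorter input is accepted too). In a real engagement `accepts` would wrap an authorized request to the target; here a stand-in with an invented limit of 8192 is used so the example runs offline.

```python
from typing import Callable


def find_max_accepted(
    accepts: Callable[[int], bool], low: int = 1, high: int = 2_000_000
) -> int:
    """Return the largest n in [low, high] for which accepts(n) is True.

    Returns 0 if even `low` is rejected. Assumes monotonic acceptance:
    once a length is rejected, every longer length is rejected as well.
    """
    if not accepts(low):
        return 0
    while low < high:
        # Upper midpoint, so the loop always makes progress when low + 1 == high.
        mid = (low + high + 1) // 2
        if accepts(mid):
            low = mid  # mid is accepted, the limit is at least mid
        else:
            high = mid - 1  # mid is rejected, the limit is below mid
    return low


def fake_probe(n_tokens: int) -> bool:
    """Stand-in for a real request; the 8192 limit is invented for the demo."""
    return n_tokens <= 8192


if __name__ == "__main__":
    print(find_max_accepted(fake_probe))
```

With the default bounds this needs roughly 21 probes, which keeps request volume low and makes the result easy to reproduce in the report.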
11.3 Exploitation Phase
testing_matrix.txt
□ PROMPT INJECTION (Section 02)
  □ Direct instruction override
  □ Delimiter/tag escape attempts
  □ Context termination attacks
  □ Indirect injection via RAG/tools

□ JAILBREAKING (Section 03)
  □ Persona-based attacks (DAN, etc.)
  □ Roleplay and fiction exploits
  □ Multi-turn escalation
  □ Encoding-based bypasses

□ DATA EXTRACTION (Section 04)
  □ System prompt extraction
  □ Training data probing
  □ PII discovery attempts
  □ Exfiltration channel testing

□ AGENT/TOOL TESTING (Section 05)
  □ File system access attempts
  □ Code execution vectors
  □ Network requests (SSRF)
  □ MCP server exploitation

□ RAG ATTACKS (Section 06)
  □ Document poisoning
  □ Retrieval manipulation
  □ Context overflow

□ MULTIMODAL (Section 07)
  □ Image-based injection
  □ Typography attacks
  □ Audio/video vectors

□ DEFENSE EVASION (Section 09)
  □ Encoding bypass techniques
  □ Obfuscation methods
  □ Filter fingerprinting
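To make coverage of the matrix above auditable, each checklist item can be tracked as a record with an explicit status instead of a ticked box. The following is a minimal sketch under that assumption; `Status`, `MatrixItem`, and `coverage` are illustrative names, not part of any existing tool, and the statuses assigned in the sample data are made up.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    NOT_TESTED = "not tested"
    NO_FINDING = "tested, no finding"
    FINDING = "finding"


@dataclass
class MatrixItem:
    category: str
    technique: str
    status: Status = Status.NOT_TESTED


def coverage(items: list[MatrixItem]) -> dict[str, float]:
    """Return, per category, the fraction of techniques that have been tested."""
    total = Counter(item.category for item in items)
    tested = Counter(
        item.category for item in items if item.status is not Status.NOT_TESTED
    )
    return {category: tested[category] / total[category] for category in total}


# Sample rows taken from the matrix; the statuses are invented for illustration.
matrix = [
    MatrixItem("Prompt injection", "Direct instruction override", Status.FINDING),
    MatrixItem("Prompt injection", "Delimiter/tag escape attempts", Status.NO_FINDING),
    MatrixItem("Prompt injection", "Context termination attacks"),
    MatrixItem("Prompt injection", "Indirect injection via RAG/tools"),
    MatrixItem("RAG attacks", "Document poisoning"),
]

if __name__ == "__main__":
    for category, fraction in coverage(matrix).items():
        print(f"{category}: {fraction:.0%} tested")
```

Separating "tested, no finding" from "not tested" matters for the report: a negative result is evidence of coverage, while an untested row is a gap.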
11.4 Reporting & Risk Scoring
| Severity | CVSS | Criteria | Examples |
|---|---|---|---|
| CRITICAL | 9.0-10.0 | RCE, full system compromise, mass user impact | Tool-based code execution, credential theft via MCP |
| HIGH | 7.0-8.9 | Significant data exfil, privilege escalation | Full system prompt leak, PII extraction, SSRF |
| MEDIUM | 4.0-6.9 | Policy bypass, limited data exposure | Jailbreaks, partial prompt leaks, filter bypasses |
| LOW | 0.1-3.9 | Minor policy violations, edge cases | Inconsistent refusals, minor info disclosure |
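The bands in the table can be applied mechanically once a finding has a CVSS base score. This small sketch maps a score to the severity labels above; the function name `severity` is illustrative, and the `NONE` label for a score of exactly 0.0 is an addition that does not appear in the table (it follows the CVSS v3.x qualitative scale).

```python
def severity(score: float) -> str:
    """Map a CVSS base score (0.0 to 10.0) to the severity labels in the table."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"CVSS score out of range: {score}")
    if score == 0.0:
        return "NONE"  # not in the table; CVSS v3.x label for a zero score
    # Thresholds are lower bounds, so values between bands (e.g. 8.95)
    # fall into the band below the next threshold.
    if score >= 9.0:
        return "CRITICAL"
    if score >= 7.0:
        return "HIGH"
    if score >= 4.0:
        return "MEDIUM"
    return "LOW"


if __name__ == "__main__":
    for example in (9.8, 7.5, 5.3, 2.1):
        print(example, severity(example))
```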
Report Structure
Executive Summary: High-level findings for leadership.
Technical Details: Full reproduction steps, payloads, evidence.
Risk Analysis: Business impact and likelihood.
Remediation: Specific, actionable fixes prioritized by risk.
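The four-part structure lends itself to being generated from a list of findings, which keeps the ordering consistent and makes "prioritized by risk" concrete. The sketch below assumes risk is approximated by the CVSS score alone; the `Finding` fields and `render_report` are hypothetical, and the two sample findings are placeholders, not real results.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    title: str
    cvss: float
    steps: str        # reproduction steps, payloads, evidence
    impact: str       # business impact and likelihood
    remediation: str  # specific, actionable fix


def render_report(findings: list[Finding]) -> str:
    """Render a plain-text report skeleton, highest-risk findings first."""
    ordered = sorted(findings, key=lambda f: f.cvss, reverse=True)
    top = f"highest CVSS {ordered[0].cvss:.1f}" if ordered else "no findings"
    lines = ["Executive Summary", f"  {len(ordered)} finding(s), {top}", ""]
    lines.append("Technical Details")
    lines += [f"  [{f.cvss:.1f}] {f.title}: {f.steps}" for f in ordered]
    lines += ["", "Risk Analysis"]
    lines += [f"  {f.title}: {f.impact}" for f in ordered]
    lines += ["", "Remediation"]
    lines += [
        f"  {rank}. {f.title}: {f.remediation}"
        for rank, f in enumerate(ordered, start=1)
    ]
    return "\n".join(lines)


# Placeholder findings for illustration only.
sample = [
    Finding("Partial prompt leak", 5.0, "see appendix A", "limited exposure",
            "tighten output filtering"),
    Finding("Tool-based code execution", 9.1, "see appendix B", "full compromise",
            "sandbox the tool runtime"),
]

if __name__ == "__main__":
    print(render_report(sample))
```

Sorting once and reusing the same order in every section lets a reader follow a single finding from the summary through to its fix.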