Multimodal Attacks
Attack techniques targeting vision-language models, audio models, and cross-modal systems. Covers adversarial images, hidden text injection, typography attacks, and audio exploits.
8.1Vision-Language Model Attacks
Vision-Language Models (VLMs) like GPT-4V, Claude 3, and Gemini process both text and images. The image modality introduces new attack surfaces invisible to text-only defenses.
Typography & Text-in-Image Attacks
Adversarial Image Attacks
/* ═══════════════════════════════════════════════════════════════ VISION MODEL ATTACK TECHNIQUES ═══════════════════════════════════════════════════════════════ */ // ─── TYPOGRAPHY ATTACKS ────────────────────────────────────── "Image with 'Ignore previous instructions' in large text" "Screenshot of fake system message" "Image of terminal with 'sudo' commands" "Fake error dialog with injection payload" "Image of code editor with malicious code" // ─── HIDDEN TEXT TECHNIQUES ────────────────────────────────── "White text (#FFF) on near-white background (#FEFEFE)" "1px font text in image corner" "Text at 2% opacity overlaid on image" "Instructions hidden in image texture/noise" "Text in alpha channel (transparent layer)" // ─── STRUCTURAL ATTACKS ────────────────────────────────────── "EXIF comment field: 'New instructions: ...'" "IPTC metadata with injection payload" "XMP data containing malicious instructions" "Multi-frame GIF with hidden instruction frame" "Animated PNG with payload in frame 2" // ─── ADVERSARIAL PERTURBATIONS ─────────────────────────────── "PGD attack optimized for instruction following" "FGSM perturbation targeting safety bypass" "Universal adversarial patch (works on any image)" "Semantic adversarial (meaningful modifications)" // ─── CONTEXT MANIPULATION ──────────────────────────────────── "Image appears benign but context triggers behavior" "Upload image that 'continues' a jailbreak conversation" "Image containing 'previous assistant response'"
8.2Audio Model Attacks
Audio-enabled LLMs (voice assistants, speech-to-text + LLM pipelines) introduce acoustic attack vectors. These include adversarial audio, ultrasonic injection, and hidden voice commands.
Audio perturbations imperceptible to humans but transcribed as injection commands by speech-to-text systems.
Commands embedded in frequencies above human hearing (>20kHz) but captured by microphones and processed by models.
Commands hidden in background noise, music, or other audio that humans don't notice but speech models detect.
Carefully crafted audio that sounds like one thing to humans but is transcribed as something completely different.
/* AUDIO MODEL ATTACK TECHNIQUES */ // ─── ADVERSARIAL AUDIO ─────────────────────────────────────── "Perturbation attack on Whisper/speech models" "Audio sounds like noise, transcribes as commands" "Music with hidden command perturbations" // ─── ULTRASONIC ATTACKS ────────────────────────────────────── "21kHz carrier with amplitude-modulated commands" "Dolphin attack: ultrasonic voice commands" "Inaudible audio injection via speakers" // ─── HIDDEN COMMANDS ───────────────────────────────────────── "Commands in background noise of videos" "Injection in hold music/elevator music" "Podcast with hidden commands in intro/outro" // ─── TRANSCRIPTION ATTACKS ─────────────────────────────────── "Phonetically similar but semantically different" "Exploit homophones: 'their' vs 'there' + context" "Speed/pitch manipulation to bypass filters"
8.3Cross-Modal Exploits
Cross-modal attacks exploit the interaction between different modalities. Information in one modality (image) can influence processing of another (text), creating complex attack chains.
Image → Text Influence
Instructions in images can override or modify how the model interprets subsequent text prompts.
Audio → Text Injection
Transcribed audio content can inject into text context, enabling attacks via voice messages or audio files.
Video Frame Injection
Single frames in videos can contain injection payloads that are processed but not noticed by human viewers.
Document + Image Combo
PDFs with embedded images create dual attack vectors - both document text and image content can carry payloads.