Jailbreak Attack Papers - 专知 (Zhuanzhi)


Jailbreak Attacks

Every Picture Tells a Dangerous Story: Memory-Augmented Multi-Agent Jailbreak Attacks on VLMs

arXiv

0+ reads · April 14

Membrane: A Self-Evolving Contrastive Safety Memory for LLM Agent Defense

arXiv

0+ reads · June 4

SlotGCG: Exploiting the Positional Vulnerability in LLMs for Jailbreak Attacks

arXiv

0+ reads · June 4

Faster-GCG: Efficient Discrete Optimization Jailbreak Attacks against Aligned Large Language Models

arXiv

0+ reads · May 19

Accelerating Suffix Jailbreak attacks with Prefix-Shared KV-cache

arXiv

0+ reads · May 12

Jailbreaking Generative AI: Multivector Phishing Threats and Transformer based Defenses

arXiv

0+ reads · April 1

TwinGate: Stateful Defense against Decompositional Jailbreaks in Untraceable Traffic via Asymmetric Contrastive Learning

arXiv

0+ reads · April 30

Evolving Jailbreaks: Automated Multi-Objective Long-Tail Attacks on Large Language Models

arXiv

0+ reads · March 20

Reading Between the Pixels: An Inscriptive Jailbreak Attack on Text-to-Image Models

arXiv

0+ reads · April 7

Exclusive Unlearning

arXiv

0+ reads · April 7

ASTRA: An Automated Framework for Strategy Discovery, Retrieval, and Evolution for Jailbreaking LLMs

arXiv

0+ reads · April 20

Sirens' Whisper: Inaudible Near-Ultrasonic Jailbreaks of Speech-Driven LLMs

arXiv

0+ reads · March 14

Jailbreaking Leaves a Trace: Understanding and Detecting Jailbreak Attacks from Internal Representations of Large Language Models

arXiv

0+ reads · February 20

JALMBench: Benchmarking Jailbreak Vulnerabilities in Audio Language Models

arXiv

0+ reads · February 28

Untargeted Jailbreak Attack

arXiv

0+ reads · March 2
