BadChain: Backdoor Chain-of-Thought Prompting for Large Language Models Paper • 2401.12242 • Published Jan 20, 2024
AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases Paper • 2407.12784 • Published Jul 17, 2024
Evaluation of OpenAI o1: Opportunities and Challenges of AGI Paper • 2409.18486 • Published Sep 27, 2024
SafeChain: Safety of Language Models with Long Chain-of-Thought Reasoning Capabilities Paper • 2502.12025 • Published Feb 17, 2025
MMDT: Decoding the Trustworthiness and Safety of Multimodal Foundation Models Paper • 2503.14827 • Published Mar 19, 2025
ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs Paper • 2402.11753 • Published Feb 19, 2024
UMD: Unsupervised Model Detection for X2X Backdoor Attacks Paper • 2305.18651 • Published May 29, 2023