DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails • arXiv:2502.05163 • Published Feb 7, 2025 • 22 upvotes
Investigating the Impact of Quantization Methods on the Safety and Reliability of Large Language Models • arXiv:2502.15799 • Published Feb 18, 2025 • 7 upvotes
AISafetyLab: A Comprehensive Framework for AI Safety Evaluation and Improvement • arXiv:2502.16776 • Published Feb 24, 2025 • 6 upvotes
LettuceDetect: A Hallucination Detection Framework for RAG Applications • arXiv:2502.17125 • Published Feb 24, 2025 • 11 upvotes
Safeguarding Vision-Language Models: Mitigating Vulnerabilities to Gaussian Noise in Perturbation-based Attacks • arXiv:2504.01308 • Published Apr 2025 • 13 upvotes
LLM Can be a Dangerous Persuader: Empirical Study of Persuasion Safety in Large Language Models • arXiv:2504.10430 • Published Apr 2025 • 4 upvotes
MCP Safety Audit: LLMs with the Model Context Protocol Allow Major Security Exploits • arXiv:2504.03767 • Published Apr 2025 • 3 upvotes
Set You Straight: Auto-Steering Denoising Trajectories to Sidestep Unwanted Concepts • arXiv:2504.12782 • Published Apr 2025 • 4 upvotes
X-Teaming: Multi-Turn Jailbreaks and Defenses with Adaptive Multi-Agents • arXiv:2504.13203 • Published Apr 2025 • 29 upvotes
A Comprehensive Survey in LLM(-Agent) Full Stack Safety: Data, Training and Deployment • arXiv:2504.15585 • Published Apr 2025 • 9 upvotes