DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails • arXiv:2502.05163 • Published Feb 7, 2025 • 22 upvotes
Investigating the Impact of Quantization Methods on the Safety and Reliability of Large Language Models • arXiv:2502.15799 • Published Feb 18, 2025 • 7 upvotes
AISafetyLab: A Comprehensive Framework for AI Safety Evaluation and Improvement • arXiv:2502.16776 • Published Feb 24, 2025 • 6 upvotes
LettuceDetect: A Hallucination Detection Framework for RAG Applications • arXiv:2502.17125 • Published Feb 24, 2025 • 11 upvotes
Safeguarding Vision-Language Models: Mitigating Vulnerabilities to Gaussian Noise in Perturbation-based Attacks • arXiv:2504.01308 • Published Apr 2025 • 13 upvotes
LLM Can be a Dangerous Persuader: Empirical Study of Persuasion Safety in Large Language Models • arXiv:2504.10430 • Published Apr 2025 • 4 upvotes
MCP Safety Audit: LLMs with the Model Context Protocol Allow Major Security Exploits • arXiv:2504.03767 • Published Apr 2025 • 3 upvotes
Set You Straight: Auto-Steering Denoising Trajectories to Sidestep Unwanted Concepts • arXiv:2504.12782 • Published Apr 2025 • 4 upvotes
X-Teaming: Multi-Turn Jailbreaks and Defenses with Adaptive Multi-Agents • arXiv:2504.13203 • Published Apr 2025 • 29 upvotes
A Comprehensive Survey in LLM(-Agent) Full Stack Safety: Data, Training and Deployment • arXiv:2504.15585 • Published Apr 2025 • 9 upvotes