UPME: An Unsupervised Peer Review Framework for Multimodal Large Language Model Evaluation Paper • 2503.14941 • Published Mar 19, 2025 • 6
MetaTool Benchmark for Large Language Models: Deciding Whether to Use Tools and Which to Use Paper • 2310.03128 • Published Oct 4, 2023 • 1
MLLM-as-a-Judge: Assessing Multimodal LLM-as-a-Judge with Vision-Language Benchmark Paper • 2402.04788 • Published Feb 7, 2024
The Best of Both Worlds: Toward an Honest and Helpful Large Language Model Paper • 2406.00380 • Published Jun 1, 2024
GUI-WORLD: A Dataset for GUI-oriented Multimodal LLM-based Agents Paper • 2406.10819 • Published Jun 16, 2024 • 1
UniGen: A Unified Framework for Textual Dataset Generation Using Large Language Models Paper • 2406.18966 • Published Jun 27, 2024
TrustGPT: A Benchmark for Trustworthy and Responsible Large Language Models Paper • 2306.11507 • Published Jun 20, 2023
LLM-as-a-Coauthor: Can Mixed Human-Written and Machine-Generated Text Be Detected? Paper • 2401.05952 • Published Jan 11, 2024
On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective Paper • 2502.14296 • Published Feb 20, 2025 • 46
Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge Paper • 2410.02736 • Published Oct 3, 2024