DuPO: Enabling Reliable LLM Self-Verification via Dual Preference Optimization Paper • 2508.14460 • Published 3 days ago • 72
Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL Paper • 2508.13167 • Published 17 days ago • 102
XQuant: Breaking the Memory Wall for LLM Inference with KV Cache Rematerialization Paper • 2508.10395 • Published 9 days ago • 37
On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification Paper • 2508.05629 • Published 16 days ago • 155
DeepPHY: Benchmarking Agentic VLMs on Physical Reasoning Paper • 2508.05405 • Published 16 days ago • 62
SitEmb-v1.5: Improved Context-Aware Dense Retrieval for Semantic Association and Long Story Comprehension Paper • 2508.01959 • Published 20 days ago • 56
view article Article Unsupervised Model Improvement via Internal Coherence Maximization: Outperforming Human-Supervised Methods Through Self-Elicitation By codelion • 20 days ago • 6
CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning Paper • 2507.14111 • Published Jul 18 • 22
view article Article Say hello to `hf`: a faster, friendlier Hugging Face CLI ✨ By Wauplin and 2 others • 30 days ago • 79
Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning Paper • 2507.16784 • Published Jul 22 • 118
MegaScience: Pushing the Frontiers of Post-Training Datasets for Science Reasoning Paper • 2507.16812 • Published Jul 22 • 62
Upsample What Matters: Region-Adaptive Latent Sampling for Accelerated Diffusion Transformers Paper • 2507.08422 • Published Jul 11 • 35
SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories? Paper • 2507.12415 • Published Jul 16 • 41
Vision-Language-Vision Auto-Encoder: Scalable Knowledge Distillation from Diffusion Models Paper • 2507.07104 • Published Jul 9 • 45
Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination Paper • 2507.10532 • Published Jul 14 • 85
T-LoRA: Single Image Diffusion Model Customization Without Overfitting Paper • 2507.05964 • Published Jul 8 • 115