Tang

tommysally

AI & ML interests

None yet

Organizations

None yet

tommysally's activity

upvoted 20 papers about 1 month ago

SANA-Sprint: One-Step Diffusion with Continuous-Time Consistency Distillation

Paper • 2503.09641 • Published Mar 12 • 36

Transformers without Normalization

Paper • 2503.10622 • Published Mar 13 • 157

CINEMA: Coherent Multi-Subject Video Generation via MLLM-Based Guidance

Paper • 2503.10391 • Published Mar 13 • 10

Long Context Tuning for Video Generation

Paper • 2503.10589 • Published Mar 13 • 14

4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models

Paper • 2503.10437 • Published Mar 13 • 31

Open-Sora 2.0: Training a Commercial-Level Video Generation Model in $200k

Paper • 2503.09642 • Published Mar 12 • 17

Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond

Paper • 2503.10460 • Published Mar 13 • 27

R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization

Paper • 2503.10615 • Published Mar 13 • 16

Quantization for OpenAI's Whisper Models: A Comparative Analysis

Paper • 2503.09905 • Published Mar 12 • 6

The Curse of Conditions: Analyzing and Improving Optimal Transport for Conditional Flow-Based Generation

Paper • 2503.10636 • Published Mar 13 • 3

Autoregressive Image Generation with Randomized Parallel Decoding

Paper • 2503.10568 • Published Mar 13 • 8

Shifting Long-Context LLMs Research from Input to Output

Paper • 2503.04723 • Published Mar 6 • 20

Distilling Diversity and Control in Diffusion Models

Paper • 2503.10637 • Published Mar 13 • 14

GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing

Paper • 2503.10639 • Published Mar 13 • 48

UniGoal: Towards Universal Zero-shot Goal-oriented Navigation

Paper • 2503.10630 • Published Mar 13 • 6

VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search

Paper • 2503.10582 • Published Mar 13 • 22

New Trends for Modern Machine Translation with Large Reasoning Models

Paper • 2503.10351 • Published Mar 13 • 22

CoSTA*: Cost-Sensitive Toolpath Agent for Multi-turn Image Editing

Paper • 2503.10613 • Published Mar 13 • 77

World Modeling Makes a Better Planner: Dual Preference Optimization for Embodied Task Planning

Paper • 2503.10480 • Published Mar 13 • 50

Silent Branding Attack: Trigger-free Data Poisoning Attack on Text-to-Image Diffusion Models

Paper • 2503.09669 • Published Mar 12 • 35