WonJae Roh

snuro

AI & ML interests

None yet

Organizations

None yet

snuro's activity

upvoted 8 papers 4 days ago

DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning

Paper • 2504.07128 • Published 20 days ago • 80

Kimi-VL Technical Report

Paper • 2504.07491 • Published 11 days ago • 115

GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation

Paper • 2504.08736 • Published 10 days ago • 47

Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model

Paper • 2504.08685 • Published 10 days ago • 119

PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters

Paper • 2504.08791 • Published 14 days ago • 117

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

Paper • 2504.10479 • Published 7 days ago • 232

Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning

Paper • 2504.08672 • Published 10 days ago • 52

xVerify: Efficient Answer Verifier for Reasoning Model Evaluations

Paper • 2504.10481 • Published 7 days ago • 81

upvoted a paper 21 days ago

Improving Autoregressive Image Generation through Coarse-to-Fine Token Prediction

Paper • 2503.16194 • Published Mar 20 • 8

upvoted 5 papers about 1 month ago

Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers

Paper • 2503.11579 • Published Mar 14 • 18

ReCamMaster: Camera-Controlled Generative Rendering from A Single Video

Paper • 2503.11647 • Published Mar 14 • 134

PLADIS: Pushing the Limits of Attention in Diffusion Models at Inference Time by Leveraging Sparsity

Paper • 2503.07677 • Published Mar 10 • 82

Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia

Paper • 2503.07920 • Published Mar 10 • 97

"Principal Components" Enable A New Language of Images

Paper • 2503.08685 • Published Mar 11 • 12

upvoted 6 papers about 2 months ago

VideoGrain: Modulating Space-Time Attention for Multi-grained Video Editing

Paper • 2502.17258 • Published Feb 24 • 78

Visual-RFT: Visual Reinforcement Fine-Tuning

Paper • 2503.01785 • Published Mar 3 • 76

Predictive Data Selection: The Data That Predicts Is the Data That Teaches

Paper • 2503.00808 • Published Mar 2 • 57

Self-rewarding correction for mathematical reasoning

Paper • 2502.19613 • Published Feb 26 • 84

GHOST 2.0: generative high-fidelity one shot transfer of heads

Paper • 2502.18417 • Published Feb 25 • 66

OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference

Paper • 2502.18411 • Published Feb 25 • 73