rotem israeli

irotem98

https://rotem154154.github.io

rotem154154

AI & ML interests

None yet

Recent Activity

liked a model 2 days ago

Lightricks/LTX-Video-2B-0.9.6-Distilled-04-25

upvoted a paper 4 days ago

Packing Input Frame Context in Next-Frame Prediction Models for Video Generation

upvoted a paper 5 days ago

VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models

Organizations

None yet

irotem98's activity

liked a model 2 days ago

Lightricks/LTX-Video-2B-0.9.6-Distilled-04-25

Updated 4 days ago • 1

upvoted a paper 4 days ago

Packing Input Frame Context in Next-Frame Prediction Models for Video Generation

Paper • 2504.12626 • Published 6 days ago • 45

upvoted a paper 5 days ago

VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models

Paper • 2504.13122 • Published 5 days ago • 21

upvoted 9 papers 8 days ago

FlexIP: Dynamic Control of Preservation and Personality for Customized Image Generation

Paper • 2504.07405 • Published 13 days ago • 11

PixelFlow: Pixel-Space Generative Models with Flow

Paper • 2504.07963 • Published 12 days ago • 18

ZipIR: Latent Pyramid Diffusion Transformer for High-Resolution Image Restoration

Paper • 2504.08591 • Published 11 days ago • 18

MineWorld: a Real-Time and Open-Source Interactive World Model on Minecraft

Paper • 2504.08388 • Published 11 days ago • 39

GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation

Paper • 2504.08736 • Published 11 days ago • 47

Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model

Paper • 2504.08685 • Published 11 days ago • 120

TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning

Paper • 2504.09641 • Published 9 days ago • 15

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

Paper • 2504.10479 • Published 8 days ago • 239

FUSION: Fully Integration of Vision-Language Representations for Deep Cross-Modal Understanding

Paper • 2504.09925 • Published 8 days ago • 38

upvoted 2 papers 11 days ago

VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning

Paper • 2504.07960 • Published 12 days ago • 45

Kimi-VL Technical Report

Paper • 2504.07491 • Published 12 days ago • 118

upvoted 4 papers 12 days ago

DiTaiListener: Controllable High Fidelity Listener Video Generation with Diffusion

Paper • 2504.04010 • Published 18 days ago • 10

FantasyTalking: Realistic Talking Portrait Generation via Coherent Motion Synthesis

Paper • 2504.04842 • Published 15 days ago • 32

DDT: Decoupled Diffusion Transformer

Paper • 2504.05741 • Published 14 days ago • 73

Building and better understanding vision-language models: insights and future directions

Paper • 2408.12637 • Published Aug 22, 2024 • 130

upvoted a paper 14 days ago

One-Minute Video Generation with Test-Time Training

Paper • 2504.05298 • Published 15 days ago • 96

liked a model 17 days ago

Wan-AI/Wan2.1-T2V-1.3B-Diffusers

Text-to-Video • Updated 19 days ago • 32.4k • 34