Shijie Geng

makitanikaze

AI & ML interests

None yet

Recent Activity

new activity 7 days ago

baichuan-inc/Baichuan-M1-14B-Base: ImportError: cannot import name '_flash_supports_window_size' from 'transformers.modeling_flash_attention_utils'

upvoted a paper about 1 month ago

From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation up to 100K Tokens

upvoted a paper about 1 month ago

Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models

Organizations

None yet

makitanikaze's activity

upvoted 5 papers about 1 month ago

From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation up to 100K Tokens

Paper • 2502.18890 • Published Feb 26 • 30

Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models

Paper • 2503.09573 • Published Mar 12 • 71

DAPO: An Open-Source LLM Reinforcement Learning System at Scale

Paper • 2503.14476 • Published Mar 18 • 122

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Paper • 2501.12948 • Published Jan 22 • 385

Kimi k1.5: Scaling Reinforcement Learning with LLMs

Paper • 2501.12599 • Published Jan 22 • 113

upvoted a paper about 2 months ago

Visual-RFT: Visual Reinforcement Fine-Tuning

Paper • 2503.01785 • Published Mar 3 • 78

upvoted 14 papers 2 months ago

Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention

Paper • 2502.11089 • Published Feb 16 • 155

An Open Recipe: Adapting Language-Specific LLMs to a Reasoning Model in One Day via Model Merging

Paper • 2502.09056 • Published Feb 13 • 32

Distillation Scaling Laws

Paper • 2502.08606 • Published Feb 12 • 48

EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents

Paper • 2502.09560 • Published Feb 13 • 36

SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models

Paper • 2502.09604 • Published Feb 13 • 36

Diverse Inference and Verification for Advanced Reasoning

Paper • 2502.09955 • Published Feb 14 • 18

MM-RLHF: The Next Step Forward in Multimodal LLM Alignment

Paper • 2502.10391 • Published Feb 14 • 35

Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model

Paper • 2502.10248 • Published Feb 14 • 56

SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

Paper • 2501.17161 • Published Jan 28 • 120

LIMO: Less is More for Reasoning

Paper • 2502.03387 • Published Feb 5 • 61

Gold-medalist Performance in Solving Olympiad Geometry with AlphaGeometry2

Paper • 2502.03544 • Published Feb 5 • 44