Fangyuan Yu PRO

Ksgk-fy

fangyuan-ksgk

AI & ML interests

AGI

Recent Activity

updated a collection about 1 hour ago

Representation & Optimization

updated a collection about 3 hours ago

Representation & Optimization

Ksgk-fy's activity

upvoted a paper about 1 hour ago

Representation Learning with Contrastive Predictive Coding

Paper • 1807.03748 • Published Jul 10, 2018 • 1

upvoted a paper about 3 hours ago

Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

Paper • 2504.13837 • Published 4 days ago • 74

upvoted a paper about 6 hours ago

Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models

Paper • 2403.19647 • Published Mar 28, 2024 • 4

upvoted a paper 6 days ago

One-Minute Video Generation with Test-Time Training

Paper • 2504.05298 • Published 15 days ago • 96

upvoted 2 papers 7 days ago

SelfCP: Compressing Long Prompt to 1/12 Using the Frozen Large Language Model Itself

Paper • 2405.17052 • Published May 27, 2024 • 2

Learning-Order Autoregressive Models with Application to Molecular Graph Generation

Paper • 2503.05979 • Published Mar 7 • 2

upvoted a paper 8 days ago

Language Models Are Implicitly Continuous

Paper • 2504.03933 • Published 17 days ago • 2

upvoted a paper 9 days ago

Gradient Surgery for Multi-Task Learning

Paper • 2001.06782 • Published Jan 19, 2020 • 1

upvoted 2 papers 11 days ago

Is the Reversal Curse a Binding Problem? Uncovering Limitations of Transformers from a Basic Generalization Failure

Paper • 2504.01928 • Published 20 days ago • 1

EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test

Paper • 2503.01840 • Published Mar 3 • 5

upvoted 2 papers 12 days ago

Hogwild! Inference: Parallel LLM Generation via Concurrent Attention

Paper • 2504.06261 • Published 14 days ago • 102

OmniSVG: A Unified Scalable Vector Graphics Generation Model

Paper • 2504.06263 • Published 14 days ago • 147

upvoted a paper 19 days ago

Value Residual Learning For Alleviating Attention Concentration In Transformers

Paper • 2410.17897 • Published Oct 23, 2024 • 9

upvoted 3 papers 20 days ago

Flex Attention: A Programming Model for Generating Optimized Attention Kernels

Paper • 2412.05496 • Published Dec 7, 2024 • 1

Agent S2: A Compositional Generalist-Specialist Framework for Computer Use Agents

Paper • 2504.00906 • Published 21 days ago • 21

ElaLoRA: Elastic & Learnable Low-Rank Adaptation for Efficient Model Fine-Tuning

Paper • 2504.00254 • Published 21 days ago • 1

upvoted a collection 20 days ago

Representation & Optimization

Collection

Understanding about representation sheds light on optimization • 16 items • Updated about 1 hour ago • 1

upvoted 2 papers 20 days ago

Approximate Nullspace Augmented Finetuning for Robust Vision Transformers

Paper • 2403.10476 • Published Mar 15, 2024 • 1

Token embeddings violate the manifold hypothesis

Paper • 2504.01002 • Published 21 days ago • 1

upvoted a paper 24 days ago

Nuclear Norm Regularization for Deep Learning

Paper • 2405.14544 • Published May 23, 2024 • 1