Collections
Collections including paper arxiv:2506.14761

- Byte Latent Transformer: Patches Scale Better Than Tokens
  Paper • 2412.09871 • Published • 109
- Causal Diffusion Transformers for Generative Modeling
  Paper • 2412.12095 • Published • 23
- Tensor Product Attention Is All You Need
  Paper • 2501.06425 • Published • 89
- TransMLA: Multi-head Latent Attention Is All You Need
  Paper • 2502.07864 • Published • 57
- FLAME: Factuality-Aware Alignment for Large Language Models
  Paper • 2405.01525 • Published • 29
- DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data
  Paper • 2405.14333 • Published • 42
- Transformers Can Do Arithmetic with the Right Embeddings
  Paper • 2405.17399 • Published • 55
- EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture
  Paper • 2405.18991 • Published • 12
- CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages
  Paper • 2309.09400 • Published • 85
- Tuning LLMs with Contrastive Alignment Instructions for Machine Translation in Unseen, Low-resource Languages
  Paper • 2401.05811 • Published • 8
- Is Preference Alignment Always the Best Option to Enhance LLM-Based Translation? An Empirical Analysis
  Paper • 2409.20059 • Published • 17
- Are Character-level Translations Worth the Wait? Comparing Character- and Subword-level Models for Machine Translation
  Paper • 2302.14220 • Published
- Let's Predict Sentence by Sentence
  Paper • 2505.22202 • Published • 19
- From Bytes to Ideas: Language Modeling with Autoregressive U-Nets
  Paper • 2506.14761 • Published • 17
- TokAlign: Efficient Vocabulary Adaptation via Token Alignment
  Paper • 2506.03523 • Published
- zip2zip: Inference-Time Adaptive Vocabularies for Language Models via Token Compression
  Paper • 2506.01084 • Published • 7
- LoRA+: Efficient Low Rank Adaptation of Large Models
  Paper • 2402.12354 • Published • 6
- The FinBen: An Holistic Financial Benchmark for Large Language Models
  Paper • 2402.12659 • Published • 22
- TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization
  Paper • 2402.13249 • Published • 13
- TrustLLM: Trustworthiness in Large Language Models
  Paper • 2401.05561 • Published • 70