arxiv:2507.00453

Recurrent Memory-Augmented Transformers with Chunked Attention for Long-Context Language Modeling

Published on Jul 1, 2025

Abstract

AI-generated summary: A Transformer architecture with global attention, chunked local attention, and gated FIFO memory efficiently handles long-range dependencies in language modeling.

We present a Transformer architecture for long-context language modeling that combines global attention with two biologically inspired components: chunked local attention and a gated FIFO memory mechanism. This unified attention block allows the model to efficiently handle both short-range and long-range dependencies without increasing attention cost quadratically. The memory module persistently stores past token representations using a gated update mechanism inspired by recurrent networks. Rotary positional encoding is applied per attention head to enable directionally disentangled, scale-invariant positional signals. The architecture is implemented entirely from scratch in PyTorch, with no reliance on high-level libraries, enabling transparent and modular experimentation. Our model offers a lightweight and extensible design for tasks such as dialogue modeling, code completion, and document understanding.
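The abstract does not include code, so the sketch below illustrates, in plain PyTorch, the two components it names: chunked local attention and a gated FIFO memory update. All function names, tensor shapes, and hyperparameters (`chunk_size`, `mem_slots`, `d_model`) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F
from torch import nn


def chunked_local_attention(q, k, v, chunk_size):
    """Attention restricted to fixed-size chunks, so cost grows linearly with length."""
    b, t, d = q.shape
    pad = (-t) % chunk_size                                 # pad sequence to a multiple of chunk_size
    q, k, v = (F.pad(x, (0, 0, 0, pad)) for x in (q, k, v))
    n = (t + pad) // chunk_size
    q, k, v = (x.reshape(b, n, chunk_size, d) for x in (q, k, v))
    scores = q @ k.transpose(-2, -1) / d ** 0.5             # (b, n, chunk, chunk)
    out = torch.softmax(scores, dim=-1) @ v                 # attend only within each chunk
    return out.reshape(b, n * chunk_size, d)[:, :t]         # drop padding (masking omitted for brevity)


class GatedFIFOMemory(nn.Module):
    """FIFO buffer of past-token summaries updated through a recurrent-style gate (illustrative)."""

    def __init__(self, d_model, mem_slots):
        super().__init__()
        self.mem_slots = mem_slots
        self.gate = nn.Linear(2 * d_model, d_model)

    def init_memory(self, batch, d_model, device=None):
        return torch.zeros(batch, self.mem_slots, d_model, device=device)

    def forward(self, hidden, memory):
        # hidden: (batch, time, d_model); memory: (batch, mem_slots, d_model)
        summary = hidden.mean(dim=1, keepdim=True)          # one summary vector per segment
        oldest = memory[:, :1]                              # slot about to be evicted
        g = torch.sigmoid(self.gate(torch.cat([summary, oldest], dim=-1)))
        new_slot = g * summary + (1 - g) * oldest           # gated blend of new and old content
        return torch.cat([memory[:, 1:], new_slot], dim=1)  # FIFO shift: drop oldest, append newest


# Illustrative usage with made-up sizes.
x = torch.randn(2, 100, 64)                                 # (batch, time, d_model)
local = chunked_local_attention(x, x, x, chunk_size=16)
mem_module = GatedFIFOMemory(d_model=64, mem_slots=8)
memory = mem_module.init_memory(batch=2, d_model=64)
memory = mem_module(x, memory)                              # persist a gated summary of this segment
```

In the paper's unified attention block, outputs like these would be combined with global attention and per-head rotary positional encoding; the sketch above only shows how chunking keeps attention cost sub-quadratic and how a gate decides how much new content overwrites the oldest memory slot.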
