---
license: mit
datasets:
- wikimedia/wikipedia
language:
- en
pipeline_tag: text-generation
library_name: transformers
tags:
- pytorch
- Thinking
- CustomModel
---

# Latent Recurrent Depth Language Model

## Overview

The Latent Recurrent Depth Language Model (LRD-LM) is an experimental text-generation architecture designed to capture deeper contextual information through iterative, latent processing. Instead of generating verbose chain-of-thought sequences, LRD-LM refines its internal state over multiple recurrent iterations to improve text-generation quality while keeping the parameter count modest.

## Architecture

The model is built around three key components:

- **Prelude Block:** Handles the initial processing by embedding input tokens and applying self-attention with positional encodings.
- **Recurrent Block:** A core, weight-shared block that iteratively refines a latent state. By repeatedly processing the prelude output along with its own evolving state, the model effectively “thinks” over the input without emitting intermediate tokens.
- **Coda Block:** Decodes the refined latent state into output token probabilities.

An illustrative sketch of how these blocks interact is given in the Usage section below.

## Applications & Limitations

**Intended Uses:**

- **Text Generation:** Generate creative text, dialogue, code, or other natural language content.
- **Research:** Serve as a testbed for exploring novel architectures and techniques in language modeling.

**Limitations:**

- **Data Constraints:** Trained on a small subset (the first 1000 samples) of the Wikitext-2-raw-v1 dataset, which may limit its performance compared to models trained on larger corpora.
- **Performance:** While it demonstrates the potential of latent recurrent depth, the model is experimental and its output quality may not match state-of-the-art models.
- **Computational Overhead:** The iterative processing introduces extra computation that grows with the number of recurrent iterations.
- **Bias:** As with all language models, generated outputs may reflect biases present in the training data.

## Training Details

The model was fine-tuned on a subset of the Wikitext-2-raw-v1 dataset (the first 1000 samples) using the AdamW optimizer and a cosine annealing learning rate scheduler. The training configuration and hyperparameters are provided in the accompanying code; adjustments may be needed for improved performance.

## Usage

The model can be used for text generation via its integrated `generate()` method, which lets you control the maximum sequence length, the number of recurrent iterations, the sampling temperature, and top-k filtering.
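### How `num_iterations` Works (Illustrative Sketch)

The `num_iterations` argument used in the examples below controls how many times the weight-shared recurrent block is applied before the coda decodes the latent state. The sketch below is a minimal illustration of that control flow only, not the released implementation: the class and module names (`TinyLatentRecurrentLM`, `prelude`, `recurrent`, `coda`) and all dimensions are hypothetical, and details such as causal masking, normalization, and dropout are omitted.

```python
import torch
import torch.nn as nn

class TinyLatentRecurrentLM(nn.Module):
    """Toy prelude -> (recurrent x N) -> coda stack; names and sizes are illustrative."""

    def __init__(self, vocab_size=256, d_model=64, n_heads=4, max_len=128):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        # Prelude: one self-attention pass over the embedded, position-encoded input.
        self.prelude = nn.TransformerEncoderLayer(d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True)
        # Recurrent block: a single weight-shared layer reused num_iterations times.
        self.recurrent = nn.TransformerEncoderLayer(d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True)
        # Coda: projects the refined latent state to vocabulary logits.
        self.coda = nn.Linear(d_model, vocab_size)

    def forward(self, input_ids, num_iterations=3):
        positions = torch.arange(input_ids.size(1), device=input_ids.device)
        hidden = self.token_emb(input_ids) + self.pos_emb(positions)
        hidden = self.prelude(hidden)
        latent = torch.zeros_like(hidden)   # initial latent state
        for _ in range(num_iterations):     # latent "thinking" loop; no intermediate tokens emitted
            latent = self.recurrent(latent + hidden)
        return self.coda(latent)            # (batch, seq_len, vocab_size) logits

model = TinyLatentRecurrentLM()
dummy_ids = torch.randint(0, 256, (1, 16))
print(model(dummy_ids, num_iterations=3).shape)  # torch.Size([1, 16, 256])
```

Raising `num_iterations` buys additional latent refinement at the cost of proportionally more compute, which is the overhead noted under Limitations.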
### Example: Direct Inference

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer from the Hub (custom architecture, so remote code must be trusted)
model = AutoModelForCausalLM.from_pretrained("codewithdark/latent-recurrent-depth-lm", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("codewithdark/latent-recurrent-depth-lm")

prompt = "In the realm of language modeling"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# Run the forward pass with a specified number of recurrent iterations to obtain logits
logits = model(input_ids, num_iterations=3)

# Sample a single next token from the logits at the final position
probs = torch.softmax(logits[:, -1, :], dim=-1)
next_token = torch.multinomial(probs, num_samples=1)
generated_ids = torch.cat([input_ids, next_token], dim=1)
generated_text = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
clean_text = generated_text.replace('Ġ', '')  # strip any remaining byte-level BPE markers
print(clean_text)
```

### Alternative: Using the `generate()` Method

```python
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("codewithdark/latent-recurrent-depth-lm")
model = AutoModel.from_pretrained("codewithdark/latent-recurrent-depth-lm", trust_remote_code=True)

prompt = "In the realm of language modeling"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

generated_ids = model.generate(input_ids, max_length=50, num_iterations=10, temperature=0.5, top_k=50)
generated_text = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
clean_text = generated_text.replace('Ġ', '')  # strip any remaining byte-level BPE markers
print(clean_text)
```

## Ethical Considerations

This model is intended for research and experimental use. Users must ensure ethical application and carefully consider potential biases and misuse when deploying or further developing this technology.

## License

This project is licensed under the MIT License.