Eagle-3 Speculator for Llama-3.3-70B-Instruct

This is an Eagle-3 speculator checkpoint converted to the speculators library format.

Model Details

  • Base Model: meta-llama/Llama-3.3-70B-Instruct
  • Speculator Type: Eagle-3
  • Draft Vocabulary Size: 32,000
  • Target Vocabulary Size: 128,256
  • Architecture: Single-layer transformer with vocabulary mapping
  • Target Model Hidden Size: 8,192
  • Draft Model Hidden Size: 6,144

Key Features

  • Vocabulary Mapping: Maps between draft (32K) and target (128K) vocabularies
  • Custom Attention: Modified attention layer accepting 2×hidden_size input
  • Fusion Layer: Fuses hidden states from 3 verifier layers of the target model (3×8192 → 6144); see the sketch after this list
  • Optimized for 70B Models: Specifically configured for Llama-3.3-70B architecture
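
The fusion and attention shapes above can be illustrated with a minimal PyTorch sketch. This is an assumption-level illustration, not the speculators implementation: the module name (fusion) and the exact wiring of embeddings and fused states are hypothetical, and only the dimensions come from this card.

import torch
import torch.nn as nn

TARGET_HIDDEN = 8192   # Llama-3.3-70B hidden size
DRAFT_HIDDEN = 6144    # draft model hidden size

# Hypothetical fusion layer: concatenate hidden states from 3 verifier layers
# and project 3*8192 -> 6144 so the draft decoder can consume them.
fusion = nn.Linear(3 * TARGET_HIDDEN, DRAFT_HIDDEN, bias=False)

batch, seq = 1, 16
verifier_states = [torch.randn(batch, seq, TARGET_HIDDEN) for _ in range(3)]
fused = fusion(torch.cat(verifier_states, dim=-1))        # (1, 16, 6144)

# The modified attention block takes 2*hidden_size features: draft token
# embeddings concatenated with the fused verifier features.
draft_embeds = torch.randn(batch, seq, DRAFT_HIDDEN)
attn_input = torch.cat([draft_embeds, fused], dim=-1)     # (1, 16, 12288)
print(fused.shape, attn_input.shape)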

Usage

from speculators.models.eagle3 import Eagle3Speculator, Eagle3SpeculatorConfig
from transformers import AutoModelForCausalLM

# Load verifier model
verifier = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.3-70B-Instruct")

# Load Eagle-3 speculator
speculator = Eagle3Speculator.from_pretrained(
    "nm-testing/EAGLE3-LLaMA3.3-Instruct-70B-speculators",
    verifier=verifier
)
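
At inference time the speculator drafts a few tokens that the verifier then checks in a single forward pass. The snippet below is a self-contained illustration of the greedy accept/reject rule that keeps speculative decoding lossless; it is not the speculators API, and greedy_verify is a hypothetical helper used only for explanation.

import torch

def greedy_verify(draft_tokens, target_logits):
    # Accept the longest prefix of draft_tokens that matches the target
    # model's greedy choices, then emit the target's token at the first
    # mismatch, so the output equals pure target-model greedy decoding.
    target_choice = target_logits.argmax(dim=-1)   # one choice per drafted position
    accepted = []
    for i, tok in enumerate(draft_tokens):
        if tok == int(target_choice[i]):
            accepted.append(tok)
        else:
            accepted.append(int(target_choice[i]))
            break
    return accepted

draft_tokens = [5, 9, 12]                                # tokens proposed by the speculator
target_logits = torch.randn(len(draft_tokens), 128256)   # verifier logits (toy values)
print(greedy_verify(draft_tokens, target_logits))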

Configuration

This model uses the Eagle-3 architecture with the following settings (a brief dimension check is sketched after the list):

  • Hidden size: 6,144 (draft model)
  • Target hidden size: 8,192 (70B Llama model)
  • Attention heads: 48
  • Key-value heads: 8
  • Intermediate size: 16,384
  • RMS norm epsilon: 1e-05
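
As a quick sanity check, the sizes above fit together as follows. This is plain arithmetic derived from the values listed; the variable names mirror common config fields but are used here only for illustration.

hidden_size = 6144
num_attention_heads = 48
num_key_value_heads = 8
target_hidden_size = 8192

head_dim = hidden_size // num_attention_heads             # 128
gqa_groups = num_attention_heads // num_key_value_heads   # 6 query heads per KV head
fusion_input = 3 * target_hidden_size                     # 24576, projected down to 6144
attention_input = 2 * hidden_size                         # 12288 (embeddings + fused states)

print(head_dim, gqa_groups, fusion_input, attention_input)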

Original Model

Converted from: yuhuili/EAGLE3-LLaMA3.3-Instruct-70B

Citation

Based on the EAGLE-3 paper: Li et al., "EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test", https://arxiv.org/abs/2503.01840

License

Please refer to the base Llama-3.3 model license.
