🚨 Suggested Correction to Model Usage Example

We suggest removing pad_token_id=0 from the generate() call in the current example, as it provides no functional value in this context.


🔍 Why This Matters

  • The tokenizer in the example does not have a default padding token set.
  • If you provide multiple inputs as a batch, the tokenizer cannot automatically pad them to the same length unless:
    • padding=True is specified, and
    • a valid pad_token_id is configured.
  • Without both, calling generate() may result in errors or unexpected behavior.
  • Setting pad_token_id=0 without configuring the tokenizer may silently introduce incorrect behavior, especially if 0 corresponds to a meaningful token (e.g., <unk> or <eos>); a quick way to check what id 0 maps to is shown below.
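
For reference, here is a minimal check (assuming tokenizer is an already-loaded transformers tokenizer) to see what token id 0 actually maps to before relying on it for padding:

# Inspect what token id 0 corresponds to before using it as a pad token
print(tokenizer.convert_ids_to_tokens(0))  # token string for id 0
print(tokenizer.decode([0]))               # how id 0 decodes as plain text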

✅ Recommendation

Update the example using one of the following options:

Option 1: If batching is not intended

Remove pad_token_id=0 from the generate() call:

output = model.generate(
    input_ids,
    max_new_tokens=20
)

Option 2: If batching is intended

Set tokenizer.pad_token to the padding token defined by the model's tokenizer. For example, this model's tokenizer.json contains:

{
  "id": 128004,
  "content": "<|finetune_right_pad_id|>",
  "single_word": false,
  "lstrip": false,
  "rstrip": false,
  "normalized": false,
  "special": true
}

You should set:

tokenizer.pad_token = "<|finetune_right_pad_id|>"
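
Once pad_token is set, the tokenizer resolves the matching pad_token_id from its vocabulary automatically; a quick sanity check:

# pad_token_id is resolved from the vocabulary once pad_token is set
print(tokenizer.pad_token_id)  # 128004, matching the tokenizer.json entry above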

Example: Using apply_chat_template and generate with batching

# Prepare a batch of chat messages
messages_batch = [
    [{"role": "user", "content": "Hello!"}],
    [{"role": "user", "content": "How are you?"}]
]

# Decoder-only models should be left-padded for batched generation
tokenizer.padding_side = "left"

# Apply the chat template and tokenize with padding
# (padding uses the pad_token configured above; apply_chat_template
# does not take a pad_token_id argument)
inputs = tokenizer.apply_chat_template(
    messages_batch,
    add_generation_prompt=True,
    padding=True,
    return_dict=True,
    return_tensors="pt"
)

# Generate outputs with the correct pad_token_id
outputs = model.generate(
    **inputs,
    max_new_tokens=20,
    pad_token_id=tokenizer.pad_token_id
)
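
To turn the generated ids back into text, one option is to batch-decode while skipping pad and other special tokens:

# Decode the batch, dropping pad/special tokens from the output text
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))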