|
|
|
Utilities for Generation |
|
This page lists all the utility functions used by [~generation.GenerationMixin.generate], |
|
[~generation.GenerationMixin.greedy_search], |
|
[~generation.GenerationMixin.contrastive_search], |
|
[~generation.GenerationMixin.sample], |
|
[~generation.GenerationMixin.beam_search], |
|
[~generation.GenerationMixin.beam_sample], |
|
[~generation.GenerationMixin.group_beam_search], and |
|
[~generation.GenerationMixin.constrained_beam_search]. |
|
Most of those are only useful if you are studying the code of the generate methods in the library. |
|
Generate Outputs |
|
The output of [~generation.GenerationMixin.generate] is an instance of a subclass of |
|
[~utils.ModelOutput]. This output is a data structure containing all the information returned |
|
by [~generation.GenerationMixin.generate], but that can also be used as a tuple or a dictionary.
|
Here's an example: |
|
python
|
from transformers import GPT2Tokenizer, GPT2LMHeadModel |
|
tokenizer = GPT2Tokenizer.from_pretrained("openai-community/gpt2") |
|
model = GPT2LMHeadModel.from_pretrained("openai-community/gpt2") |
|
inputs = tokenizer("Hello, my dog is cute and ", return_tensors="pt") |
|
generation_output = model.generate(**inputs, return_dict_in_generate=True, output_scores=True) |
|
|
|
The generation_output object is a [~generation.GenerateDecoderOnlyOutput]. As we can see in the documentation of that class below, this means it has the following attributes:
|
|
|
sequences: the generated sequences of tokens |
|
scores (optional): the prediction scores of the language modeling head, for each generation step
|
hidden_states (optional): the hidden states of the model, for each generation step |
|
attentions (optional): the attention weights of the model, for each generation step |
|
|
|
Here we have the scores since we passed along output_scores=True, but we don't have hidden_states and |
|
attentions because we didn't pass output_hidden_states=True or output_attentions=True. |
|
You can access each attribute as you would usually do, and if that attribute has not been returned by the model, you |
|
will get None. Here for instance generation_output.scores are all the generated prediction scores of the |
|
language modeling head, and generation_output.attentions is None. |
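Continuing with the example above, we can check these attributes directly (a minimal illustration; the exact shapes depend on the generated length):

python

print(generation_output.sequences.shape)  # generated token ids, shape (batch_size, sequence_length)
print(len(generation_output.scores))  # one tensor of scores per generated token
print(generation_output.attentions)  # None, since output_attentions=True was not passed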
|
When using our generation_output object as a tuple, it only keeps the attributes that don't have None values. |
|
Here, for instance, it has two elements, sequences then scores, so
|
python |
|
generation_output[:2] |
|
will return the tuple (generation_output.sequences, generation_output.scores).
|
When using our generation_output object as a dictionary, it only keeps the attributes that don't have None |
|
values. Here, for instance, it has two keys that are sequences and scores. |
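For example, still with the generation_output object from above, the dictionary-style view only exposes the non-None attributes:

python

print(list(generation_output.keys()))  # ['sequences', 'scores']
print(generation_output["sequences"].shape)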
|
We document here all output types. |
|
PyTorch |
|
[[autodoc]] generation.GenerateDecoderOnlyOutput |
|
[[autodoc]] generation.GenerateEncoderDecoderOutput |
|
[[autodoc]] generation.GenerateBeamDecoderOnlyOutput |
|
[[autodoc]] generation.GenerateBeamEncoderDecoderOutput |
|
TensorFlow |
|
[[autodoc]] generation.TFGreedySearchEncoderDecoderOutput |
|
[[autodoc]] generation.TFGreedySearchDecoderOnlyOutput |
|
[[autodoc]] generation.TFSampleEncoderDecoderOutput |
|
[[autodoc]] generation.TFSampleDecoderOnlyOutput |
|
[[autodoc]] generation.TFBeamSearchEncoderDecoderOutput |
|
[[autodoc]] generation.TFBeamSearchDecoderOnlyOutput |
|
[[autodoc]] generation.TFBeamSampleEncoderDecoderOutput |
|
[[autodoc]] generation.TFBeamSampleDecoderOnlyOutput |
|
[[autodoc]] generation.TFContrastiveSearchEncoderDecoderOutput |
|
[[autodoc]] generation.TFContrastiveSearchDecoderOnlyOutput |
|
FLAX |
|
[[autodoc]] generation.FlaxSampleOutput |
|
[[autodoc]] generation.FlaxGreedySearchOutput |
|
[[autodoc]] generation.FlaxBeamSearchOutput |
|
LogitsProcessor |
|
A [LogitsProcessor] can be used to modify the prediction scores of a language model head for |
|
generation. |
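As a quick illustration (a minimal sketch reusing the GPT-2 checkpoint from the example above), logits processors can be passed to [~generation.GenerationMixin.generate] through a [LogitsProcessorList]; here [MinLengthLogitsProcessor] keeps the EOS token from being generated too early:

python

from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    LogitsProcessorList,
    MinLengthLogitsProcessor,
)

tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")
inputs = tokenizer("Hello, my dog is cute and ", return_tensors="pt")

# forbid the EOS token until at least 15 tokens have been generated
logits_processor = LogitsProcessorList(
    [MinLengthLogitsProcessor(15, eos_token_id=model.generation_config.eos_token_id)]
)
outputs = model.generate(**inputs, logits_processor=logits_processor)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))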
|
PyTorch |
|
[[autodoc]] AlternatingCodebooksLogitsProcessor |
|
- call |
|
[[autodoc]] ClassifierFreeGuidanceLogitsProcessor |
|
- call |
|
[[autodoc]] EncoderNoRepeatNGramLogitsProcessor |
|
- call |
|
[[autodoc]] EncoderRepetitionPenaltyLogitsProcessor |
|
- call |
|
[[autodoc]] EpsilonLogitsWarper |
|
- call |
|
[[autodoc]] EtaLogitsWarper |
|
- call |
|
[[autodoc]] ExponentialDecayLengthPenalty |
|
- call |
|
[[autodoc]] ForcedBOSTokenLogitsProcessor |
|
- call |
|
[[autodoc]] ForcedEOSTokenLogitsProcessor |
|
- call |
|
[[autodoc]] ForceTokensLogitsProcessor |
|
- call |
|
[[autodoc]] HammingDiversityLogitsProcessor |
|
- call |
|
[[autodoc]] InfNanRemoveLogitsProcessor |
|
- call |
|
[[autodoc]] LogitNormalization |
|
- call |
|
[[autodoc]] LogitsProcessor |
|
- call |
|
[[autodoc]] LogitsProcessorList |
|
- call |
|
[[autodoc]] LogitsWarper |
|
- call |
|
[[autodoc]] MinLengthLogitsProcessor |
|
- call |
|
[[autodoc]] MinNewTokensLengthLogitsProcessor |
|
- call |
|
[[autodoc]] NoBadWordsLogitsProcessor |
|
- call |
|
[[autodoc]] NoRepeatNGramLogitsProcessor |
|
- call |
|
[[autodoc]] PrefixConstrainedLogitsProcessor |
|
- call |
|
[[autodoc]] RepetitionPenaltyLogitsProcessor |
|
- call |
|
[[autodoc]] SequenceBiasLogitsProcessor |
|
- call |
|
[[autodoc]] SuppressTokensAtBeginLogitsProcessor |
|
- call |
|
[[autodoc]] SuppressTokensLogitsProcessor |
|
- call |
|
[[autodoc]] TemperatureLogitsWarper |
|
- call |
|
[[autodoc]] TopKLogitsWarper |
|
- call |
|
[[autodoc]] TopPLogitsWarper |
|
- call |
|
[[autodoc]] TypicalLogitsWarper |
|
- call |
|
[[autodoc]] UnbatchedClassifierFreeGuidanceLogitsProcessor |
|
- call |
|
[[autodoc]] WhisperTimeStampLogitsProcessor |
|
- call |
|
TensorFlow |
|
[[autodoc]] TFForcedBOSTokenLogitsProcessor |
|
- call |
|
[[autodoc]] TFForcedEOSTokenLogitsProcessor |
|
- call |
|
[[autodoc]] TFForceTokensLogitsProcessor |
|
- call |
|
[[autodoc]] TFLogitsProcessor |
|
- call |
|
[[autodoc]] TFLogitsProcessorList |
|
- call |
|
[[autodoc]] TFLogitsWarper |
|
- call |
|
[[autodoc]] TFMinLengthLogitsProcessor |
|
- call |
|
[[autodoc]] TFNoBadWordsLogitsProcessor |
|
- call |
|
[[autodoc]] TFNoRepeatNGramLogitsProcessor |
|
- call |
|
[[autodoc]] TFRepetitionPenaltyLogitsProcessor |
|
- call |
|
[[autodoc]] TFSuppressTokensAtBeginLogitsProcessor |
|
- call |
|
[[autodoc]] TFSuppressTokensLogitsProcessor |
|
- call |
|
[[autodoc]] TFTemperatureLogitsWarper |
|
- call |
|
[[autodoc]] TFTopKLogitsWarper |
|
- call |
|
[[autodoc]] TFTopPLogitsWarper |
|
- call |
|
FLAX |
|
[[autodoc]] FlaxForcedBOSTokenLogitsProcessor |
|
- call |
|
[[autodoc]] FlaxForcedEOSTokenLogitsProcessor |
|
- call |
|
[[autodoc]] FlaxForceTokensLogitsProcessor |
|
- call |
|
[[autodoc]] FlaxLogitsProcessor |
|
- call |
|
[[autodoc]] FlaxLogitsProcessorList |
|
- call |
|
[[autodoc]] FlaxLogitsWarper |
|
- call |
|
[[autodoc]] FlaxMinLengthLogitsProcessor |
|
- call |
|
[[autodoc]] FlaxSuppressTokensAtBeginLogitsProcessor |
|
- call |
|
[[autodoc]] FlaxSuppressTokensLogitsProcessor |
|
- call |
|
[[autodoc]] FlaxTemperatureLogitsWarper |
|
- call |
|
[[autodoc]] FlaxTopKLogitsWarper |
|
- call |
|
[[autodoc]] FlaxTopPLogitsWarper |
|
- call |
|
[[autodoc]] FlaxWhisperTimeStampLogitsProcessor |
|
- call |
|
StoppingCriteria |
|
A [StoppingCriteria] can be used to change when to stop generation (other than reaching the EOS token). Please note that this is exclusively available to our PyTorch implementations.
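For example (a minimal sketch, again using the GPT-2 checkpoint from the example above), custom stopping criteria are passed to [~generation.GenerationMixin.generate] via a [StoppingCriteriaList]:

python

from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    MaxLengthCriteria,
    StoppingCriteriaList,
)

tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")
inputs = tokenizer("Hello, my dog is cute and ", return_tensors="pt")

# stop as soon as the total sequence (prompt + generation) reaches 20 tokens
stopping_criteria = StoppingCriteriaList([MaxLengthCriteria(max_length=20)])
outputs = model.generate(**inputs, stopping_criteria=stopping_criteria)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))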
|
[[autodoc]] StoppingCriteria |
|
- call |
|
[[autodoc]] StoppingCriteriaList |
|
- call |
|
[[autodoc]] MaxLengthCriteria |
|
- call |
|
[[autodoc]] MaxTimeCriteria |
|
- call |
|
Constraints |
|
A [Constraint] can be used to force the generation to include specific tokens or sequences in the output. Please note that this is exclusively available to our PyTorch implementations. |
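For example (a minimal sketch using the GPT-2 checkpoint from the example above), constraints are passed to [~generation.GenerationMixin.generate] together with beam search, which is required for constrained decoding:

python

from transformers import AutoModelForCausalLM, AutoTokenizer, PhrasalConstraint

tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")
inputs = tokenizer("Hello, my dog is cute and ", return_tensors="pt")

# force the phrase " very happy" to appear somewhere in the generated text
constraint = PhrasalConstraint(tokenizer(" very happy", add_special_tokens=False).input_ids)
outputs = model.generate(**inputs, constraints=[constraint], num_beams=4, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))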
|
[[autodoc]] Constraint |
|
[[autodoc]] PhrasalConstraint |
|
[[autodoc]] DisjunctiveConstraint |
|
[[autodoc]] ConstraintListState |
|
BeamSearch |
|
[[autodoc]] BeamScorer |
|
- process |
|
- finalize |
|
[[autodoc]] BeamSearchScorer |
|
- process |
|
- finalize |
|
[[autodoc]] ConstrainedBeamSearchScorer |
|
- process |
|
- finalize |
|
Utilities |
|
[[autodoc]] top_k_top_p_filtering |
|
[[autodoc]] tf_top_k_top_p_filtering |
|
Streamers |
|
[[autodoc]] TextStreamer |
|
[[autodoc]] TextIteratorStreamer |
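For example, a [TextStreamer] prints tokens to stdout as [~generation.GenerationMixin.generate] produces them (a minimal sketch reusing the GPT-2 checkpoint from the example above):

python

from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")
inputs = tokenizer("Hello, my dog is cute and ", return_tensors="pt")

streamer = TextStreamer(tokenizer, skip_prompt=True)
# text is printed to stdout, token by token, as it is generated
_ = model.generate(**inputs, streamer=streamer, max_new_tokens=20)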
|
Caches |
|
[[autodoc]] Cache |
|
- update |
|
[[autodoc]] DynamicCache |
|
- update |
|
- get_seq_length |
|
- reorder_cache |
|
- to_legacy_cache |
|
- from_legacy_cache |
|
[[autodoc]] SinkCache |
|
- update |
|
- get_seq_length |
|
- reorder_cache |
|
[[autodoc]] StaticCache |
|
- update |
|
- get_seq_length |
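For example (a minimal sketch with dummy tensors, just to illustrate the update and get_seq_length methods documented above):

python

import torch
from transformers import DynamicCache

cache = DynamicCache()
# dummy key/value states with shape (batch_size, num_heads, seq_len, head_dim)
key_states = torch.randn(1, 4, 3, 8)
value_states = torch.randn(1, 4, 3, 8)
cache.update(key_states, value_states, layer_idx=0)
print(cache.get_seq_length())  # 3 tokens cached for layer 0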