Utilities for Generation
This page lists all the utility functions used by [~generation.GenerationMixin.generate],
[~generation.GenerationMixin.greedy_search],
[~generation.GenerationMixin.contrastive_search],
[~generation.GenerationMixin.sample],
[~generation.GenerationMixin.beam_search],
[~generation.GenerationMixin.beam_sample],
[~generation.GenerationMixin.group_beam_search], and
[~generation.GenerationMixin.constrained_beam_search].
Most of those are only useful if you are studying the code of the generate methods in the library.
Generate Outputs
The output of [~generation.GenerationMixin.generate] is an instance of a subclass of
[~utils.ModelOutput]. This output is a data structure containing all the information returned
by [~generation.GenerationMixin.generate], but it can also be used as a tuple or dictionary.
Here's an example:
python
from transformers import GPT2Tokenizer, GPT2LMHeadModel
tokenizer = GPT2Tokenizer.from_pretrained("openai-community/gpt2")
model = GPT2LMHeadModel.from_pretrained("openai-community/gpt2")
inputs = tokenizer("Hello, my dog is cute and ", return_tensors="pt")
generation_output = model.generate(**inputs, return_dict_in_generate=True, output_scores=True)
The generation_output object is a [~generation.GenerateDecoderOnlyOutput]. As we can
see in the documentation of that class below, it has the following attributes:
sequences: the generated sequences of tokens
scores (optional): the prediction scores of the language modeling head, for each generation step
hidden_states (optional): the hidden states of the model, for each generation step
attentions (optional): the attention weights of the model, for each generation step
Here we have the scores since we passed along output_scores=True, but we don't have hidden_states and
attentions because we didn't pass output_hidden_states=True or output_attentions=True.
You can access each attribute as you would usually do, and if that attribute has not been returned by the model, you
will get None. Here for instance generation_output.scores are all the generated prediction scores of the
language modeling head, and generation_output.attentions is None.
When using our generation_output object as a tuple, it only keeps the attributes that don't have None values.
Here, for instance, it has two elements, sequences then scores, so
python
generation_output[:2]
will return the tuple (generation_output.sequences, generation_output.scores).
When using our generation_output object as a dictionary, it only keeps the attributes that don't have None
values. Here, for instance, it has two keys that are sequences and scores.
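The attribute, tuple, and dictionary views described above can be illustrated without loading a model. The sketch below is a simplified stand-in, not the library's [~utils.ModelOutput] implementation: the class name SketchOutput, its helper methods, and the sample values are all hypothetical, and it only mimics the rule that fields set to None are dropped from the tuple and dictionary views.

```python
from dataclasses import dataclass, fields
from typing import Any, List, Optional, Tuple


@dataclass
class SketchOutput:
    """Hypothetical stand-in that mimics how None fields are dropped."""

    sequences: Any = None
    scores: Optional[Any] = None
    attentions: Optional[Any] = None
    hidden_states: Optional[Any] = None

    def _present(self) -> List[Tuple[str, Any]]:
        # Keep only the fields that were actually returned (not None),
        # in declaration order.
        return [
            (f.name, getattr(self, f.name))
            for f in fields(self)
            if getattr(self, f.name) is not None
        ]

    def to_tuple(self) -> Tuple[Any, ...]:
        return tuple(value for _, value in self._present())

    def keys(self) -> List[str]:
        return [name for name, _ in self._present()]

    def __getitem__(self, key):
        # String keys behave like a dictionary; integers and slices like a tuple.
        if isinstance(key, str):
            return dict(self._present())[key]
        return self.to_tuple()[key]


# Made-up values standing in for generated token ids and per-step scores.
output = SketchOutput(sequences=[[15496, 11, 616]], scores=([0.1, 0.9],))

print(output.attentions)  # a field that was not returned is None
print(output.keys())      # only the non-None fields appear as keys
print(output[:2] == (output.sequences, output.scores))
```

With a real generation_output, the same three access styles apply; which fields are present depends on the flags passed to generate.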
We document here all output types.
PyTorch
[[autodoc]] generation.GenerateDecoderOnlyOutput
[[autodoc]] generation.GenerateEncoderDecoderOutput
[[autodoc]] generation.GenerateBeamDecoderOnlyOutput
[[autodoc]] generation.GenerateBeamEncoderDecoderOutput
TensorFlow
[[autodoc]] generation.TFGreedySearchEncoderDecoderOutput
[[autodoc]] generation.TFGreedySearchDecoderOnlyOutput
[[autodoc]] generation.TFSampleEncoderDecoderOutput
[[autodoc]] generation.TFSampleDecoderOnlyOutput
[[autodoc]] generation.TFBeamSearchEncoderDecoderOutput
[[autodoc]] generation.TFBeamSearchDecoderOnlyOutput
[[autodoc]] generation.TFBeamSampleEncoderDecoderOutput
[[autodoc]] generation.TFBeamSampleDecoderOnlyOutput
[[autodoc]] generation.TFContrastiveSearchEncoderDecoderOutput
[[autodoc]] generation.TFContrastiveSearchDecoderOnlyOutput
FLAX
[[autodoc]] generation.FlaxSampleOutput
[[autodoc]] generation.FlaxGreedySearchOutput
[[autodoc]] generation.FlaxBeamSearchOutput
LogitsProcessor
A [LogitsProcessor] can be used to modify the prediction scores of a language model head for
generation.
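To make the idea concrete, here is a minimal sketch of what "modifying the prediction scores" means. It deliberately avoids the library classes and tensors: scores are plain Python lists, and the helper names (temperature, top_k, ban_repeat_of_last_token, apply_all) are hypothetical. The only thing it borrows from the real classes is the calling convention, in which a processor receives the tokens generated so far and the next-token scores, and returns new scores.

```python
import math
from typing import Callable, List, Sequence

# A processor takes the tokens generated so far and the next-token scores,
# and returns modified scores of the same length.
Processor = Callable[[Sequence[int], List[float]], List[float]]


def temperature(value: float) -> Processor:
    """Divide every score by `value` (hypothetical helper)."""
    def process(input_ids: Sequence[int], scores: List[float]) -> List[float]:
        return [s / value for s in scores]
    return process


def top_k(k: int) -> Processor:
    """Keep the k highest scores and set the rest to -inf (hypothetical helper)."""
    def process(input_ids: Sequence[int], scores: List[float]) -> List[float]:
        # Scores tied with the k-th highest value are all kept.
        threshold = sorted(scores, reverse=True)[k - 1]
        return [s if s >= threshold else -math.inf for s in scores]
    return process


def ban_repeat_of_last_token() -> Processor:
    """Forbid emitting the same token twice in a row (hypothetical helper)."""
    def process(input_ids: Sequence[int], scores: List[float]) -> List[float]:
        if not input_ids:
            return list(scores)
        last = input_ids[-1]
        return [-math.inf if i == last else s for i, s in enumerate(scores)]
    return process


def apply_all(processors: List[Processor], input_ids, scores):
    # Apply each processor in order, feeding the output of one into the next.
    for processor in processors:
        scores = processor(input_ids, scores)
    return scores


processors = [ban_repeat_of_last_token(), temperature(2.0), top_k(2)]
new_scores = apply_all(processors, input_ids=[3, 1], scores=[2.0, 8.0, 4.0, 6.0])
print(new_scores)  # [-inf, -inf, 2.0, 3.0]
```

The order of the processors matters: here token 1 is banned first, so the top-2 filter picks from the remaining tokens.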
PyTorch
[[autodoc]] AlternatingCodebooksLogitsProcessor
- __call__
[[autodoc]] ClassifierFreeGuidanceLogitsProcessor
- __call__
[[autodoc]] EncoderNoRepeatNGramLogitsProcessor
- __call__
[[autodoc]] EncoderRepetitionPenaltyLogitsProcessor
- __call__
[[autodoc]] EpsilonLogitsWarper
- __call__
[[autodoc]] EtaLogitsWarper
- __call__
[[autodoc]] ExponentialDecayLengthPenalty
- __call__
[[autodoc]] ForcedBOSTokenLogitsProcessor
- __call__
[[autodoc]] ForcedEOSTokenLogitsProcessor
- __call__
[[autodoc]] ForceTokensLogitsProcessor
- __call__
[[autodoc]] HammingDiversityLogitsProcessor
- __call__
[[autodoc]] InfNanRemoveLogitsProcessor
- __call__
[[autodoc]] LogitNormalization
- __call__
[[autodoc]] LogitsProcessor
- __call__
[[autodoc]] LogitsProcessorList
- __call__
[[autodoc]] LogitsWarper
- __call__
[[autodoc]] MinLengthLogitsProcessor
- __call__
[[autodoc]] MinNewTokensLengthLogitsProcessor
- __call__
[[autodoc]] NoBadWordsLogitsProcessor
- __call__
[[autodoc]] NoRepeatNGramLogitsProcessor
- __call__
[[autodoc]] PrefixConstrainedLogitsProcessor
- __call__
[[autodoc]] RepetitionPenaltyLogitsProcessor
- __call__
[[autodoc]] SequenceBiasLogitsProcessor
- __call__
[[autodoc]] SuppressTokensAtBeginLogitsProcessor
- __call__
[[autodoc]] SuppressTokensLogitsProcessor
- __call__
[[autodoc]] TemperatureLogitsWarper
- __call__
[[autodoc]] TopKLogitsWarper
- __call__
[[autodoc]] TopPLogitsWarper
- __call__
[[autodoc]] TypicalLogitsWarper
- __call__
[[autodoc]] UnbatchedClassifierFreeGuidanceLogitsProcessor
- __call__
[[autodoc]] WhisperTimeStampLogitsProcessor
- __call__
TensorFlow
[[autodoc]] TFForcedBOSTokenLogitsProcessor
- __call__
[[autodoc]] TFForcedEOSTokenLogitsProcessor
- __call__
[[autodoc]] TFForceTokensLogitsProcessor
- __call__
[[autodoc]] TFLogitsProcessor
- __call__
[[autodoc]] TFLogitsProcessorList
- __call__
[[autodoc]] TFLogitsWarper
- __call__
[[autodoc]] TFMinLengthLogitsProcessor
- __call__
[[autodoc]] TFNoBadWordsLogitsProcessor
- __call__
[[autodoc]] TFNoRepeatNGramLogitsProcessor
- __call__
[[autodoc]] TFRepetitionPenaltyLogitsProcessor
- __call__
[[autodoc]] TFSuppressTokensAtBeginLogitsProcessor
- __call__
[[autodoc]] TFSuppressTokensLogitsProcessor
- __call__
[[autodoc]] TFTemperatureLogitsWarper
- __call__
[[autodoc]] TFTopKLogitsWarper
- __call__
[[autodoc]] TFTopPLogitsWarper
- __call__
FLAX
[[autodoc]] FlaxForcedBOSTokenLogitsProcessor
- __call__
[[autodoc]] FlaxForcedEOSTokenLogitsProcessor
- __call__
[[autodoc]] FlaxForceTokensLogitsProcessor
- __call__
[[autodoc]] FlaxLogitsProcessor
- __call__
[[autodoc]] FlaxLogitsProcessorList
- __call__
[[autodoc]] FlaxLogitsWarper
- __call__
[[autodoc]] FlaxMinLengthLogitsProcessor
- __call__
[[autodoc]] FlaxSuppressTokensAtBeginLogitsProcessor
- __call__
[[autodoc]] FlaxSuppressTokensLogitsProcessor
- __call__
[[autodoc]] FlaxTemperatureLogitsWarper
- __call__
[[autodoc]] FlaxTopKLogitsWarper
- __call__
[[autodoc]] FlaxTopPLogitsWarper
- __call__
[[autodoc]] FlaxWhisperTimeStampLogitsProcessor
- __call__
StoppingCriteria
A [StoppingCriteria] can be used to change when to stop generation (other than the EOS token). Please note that this is exclusively available to our PyTorch implementations.
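As a rough illustration of the idea, the sketch below runs a toy generation loop that stops as soon as any criterion fires. It does not use the library classes: the helpers max_length, max_time, stop_on_token, and generate are hypothetical, tokens are plain integers, and the "model" is a lambda. It is meant only to show how length-based, time-based, and custom criteria can be combined.

```python
import time
from typing import Callable, List

# A criterion looks at the tokens generated so far and says whether to stop.
Criterion = Callable[[List[int]], bool]


def max_length(limit: int) -> Criterion:
    """Stop once the sequence (prompt included) reaches `limit` tokens."""
    return lambda input_ids: len(input_ids) >= limit


def max_time(seconds: float) -> Criterion:
    """Stop once `seconds` have elapsed since the criterion was created."""
    start = time.monotonic()
    return lambda input_ids: time.monotonic() - start > seconds


def stop_on_token(token_id: int) -> Criterion:
    """Custom criterion: stop when the last token equals `token_id`."""
    return lambda input_ids: bool(input_ids) and input_ids[-1] == token_id


def generate(prompt: List[int], next_token, criteria: List[Criterion]) -> List[int]:
    input_ids = list(prompt)
    # Keep generating until at least one criterion asks to stop.
    while not any(criterion(input_ids) for criterion in criteria):
        input_ids.append(next_token(input_ids))
    return input_ids


# Toy "model": always predicts the previous token plus one.
tokens = generate(
    prompt=[0],
    next_token=lambda ids: ids[-1] + 1,
    criteria=[max_length(10), max_time(5.0), stop_on_token(4)],
)
print(tokens)  # [0, 1, 2, 3, 4]
```

Here the custom token criterion fires first; removing it would let the length limit end the loop at ten tokens.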
[[autodoc]] StoppingCriteria
- __call__
[[autodoc]] StoppingCriteriaList
- __call__
[[autodoc]] MaxLengthCriteria
- __call__
[[autodoc]] MaxTimeCriteria
- __call__
Constraints
A [Constraint] can be used to force the generation to include specific tokens or sequences in the output. Please note that this is exclusively available to our PyTorch implementations.
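One way to picture a constraint is as a small state machine that tracks how much of a required token sequence has been produced so far. The sketch below is a simplified illustration under that assumption, not the library's implementation: the class name PhraseTracker is hypothetical, and its methods only loosely mirror the kind of bookkeeping a phrasal constraint needs (which token would make progress, whether the phrase is complete, how many tokens remain).

```python
from typing import List, Optional, Tuple


class PhraseTracker:
    """Hypothetical, simplified tracker for one required token sequence."""

    def __init__(self, token_ids: List[int]):
        self.token_ids = list(token_ids)
        self.position = 0       # how many tokens of the phrase are fulfilled
        self.completed = False

    def advance(self) -> Optional[int]:
        """Return the token that would make progress, or None when done."""
        return None if self.completed else self.token_ids[self.position]

    def update(self, token_id: int) -> Tuple[bool, bool, bool]:
        """Feed one generated token; return (stepped, completed, reset)."""
        if self.completed:
            return False, True, False
        if token_id == self.token_ids[self.position]:
            self.position += 1
            self.completed = self.position == len(self.token_ids)
            return True, self.completed, False
        # A wrong token breaks the phrase, so progress starts over.
        # This sketch does not re-check whether that token itself starts the phrase.
        self.position = 0
        return False, False, True

    def remaining(self) -> int:
        """Number of phrase tokens still to be generated."""
        return len(self.token_ids) - self.position


tracker = PhraseTracker([7, 8, 9])
for token in [7, 8, 5, 7, 8, 9]:
    stepped, completed, reset = tracker.update(token)

print(tracker.completed, tracker.remaining())  # True 0
```

A constrained search can use such a tracker to prefer candidate tokens that make progress and to accept only finished sequences whose constraints are all completed.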
[[autodoc]] Constraint
[[autodoc]] PhrasalConstraint
[[autodoc]] DisjunctiveConstraint
[[autodoc]] ConstraintListState
BeamSearch
[[autodoc]] BeamScorer
- process
- finalize
[[autodoc]] BeamSearchScorer
- process
- finalize
[[autodoc]] ConstrainedBeamSearchScorer
- process
- finalize
Utilities
[[autodoc]] top_k_top_p_filtering
[[autodoc]] tf_top_k_top_p_filtering
Streamers
[[autodoc]] TextStreamer
[[autodoc]] TextIteratorStreamer
Caches
[[autodoc]] Cache
- update
[[autodoc]] DynamicCache
- update
- get_seq_length
- reorder_cache
- to_legacy_cache
- from_legacy_cache
[[autodoc]] SinkCache
- update
- get_seq_length
- reorder_cache
[[autodoc]] StaticCache
- update
- get_seq_length