	Utilities for Generation
	This page lists all the utility functions used by [~generation.GenerationMixin.generate],
	[~generation.GenerationMixin.greedy_search],
	[~generation.GenerationMixin.contrastive_search],
	[~generation.GenerationMixin.sample],
	[~generation.GenerationMixin.beam_search],
	[~generation.GenerationMixin.beam_sample],
	[~generation.GenerationMixin.group_beam_search], and
	[~generation.GenerationMixin.constrained_beam_search].
	Most of those are only useful if you are studying the code of the generate methods in the library.
	Generate Outputs
	The output of [~generation.GenerationMixin.generate] is an instance of a subclass of
	[~utils.ModelOutput]. This output is a data structure containing all the information returned
	by [~generation.GenerationMixin.generate], but that can also be used as tuple or dictionary.
	Here's an example:
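	```python
	from transformers import GPT2Tokenizer, GPT2LMHeadModel

	tokenizer = GPT2Tokenizer.from_pretrained("openai-community/gpt2")
	model = GPT2LMHeadModel.from_pretrained("openai-community/gpt2")

	inputs = tokenizer("Hello, my dog is cute and ", return_tensors="pt")
	# return_dict_in_generate=True makes generate() return a ModelOutput subclass instead of a plain tensor;
	# output_scores=True additionally returns the prediction scores for each generation step.
	generation_output = model.generate(**inputs, return_dict_in_generate=True, output_scores=True)
	```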

	The generation_output object is a [~generation.GenerateDecoderOnlyOutput]. As the documentation of that class below shows,
	this means it has the following attributes:

	sequences: the generated sequences of tokens
	scores (optional): the prediction scores of the language modelling head, for each generation step
	hidden_states (optional): the hidden states of the model, for each generation step
	attentions (optional): the attention weights of the model, for each generation step

	Here we have the scores because we passed output_scores=True, but we don't have hidden_states or
	attentions because we didn't pass output_hidden_states=True or output_attentions=True.
	You can access each attribute as usual; if the model did not return that attribute, you get None.
	Here, for instance, generation_output.scores holds the prediction scores of the language modeling head
	for every generation step, and generation_output.attentions is None.
	When generation_output is used as a tuple, it only keeps the attributes that are not None.
	Here, for instance, it has two elements, sequences then scores, so
	```python
	generation_output[:2]
	```
	will return the tuple (generation_output.sequences, generation_output.scores).
	When generation_output is used as a dictionary, it likewise only keeps the attributes that are not None.
	Here, for instance, it has two keys: sequences and scores.
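To make the tuple and dictionary behavior described above concrete, here is a minimal pure-Python sketch. ToyGenerateOutput is a hypothetical stand-in, not the library's ModelOutput: it only mimics the rule that attributes left at None are dropped when the object is indexed by position or its keys are listed. The token ids and score placeholders are made up for illustration.

```python
from collections import OrderedDict
from dataclasses import dataclass, fields
from typing import Any, Optional, Tuple


@dataclass
class ToyGenerateOutput:
    """Hypothetical, simplified stand-in for a generate() output (not the library class)."""

    sequences: Any = None
    scores: Optional[Tuple[Any, ...]] = None
    hidden_states: Optional[Tuple[Any, ...]] = None
    attentions: Optional[Tuple[Any, ...]] = None

    def to_dict(self) -> "OrderedDict[str, Any]":
        # Keep only the attributes that were actually returned (not None), in field order.
        return OrderedDict(
            (f.name, getattr(self, f.name))
            for f in fields(self)
            if getattr(self, f.name) is not None
        )

    def keys(self):
        return self.to_dict().keys()

    def to_tuple(self) -> Tuple[Any, ...]:
        return tuple(self.to_dict().values())

    def __getitem__(self, key):
        # String keys behave like a dict lookup; ints and slices behave like a tuple.
        if isinstance(key, str):
            return self.to_dict()[key]
        return self.to_tuple()[key]


out = ToyGenerateOutput(sequences=[[15496, 11, 616]], scores=("step0", "step1"))
print(out.attentions)                          # None: not requested, so not returned
print(list(out.keys()))                        # ['sequences', 'scores']
print(out[:2] == (out.sequences, out.scores))  # True
```

Attribute access still returns None for missing fields, while positional and key-based views skip them, which is why slicing [:2] lines up with sequences and scores here.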
	All output types are documented below.
	PyTorch
	[[autodoc]] generation.GenerateDecoderOnlyOutput
	[[autodoc]] generation.GenerateEncoderDecoderOutput
	[[autodoc]] generation.GenerateBeamDecoderOnlyOutput
	[[autodoc]] generation.GenerateBeamEncoderDecoderOutput
	TensorFlow
	[[autodoc]] generation.TFGreedySearchEncoderDecoderOutput
	[[autodoc]] generation.TFGreedySearchDecoderOnlyOutput
	[[autodoc]] generation.TFSampleEncoderDecoderOutput
	[[autodoc]] generation.TFSampleDecoderOnlyOutput
	[[autodoc]] generation.TFBeamSearchEncoderDecoderOutput
	[[autodoc]] generation.TFBeamSearchDecoderOnlyOutput
	[[autodoc]] generation.TFBeamSampleEncoderDecoderOutput
	[[autodoc]] generation.TFBeamSampleDecoderOnlyOutput
	[[autodoc]] generation.TFContrastiveSearchEncoderDecoderOutput
	[[autodoc]] generation.TFContrastiveSearchDecoderOnlyOutput
	FLAX
	[[autodoc]] generation.FlaxSampleOutput
	[[autodoc]] generation.FlaxGreedySearchOutput
	[[autodoc]] generation.FlaxBeamSearchOutput
	LogitsProcessor
	A [LogitsProcessor] can be used to modify the prediction scores of a language model head for
	generation.
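Conceptually, a logits processor is a callable that receives the tokens generated so far and the scores for the next token, and returns modified scores; a list of processors is applied one after the other. The sketch below illustrates that pattern with plain Python lists standing in for tensors. min_length_processor and temperature_warper are hypothetical helpers that only loosely mirror what MinLengthLogitsProcessor and TemperatureLogitsWarper do; they are not the library classes.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical processor type: (tokens so far, next-token scores) -> new scores.
# Plain lists stand in for the tensors the real classes operate on.
Processor = Callable[[Sequence[int], List[float]], List[float]]


def min_length_processor(min_length: int, eos_token_id: int) -> Processor:
    """Forbid the EOS token until the sequence has at least `min_length` tokens."""
    def process(input_ids, scores):
        scores = list(scores)  # do not mutate the caller's list
        if len(input_ids) < min_length:
            scores[eos_token_id] = -math.inf
        return scores
    return process


def temperature_warper(temperature: float) -> Processor:
    """Divide scores by the temperature (>1 flattens, <1 sharpens the distribution)."""
    def process(input_ids, scores):
        return [s / temperature for s in scores]
    return process


def apply_processors(processors: List[Processor], input_ids, scores):
    # Processors run in order; each one sees the previous one's output.
    for processor in processors:
        scores = processor(input_ids, scores)
    return scores


processors = [min_length_processor(min_length=5, eos_token_id=0), temperature_warper(2.0)]
print(apply_processors(processors, input_ids=[7, 3], scores=[4.0, 2.0, 1.0]))
# [-inf, 1.0, 0.5]
```

Because the order matters, placing hard constraints (such as masking EOS) before warpers like temperature keeps masked tokens at -inf regardless of later rescaling.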
	PyTorch
	[[autodoc]] AlternatingCodebooksLogitsProcessor
	- __call__
	[[autodoc]] ClassifierFreeGuidanceLogitsProcessor
	- __call__
	[[autodoc]] EncoderNoRepeatNGramLogitsProcessor
	- __call__
	[[autodoc]] EncoderRepetitionPenaltyLogitsProcessor
	- __call__
	[[autodoc]] EpsilonLogitsWarper
	- __call__
	[[autodoc]] EtaLogitsWarper
	- __call__
	[[autodoc]] ExponentialDecayLengthPenalty
	- __call__
	[[autodoc]] ForcedBOSTokenLogitsProcessor
	- __call__
	[[autodoc]] ForcedEOSTokenLogitsProcessor
	- __call__
	[[autodoc]] ForceTokensLogitsProcessor
	- __call__
	[[autodoc]] HammingDiversityLogitsProcessor
	- __call__
	[[autodoc]] InfNanRemoveLogitsProcessor
	- __call__
	[[autodoc]] LogitNormalization
	- __call__
	[[autodoc]] LogitsProcessor
	- __call__
	[[autodoc]] LogitsProcessorList
	- __call__
	[[autodoc]] LogitsWarper
	- __call__
	[[autodoc]] MinLengthLogitsProcessor
	- __call__
	[[autodoc]] MinNewTokensLengthLogitsProcessor
	- __call__
	[[autodoc]] NoBadWordsLogitsProcessor
	- __call__
	[[autodoc]] NoRepeatNGramLogitsProcessor
	- __call__
	[[autodoc]] PrefixConstrainedLogitsProcessor
	- __call__
	[[autodoc]] RepetitionPenaltyLogitsProcessor
	- __call__
	[[autodoc]] SequenceBiasLogitsProcessor
	- __call__
	[[autodoc]] SuppressTokensAtBeginLogitsProcessor
	- __call__
	[[autodoc]] SuppressTokensLogitsProcessor
	- __call__
	[[autodoc]] TemperatureLogitsWarper
	- __call__
	[[autodoc]] TopKLogitsWarper
	- __call__
	[[autodoc]] TopPLogitsWarper
	- __call__
	[[autodoc]] TypicalLogitsWarper
	- __call__
	[[autodoc]] UnbatchedClassifierFreeGuidanceLogitsProcessor
	- __call__
	[[autodoc]] WhisperTimeStampLogitsProcessor
	- __call__
	TensorFlow
	[[autodoc]] TFForcedBOSTokenLogitsProcessor
	- __call__
	[[autodoc]] TFForcedEOSTokenLogitsProcessor
	- __call__
	[[autodoc]] TFForceTokensLogitsProcessor
	- __call__
	[[autodoc]] TFLogitsProcessor
	- __call__
	[[autodoc]] TFLogitsProcessorList
	- __call__
	[[autodoc]] TFLogitsWarper
	- __call__
	[[autodoc]] TFMinLengthLogitsProcessor
	- __call__
	[[autodoc]] TFNoBadWordsLogitsProcessor
	- __call__
	[[autodoc]] TFNoRepeatNGramLogitsProcessor
	- __call__
	[[autodoc]] TFRepetitionPenaltyLogitsProcessor
	- __call__
	[[autodoc]] TFSuppressTokensAtBeginLogitsProcessor
	- __call__
	[[autodoc]] TFSuppressTokensLogitsProcessor
	- __call__
	[[autodoc]] TFTemperatureLogitsWarper
	- __call__
	[[autodoc]] TFTopKLogitsWarper
	- __call__
	[[autodoc]] TFTopPLogitsWarper
	- __call__
	FLAX
	[[autodoc]] FlaxForcedBOSTokenLogitsProcessor
	- __call__
	[[autodoc]] FlaxForcedEOSTokenLogitsProcessor
	- __call__
	[[autodoc]] FlaxForceTokensLogitsProcessor
	- __call__
	[[autodoc]] FlaxLogitsProcessor
	- __call__
	[[autodoc]] FlaxLogitsProcessorList
	- __call__
	[[autodoc]] FlaxLogitsWarper
	- __call__
	[[autodoc]] FlaxMinLengthLogitsProcessor
	- __call__
	[[autodoc]] FlaxSuppressTokensAtBeginLogitsProcessor
	- __call__
	[[autodoc]] FlaxSuppressTokensLogitsProcessor
	- __call__
	[[autodoc]] FlaxTemperatureLogitsWarper
	- __call__
	[[autodoc]] FlaxTopKLogitsWarper
	- __call__
	[[autodoc]] FlaxTopPLogitsWarper
	- __call__
	[[autodoc]] FlaxWhisperTimeStampLogitsProcessor
	- __call__
	StoppingCriteria
	A [StoppingCriteria] can be used to change when generation stops (in addition to the EOS token). Please note that this is exclusively available to our PyTorch implementations.
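Conceptually, generation checks a list of stopping criteria after each new token and stops as soon as any of them is met, in addition to stopping on EOS. Here is a minimal sketch of that loop, assuming plain Python lists of token ids and a toy next-token function in place of a model. max_length_criterion, max_time_criterion, should_stop and toy_generate are hypothetical names that only imitate the roles of MaxLengthCriteria, MaxTimeCriteria and StoppingCriteriaList.

```python
import time
from typing import Callable, List, Sequence

# Hypothetical criterion type: looks at the tokens so far, returns True to stop.
Criterion = Callable[[Sequence[int]], bool]


def max_length_criterion(max_length: int) -> Criterion:
    return lambda input_ids: len(input_ids) >= max_length


def max_time_criterion(max_time: float, start: float) -> Criterion:
    # Stop once more than `max_time` seconds have elapsed since `start`.
    return lambda input_ids: time.monotonic() - start > max_time


def should_stop(criteria: List[Criterion], input_ids: Sequence[int]) -> bool:
    # Stop as soon as any single criterion is met.
    return any(criterion(input_ids) for criterion in criteria)


def toy_generate(next_token: Callable[[List[int]], int], prompt: List[int],
                 criteria: List[Criterion], eos_token_id: int) -> List[int]:
    ids = list(prompt)
    while True:
        ids.append(next_token(ids))
        # EOS still ends generation; the criteria add extra stopping conditions.
        if ids[-1] == eos_token_id or should_stop(criteria, ids):
            return ids


criteria = [max_length_criterion(6), max_time_criterion(5.0, time.monotonic())]
# The toy "model" just emits the current length as the next token id.
print(toy_generate(lambda ids: len(ids), prompt=[0, 1], criteria=criteria, eos_token_id=-1))
# [0, 1, 2, 3, 4, 5]
```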
	[[autodoc]] StoppingCriteria
	- __call__
	[[autodoc]] StoppingCriteriaList
	- __call__
	[[autodoc]] MaxLengthCriteria
	- __call__
	[[autodoc]] MaxTimeCriteria
	- __call__
	Constraints
	A [Constraint] can be used to force the generation to include specific tokens or sequences in the output. Please note that this is exclusively available to our PyTorch implementations.
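The bookkeeping behind forcing a phrase into the output can be sketched as a small state machine that tracks how much of the required token sequence has been generated so far. ToyPhrasalConstraint is a hypothetical, simplified class: the library's Constraint classes expose a richer interface for use inside constrained beam search, and this sketch performs no search at all. As a simplification, any mismatching token resets progress to zero.

```python
from typing import List, Optional


class ToyPhrasalConstraint:
    """Hypothetical sketch: track progress toward emitting a fixed token sequence."""

    def __init__(self, token_ids: List[int]):
        if not token_ids:
            raise ValueError("token_ids must be non-empty")
        self.token_ids = token_ids
        self.progress = 0  # number of phrase tokens matched so far

    @property
    def completed(self) -> bool:
        return self.progress == len(self.token_ids)

    def advance(self) -> Optional[int]:
        # The token that would move the constraint forward, or None once fulfilled.
        return None if self.completed else self.token_ids[self.progress]

    def update(self, token_id: int) -> None:
        if self.completed:
            return  # a fulfilled constraint stays fulfilled
        if token_id == self.token_ids[self.progress]:
            self.progress += 1
        else:
            # Simplification: a mismatch restarts matching from the beginning.
            self.progress = 0


constraint = ToyPhrasalConstraint([42, 7])
print(constraint.advance())  # 42
for token in [3, 42, 9, 42, 7, 1]:
    constraint.update(token)
print(constraint.completed)  # True
```

In a constrained search, a helper like advance() would suggest which token to try next so that some candidates make progress toward the phrase, while unconstrained candidates continue as usual.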
	[[autodoc]] Constraint
	[[autodoc]] PhrasalConstraint
	[[autodoc]] DisjunctiveConstraint
	[[autodoc]] ConstraintListState
	BeamSearch
	[[autodoc]] BeamScorer
	- process
	- finalize
	[[autodoc]] BeamSearchScorer
	- process
	- finalize
	[[autodoc]] ConstrainedBeamSearchScorer
	- process
	- finalize
	Utilities
	[[autodoc]] top_k_top_p_filtering
	[[autodoc]] tf_top_k_top_p_filtering
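The filtering these helpers perform can be illustrated on a single list of logits: top-k keeps only the k highest-scoring tokens, and top-p (nucleus) filtering keeps the smallest set of top-ranked tokens whose probability mass reaches top_p, masking everything else with -inf. top_k_top_p_filter below is a hypothetical pure-Python sketch of that idea, not the library's implementation; boundary handling at exactly top_p may differ, and it omits the min_tokens_to_keep argument the library helpers accept.

```python
import math
from typing import List


def top_k_top_p_filter(logits: List[float], top_k: int = 0, top_p: float = 1.0,
                       filter_value: float = -math.inf) -> List[float]:
    """Hypothetical sketch: mask logits outside the top-k set and/or the nucleus (top-p) set."""
    logits = list(logits)  # work on a copy
    # Token indices sorted from highest to lowest logit.
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)

    if top_k > 0:
        # Top-k: mask every token ranked below the k best.
        for i in order[top_k:]:
            logits[i] = filter_value

    if top_p < 1.0:
        # Softmax over the tokens that survived top-k (max-subtracted for stability).
        kept = [i for i in order if logits[i] != filter_value]
        max_logit = logits[kept[0]]
        exps = {i: math.exp(logits[i] - max_logit) for i in kept}
        total = sum(exps.values())
        cumulative = 0.0
        for i in kept:
            # Once the higher-ranked tokens already cover top_p, mask the rest.
            # For top_p > 0 the top token is always kept, since cumulative starts at 0.
            if cumulative >= top_p:
                logits[i] = filter_value
            cumulative += exps[i] / total
    return logits


print(top_k_top_p_filter([2.0, 1.0, 0.5, -1.0], top_k=3, top_p=0.8))
# [2.0, 1.0, -inf, -inf]
```

Here top-k first drops the lowest token; the remaining probabilities are roughly 0.63, 0.23 and 0.14, so the first two already cover 0.8 and the third is masked.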
	Streamers
	[[autodoc]] TextStreamer
	[[autodoc]] TextIteratorStreamer
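TextStreamer prints text to stdout as it is generated, while TextIteratorStreamer lets another part of the program iterate over the text, which typically means running generate in a separate thread. The producer/consumer pattern behind the iterator variant can be sketched with the standard library alone. ToyIteratorStreamer and fake_generate are hypothetical stand-ins: they pass strings rather than token ids and involve no model or tokenizer.

```python
import queue
import threading
from typing import Iterator, List, Optional


class ToyIteratorStreamer:
    """Hypothetical sketch of an iterator-style text streamer (not the library class).

    The generation loop calls `put` with each new piece of text and `end` when it
    finishes; the consumer iterates over the streamer, typically from another thread.
    """

    _STOP = object()  # sentinel marking the end of the stream

    def __init__(self, timeout: Optional[float] = None):
        self._queue: "queue.Queue" = queue.Queue()
        self._timeout = timeout

    def put(self, text: str) -> None:
        self._queue.put(text)

    def end(self) -> None:
        self._queue.put(self._STOP)

    def __iter__(self) -> Iterator[str]:
        while True:
            # Blocks until the next item arrives; raises queue.Empty if the timeout expires.
            item = self._queue.get(timeout=self._timeout)
            if item is self._STOP:
                return
            yield item


def fake_generate(words: List[str], streamer: ToyIteratorStreamer) -> None:
    # Stand-in for a generation loop that emits text as soon as it is decoded.
    for word in words:
        streamer.put(word)
    streamer.end()


streamer = ToyIteratorStreamer(timeout=5.0)
thread = threading.Thread(target=fake_generate, args=(["Hello", " world", "!"], streamer))
thread.start()
pieces = [piece for piece in streamer]  # consumed while the producer runs
thread.join()
print("".join(pieces))  # Hello world!
```

The queue decouples the producer from the consumer, so the caller can display text incrementally instead of waiting for the whole sequence.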
	Caches
	[[autodoc]] Cache
	- update
	[[autodoc]] DynamicCache
	- update
	- get_seq_length
	- reorder_cache
	- to_legacy_cache
	- from_legacy_cache
	[[autodoc]] SinkCache
	- update
	- get_seq_length
	- reorder_cache
	[[autodoc]] StaticCache
	- update
	- get_seq_length
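The core job of a cache is to store each layer's key and value states so that, at every decoding step, only the new token's states have to be computed and appended. The sketch below mimics a DynamicCache-style update and get_seq_length using nested Python lists instead of tensors. ToyDynamicCache is hypothetical and ignores batching, attention heads, and the reorder_cache and legacy-format conversion methods listed above.

```python
from typing import List, Tuple

# Each entry is a list of per-position vectors. Real caches hold tensors shaped
# roughly (batch, heads, seq_len, head_dim) and concatenate along seq_len.
Vectors = List[List[float]]


class ToyDynamicCache:
    """Hypothetical sketch: per-layer key/value store that grows with each forward pass."""

    def __init__(self) -> None:
        self.key_cache: List[Vectors] = []
        self.value_cache: List[Vectors] = []

    def update(self, key_states: Vectors, value_states: Vectors,
               layer_idx: int) -> Tuple[Vectors, Vectors]:
        if len(self.key_cache) <= layer_idx:
            # First pass for this layer (layers assumed to arrive in order 0, 1, 2, ...).
            self.key_cache.append(list(key_states))
            self.value_cache.append(list(value_states))
        else:
            # Later steps: append the new positions after the cached ones.
            self.key_cache[layer_idx].extend(key_states)
            self.value_cache[layer_idx].extend(value_states)
        # Attention for this layer then uses all keys/values seen so far.
        return self.key_cache[layer_idx], self.value_cache[layer_idx]

    def get_seq_length(self, layer_idx: int = 0) -> int:
        if len(self.key_cache) <= layer_idx:
            return 0
        return len(self.key_cache[layer_idx])


cache = ToyDynamicCache()
# Prefill: a 3-token prompt for layer 0.
cache.update([[0.1], [0.2], [0.3]], [[1.0], [2.0], [3.0]], layer_idx=0)
# One decoding step: a single new token.
keys, values = cache.update([[0.4]], [[4.0]], layer_idx=0)
print(cache.get_seq_length())  # 4
```

Roughly speaking, a SinkCache-style cache would instead keep only the first few "sink" positions plus a sliding window of recent ones, and a StaticCache-style cache would pre-allocate storage for a fixed maximum length rather than growing it.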