|
|
|
T5v1.1 |
|
Overview |
|
T5v1.1 was released in the google-research/text-to-text-transfer-transformer repository by Colin Raffel et al. It is an improved version of the original T5 model.

This model was contributed by patrickvonplaten. The original code can be found here.
|
Usage tips |
|
One can directly plug the weights of T5v1.1 into a T5 model, like so:
|
```python
from transformers import T5ForConditionalGeneration

# T5v1.1 checkpoints load directly into the standard T5 architecture
model = T5ForConditionalGeneration.from_pretrained("google/t5-v1_1-base")
```
|
T5 Version 1.1 includes the following improvements compared to the original T5 model:

- GEGLU activation in the feed-forward hidden layer, rather than ReLU. See this paper. (A sketch of the gated feed-forward block follows this list.)
- Dropout was turned off in pre-training (quality win). Dropout should be re-enabled during fine-tuning (see the loading example after this list).
- Pre-trained on C4 only, without mixing in the downstream tasks.
- No parameter sharing between the embedding and classifier layer.
- "xl" and "xxl" replace "3B" and "11B". The model shapes are a bit different: larger d_model and smaller num_heads and d_ff (the config snippet after this list shows where these values live).
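For illustration, here is a minimal PyTorch sketch of a gated-GELU (GEGLU) feed-forward block. The class and attribute names are simplified stand-ins and are not meant to mirror the exact layer implementation in transformers:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedGeluFeedForward(nn.Module):
    """Illustrative GEGLU feed-forward block (hypothetical names, not the transformers internals)."""

    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.wi_0 = nn.Linear(d_model, d_ff, bias=False)  # gating projection
        self.wi_1 = nn.Linear(d_model, d_ff, bias=False)  # linear projection
        self.wo = nn.Linear(d_ff, d_model, bias=False)    # project back to d_model

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # GEGLU: GELU(x W_0) multiplied elementwise with (x W_1), then projected down
        return self.wo(F.gelu(self.wi_0(hidden_states)) * self.wi_1(hidden_states))
```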
|
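If the checkpoint you load has dropout disabled in its config, one way to make sure dropout is active again for fine-tuning is to override dropout_rate when loading; the value 0.1 below is only an example, not a recommendation from the T5 authors:

```python
from transformers import T5ForConditionalGeneration

# Override the dropout rate stored in the checkpoint's config for fine-tuning.
model = T5ForConditionalGeneration.from_pretrained(
    "google/t5-v1_1-base", dropout_rate=0.1
)
```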
|
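The shape-defining fields and the embedding/classifier tying can be inspected directly on the loaded config, for example:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("google/t5-v1_1-xl")
# d_model / num_heads / d_ff define the model shape;
# tie_word_embeddings reflects whether input and output embeddings are shared.
print(config.d_model, config.num_heads, config.d_ff, config.tie_word_embeddings)
```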
|
Note: T5 Version 1.1 was only pre-trained on C4, excluding any supervised training. Therefore, this model has to be fine-tuned before it is usable on a downstream task, unlike the original T5 model. Since T5v1.1 was pre-trained in an unsupervised fashion, there is no real advantage to using a task prefix during single-task fine-tuning. If you are doing multi-task fine-tuning, you should use a prefix.
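For illustration, a minimal single-task fine-tuning step could look like the sketch below; the input and target texts are placeholders:

```python
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/t5-v1_1-base")
model = T5ForConditionalGeneration.from_pretrained("google/t5-v1_1-base")

# Single-task fine-tuning: the input text needs no "summarize:"-style task prefix.
inputs = tokenizer("studies have shown that owning a dog is good for you", return_tensors="pt")
labels = tokenizer("owning a dog is good for you", return_tensors="pt").input_ids

loss = model(input_ids=inputs.input_ids, labels=labels).loss
loss.backward()
```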
|
Google has released the following variants:

- google/t5-v1_1-small
- google/t5-v1_1-base
- google/t5-v1_1-large
- google/t5-v1_1-xl
- google/t5-v1_1-xxl
|
|
|
Refer to T5's documentation page for the full API reference, tips, code examples, and notebooks.
|
|