Also, by stacking attention layers that have a small window, the last layer will have a receptive field that covers more than just the tokens in the window, allowing the model to build a representation of the whole sentence.
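For intuition, here is a minimal sketch (not any particular model's implementation) that tracks how many positions a single token can indirectly "see" as local-attention layers are stacked; `seq_len`, `window`, and `num_layers` are illustrative values:

```python
import numpy as np

seq_len, window, num_layers = 16, 3, 4  # window of 3 = the token itself plus 1 neighbour on each side
half = window // 2

# Attention pattern of a single local-attention layer: each token only sees its window.
local = np.zeros((seq_len, seq_len), dtype=bool)
for i in range(seq_len):
    local[i, max(0, i - half): i + half + 1] = True

# Reachability after stacking layers: information flows through intermediate tokens,
# so the effective receptive field grows with each additional layer.
reach = np.eye(seq_len, dtype=bool)
mid = seq_len // 2
for layer in range(num_layers):
    reach = (reach.astype(int) @ local.astype(int)) > 0
    print(f"after layer {layer + 1}: token {mid} indirectly sees {reach[mid].sum()} tokens")
```

Even though each layer only looks at 3 tokens, the middle token's effective receptive field grows by two positions per layer, eventually spanning the whole sequence.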
Some preselected input tokens are also given global attention: for those few tokens, the attention matrix can access all tokens. This process is symmetric: all other tokens have access to those specific tokens (on top of the ones in their local window).
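A minimal sketch of what such a combined local + global attention mask could look like (not the actual implementation), assuming a hypothetical `global_positions` list with, for example, a classification-style token at position 0:

```python
import numpy as np

seq_len, half_window = 12, 2
global_positions = [0]  # hypothetical choice: give the first token global attention

mask = np.zeros((seq_len, seq_len), dtype=bool)

# Local (sliding-window) attention: each token sees its neighbours.
for i in range(seq_len):
    mask[i, max(0, i - half_window): i + half_window + 1] = True

# Global attention is symmetric: global tokens see everyone,
# and every token sees the global tokens (on top of its local window).
for g in global_positions:
    mask[g, :] = True   # global token attends to all tokens
    mask[:, g] = True   # all tokens attend to the global token

print(mask.astype(int))
```

Printing the mask shows a band along the diagonal (the local windows) plus a full row and column for each global token, which is exactly the symmetric access pattern described above.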