The result is a new attention mechanism we call {\em Transient Global}
(TGlobal), which mimics ETC's local/global attention mechanism, but without requiring additional side-inputs.
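
To make the idea concrete, the following is a minimal NumPy sketch of a local-plus-transient-global attention pattern. It is not the authors' implementation. The passage above only states that TGlobal mimics ETC's local/global attention without side-inputs; the construction of the global tokens here (summing each fixed-size block of input vectors and normalising the result) is an assumption for illustration. The function names, the single-head setting, the absence of learned query/key/value projections, and the values of `block_size` and `local_radius` are likewise hypothetical simplifications. The sketch builds a dense score matrix for readability, so it does not show the memory savings a real blocked implementation would aim for.

```python
import numpy as np


def layer_norm(v, eps=1e-6):
    # Parameter-free normalisation over the feature axis (illustrative only).
    mean = v.mean(axis=-1, keepdims=True)
    var = v.var(axis=-1, keepdims=True)
    return (v - mean) / np.sqrt(var + eps)


def transient_global_tokens(x, block_size):
    # Assumption: one global token per block, obtained by summing the block's
    # token vectors and normalising. The global tokens are derived from the
    # input itself, so no extra side-input is needed.
    n, d = x.shape
    num_blocks = -(-n // block_size)  # ceiling division
    padded = np.zeros((num_blocks * block_size, d), dtype=x.dtype)
    padded[:n] = x  # zero-pad the last block if n is not a multiple
    block_sums = padded.reshape(num_blocks, block_size, d).sum(axis=1)
    return layer_norm(block_sums)


def tglobal_attention(x, block_size=4, local_radius=2):
    # Hypothetical single-head attention with queries = keys = values = x.
    n, d = x.shape
    g = transient_global_tokens(x, block_size)
    keys = np.concatenate([x, g], axis=0)  # (n + num_blocks, d)
    scores = x @ keys.T / np.sqrt(d)

    # Each token sees neighbours within local_radius, plus every global token.
    idx = np.arange(n)
    local_mask = np.abs(idx[:, None] - idx[None, :]) <= local_radius
    global_mask = np.ones((n, g.shape[0]), dtype=bool)
    mask = np.concatenate([local_mask, global_mask], axis=1)

    # Masked softmax over the allowed keys.
    scores = np.where(mask, scores, -np.inf)
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ keys, weights


x = np.random.default_rng(0).normal(size=(10, 8))
out, weights = tglobal_attention(x, block_size=4, local_radius=2)
```

With 10 input tokens and a block size of 4, the sketch produces three global tokens, so each row of `weights` spans 13 keys: the 10 input positions (mostly masked out) followed by the 3 global tokens, which every position can attend to.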