As the time and memory requirements of the Transformer attention mechanism scale quadratically with the sequence
length, the authors pre-trained ImageGPT on smaller input resolutions, such as 32x32 and 64x64.
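To see why the resolution matters here, the back-of-the-envelope sketch below (an illustrative calculation, not part of the model code) compares the resulting sequence lengths and the relative attention cost for a few resolutions, assuming one token per pixel:

```python
# Illustrative calculation only: each pixel becomes one token, so an H x W image
# yields H * W tokens, and self-attention cost grows with the square of that length.
for resolution in (32, 64, 224):
    seq_len = resolution * resolution      # tokens per image
    rel_cost = (seq_len / 1024) ** 2       # attention cost relative to 32x32
    print(f"{resolution}x{resolution}: {seq_len} tokens, ~{rel_cost:.0f}x the attention cost of 32x32")
```

A 64x64 image already costs roughly 16 times as much attention compute as a 32x32 one, and a typical 224x224 image would be several thousand times more expensive, which is why the smaller resolutions were used for pre-training.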