Specifically, it allows more fine-grained inputs (4 x 4 pixels per patch) to be used, while progressively shrinking the sequence length of the Transformer as the network deepens, which reduces the computational cost.
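
To make the effect on sequence length concrete, here is a minimal sketch of the arithmetic. Only the 4 x 4 patch size comes from the text above. The 224 x 224 input, the four stages, and the factor-of-2 spatial downsampling between stages are illustrative assumptions, and `sequence_lengths` is a hypothetical helper, not part of any library.

```python
def sequence_lengths(image_size=224, patch_size=4, num_stages=4, downsample=2):
    """Return the number of tokens at each stage for a square input image.

    Assumed for illustration: image_size, num_stages, and downsample.
    Only patch_size=4 is taken from the description above.
    """
    lengths = []
    # Tokens along one side of the feature map after the initial patch embedding.
    side = image_size // patch_size
    for _ in range(num_stages):
        # The sequence length is the number of spatial positions.
        lengths.append(side * side)
        # Each later stage shrinks the feature map along both axes.
        side //= downsample
    return lengths


print(sequence_lengths())  # [3136, 784, 196, 49]
```

Under these assumptions the sequence shrinks by a factor of 4 at every stage. Because self-attention cost grows roughly quadratically with sequence length, the deeper stages are far cheaper than they would be at the initial resolution.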