Ahmadzei's picture
added 3 more tables for large emb model
5fa1a76
Next, the
feature map is flattened and transposed to obtain a tensor of shape (batch_size, seq_len, d_model) =
(batch_size, width/32*height/32, 256).