In models that process very long input sequences, conventional position encodings store an embedding vector of size \(d\) (the config.hidden_size) for every position \(i = 1, \ldots, n_s\), where \(n_s\) is config.max_embedding_size. The position-encoding table therefore holds \(n_s \times d\) parameters, which becomes prohibitive for very long sequences.
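A minimal PyTorch sketch makes the memory cost concrete; the values chosen for the hidden size and the maximum number of positions are illustrative stand-ins for config.hidden_size and config.max_embedding_size, not taken from any particular model:

```python
import torch

# Illustrative stand-ins (not values from a real config):
hidden_size = 1024            # d, i.e. config.hidden_size
max_embedding_size = 524288   # n_s, i.e. config.max_embedding_size

# A conventional learned position-embedding table stores one
# d-dimensional vector for every position 1, ..., n_s.
position_embeddings = torch.nn.Embedding(max_embedding_size, hidden_size)

# Look up the embeddings for the first seq_len positions.
seq_len = 4096
position_ids = torch.arange(seq_len)
pos_emb = position_embeddings(position_ids)  # shape: (seq_len, hidden_size)

# The full table holds n_s * d parameters: here ~0.5B parameters,
# roughly 2 GB in float32, just for position information.
print(position_embeddings.weight.numel())    # 524288 * 1024 = 536870912
```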