In that case, every sequence in the batch must be padded to 400 tokens, so the batch shape becomes [64, 400] instead of [64, 4], which causes a large slowdown.
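
The effect can be illustrated with a minimal pure-Python sketch. It assumes a batch of 64 sequences in which 63 are 4 tokens long and one is 400 tokens long; the helper names `pad_batch` and `batch_shape` and the `pad_id` value are hypothetical and used here only for illustration, not taken from any particular library.

```python
def pad_batch(sequences, pad_id=0):
    """Pad every sequence to the length of the longest one in the batch."""
    max_len = max(len(seq) for seq in sequences)
    return [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]


def batch_shape(batch):
    """Return (batch_size, sequence_length) for a rectangular batch."""
    return (len(batch), len(batch[0]))


# Assumed batch: 64 short sequences of 4 tokens each.
short_batch = [[1] * 4 for _ in range(64)]

# Assumed batch: 63 short sequences plus a single 400-token sequence.
mixed_batch = [[1] * 4 for _ in range(63)] + [[1] * 400]

short_shape = batch_shape(pad_batch(short_batch))
mixed_shape = batch_shape(pad_batch(mixed_batch))

# Number of token slots the model has to process in each case.
short_slots = short_shape[0] * short_shape[1]
mixed_slots = mixed_shape[0] * mixed_shape[1]

print(short_shape, short_slots)
print(mixed_shape, mixed_slots)
print(mixed_slots // short_slots)
```

Under these assumptions, a single long sequence multiplies the number of token slots by 100, and almost all of the extra slots hold padding rather than real tokens.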