The idea is relatively simple: one defines outputs of an arbitrary size, and then applies cross-attention with the
last hidden states of the latents, using the outputs as queries and the latents as keys and values.
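The decoding step described above can be sketched in a few lines of plain Python. This is only a minimal illustration under simplifying assumptions: a single attention head, no learned query/key/value projections, and randomly initialized output queries. The names `cross_attention`, `output_queries`, and `latents` are hypothetical and are not taken from any library.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over a single list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, latents):
    """Scaled dot-product cross-attention (assumed simplification: one head,
    no learned projections). Queries come from the outputs; keys and values
    are both the latents."""
    d = len(latents[0])
    keys_t = [list(col) for col in zip(*latents)]          # (d, num_latents)
    scores = matmul(queries, keys_t)                       # (num_outputs, num_latents)
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, latents)                        # (num_outputs, d)


rng = random.Random(0)
num_latents, num_outputs, d = 8, 3, 4

# Stand-in for the last hidden states of the latents.
latents = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(num_latents)]
# Output queries: their count is arbitrary and independent of num_latents.
output_queries = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(num_outputs)]

decoded = cross_attention(output_queries, latents)
print(len(decoded), len(decoded[0]))
```

The number of rows in `decoded` is set by the number of output queries rather than by the number of latents, which is why the output size can be chosen freely.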