One can then, similarly to BERT, convert the last hidden states of the latents to classification logits by averaging
along the sequence dimension and placing a linear layer on top of that to project `d_latents` to `num_labels`.
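As a minimal sketch of this pooling-plus-projection head (the class name `ClassificationHead` and the concrete dimensions are illustrative assumptions, not part of the library's API):

```python
import torch
import torch.nn as nn


class ClassificationHead(nn.Module):
    """Hypothetical head: mean-pool the latents, then project to logits."""

    def __init__(self, d_latents: int, num_labels: int):
        super().__init__()
        self.classifier = nn.Linear(d_latents, num_labels)

    def forward(self, last_hidden_states: torch.Tensor) -> torch.Tensor:
        # last_hidden_states: (batch_size, num_latents, d_latents)
        pooled = last_hidden_states.mean(dim=1)  # average over the sequence dimension
        return self.classifier(pooled)           # (batch_size, num_labels)


# Example with assumed sizes: 2 examples, 64 latents of dimension 256, 10 labels.
head = ClassificationHead(d_latents=256, num_labels=10)
latents = torch.randn(2, 64, 256)
logits = head(latents)
print(tuple(logits.shape))  # (2, 10)
```

Because the pooling collapses the latent sequence, the head is independent of the number of latents, so the same linear layer works for any `num_latents`.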