One can then, similar to BERT, convert the last hidden states of the latents to classification logits by averaging
along the sequence dimension and placing a linear layer on top that projects from d_latents to num_labels.
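
The pooling-plus-projection step described above can be sketched roughly as follows. This is a minimal NumPy illustration, not the library's actual implementation: the function name `classify_from_latents`, the tensor shapes, and the randomly initialised `weight` and `bias` are all assumptions made for the example. Only `d_latents` and `num_labels` come from the text.

```python
import numpy as np


def classify_from_latents(latents, weight, bias):
    """Hypothetical helper: mean-pool latents, then apply a linear projection.

    latents: (batch, num_latents, d_latents) last hidden states of the latents
    weight:  (d_latents, num_labels) projection matrix (assumed layout)
    bias:    (num_labels,)
    """
    # Average along the sequence (latent index) dimension.
    pooled = latents.mean(axis=1)  # (batch, d_latents)
    # Linear layer projecting d_latents to num_labels.
    return pooled @ weight + bias  # (batch, num_labels)


# Illustrative sizes only; none of these values come from the source.
rng = np.random.default_rng(0)
batch, num_latents, d_latents, num_labels = 2, 8, 16, 3

latents = rng.normal(size=(batch, num_latents, d_latents))
weight = rng.normal(size=(d_latents, num_labels)) * 0.02
bias = np.zeros(num_labels)

logits = classify_from_latents(latents, weight, bias)
print(logits.shape)
```

In a trained model the weight and bias would be learned parameters. The logits would typically be passed through a softmax (or directly into a cross-entropy loss) to obtain class probabilities.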