One can then, similar to BERT, convert the last hidden states of the latents to classification logits by averaging along the sequence dimension, and placing a linear layer on top of that to project the `d_latents` to `num_labels`.
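A minimal sketch of that pooling-plus-projection step, using hypothetical tensor sizes (the variable names and shapes here are illustrative, not taken from any specific model configuration):

```python
import torch
import torch.nn as nn

# Hypothetical sizes, for illustration only.
batch_size, num_latents, d_latents, num_labels = 2, 256, 1280, 10

# Stand-in for the last hidden states of the latents.
last_hidden_states = torch.randn(batch_size, num_latents, d_latents)

# Average along the sequence (latent) dimension.
pooled = last_hidden_states.mean(dim=1)  # shape: (batch_size, d_latents)

# Linear layer projecting d_latents to num_labels.
classifier = nn.Linear(d_latents, num_labels)
logits = classifier(pooled)  # shape: (batch_size, num_labels)
```

The averaging collapses the variable-size latent sequence into a single fixed-size vector per example, which is what makes a single linear classification head possible.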