Usage tips

Since Funnel Transformer uses pooling, the sequence length of the hidden states changes after each block of layers.
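
To make the shrinking sequence length concrete, here is a minimal sketch of the arithmetic. It assumes that pooling is applied between consecutive blocks with a stride of 2 and that odd lengths round up; neither detail is stated above, and the helper name `block_sequence_lengths` is hypothetical, not part of any library.

```python
from typing import List


def block_sequence_lengths(seq_len: int, num_blocks: int, stride: int = 2) -> List[int]:
    """Return the sequence length seen by each block of layers.

    Assumptions (illustrative only): the first block runs at the full
    input length, and the sequence is pooled by `stride` before each
    later block, rounding up when the length is not divisible.
    """
    lengths = [seq_len]
    for _ in range(num_blocks - 1):
        # Integer ceiling division: ceil(previous_length / stride).
        lengths.append(-(-lengths[-1] // stride))
    return lengths


print(block_sequence_lengths(128, 3))  # [128, 64, 32]
```

Under these assumptions, a 128-token input is reduced to 32 positions by the third block, so any code that indexes hidden states per token has to account for the shorter length.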