In addition, to perform token-level predictions as required by common pretraining
objectives, Funnel-Transformer is able to recover a deep representation for each token from the reduced hidden sequence
via a decoder.
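
The sentence above does not say how the decoder works, so the following is only a minimal sketch of one plausible scheme: each reduced vector is repeated to restore the original length, then added to a full-length set of token states. The function names (`mean_pool`, `upsample_repeat`, `recover_token_states`) are hypothetical, the encoder layers are omitted entirely, and the Transformer layers a real decoder would apply afterwards are left out.

```python
from typing import List

Vector = List[float]


def mean_pool(seq: List[Vector], stride: int = 2) -> List[Vector]:
    """Shorten a sequence by averaging each window of `stride` vectors."""
    pooled = []
    for start in range(0, len(seq), stride):
        window = seq[start:start + stride]  # last window may be shorter
        pooled.append([sum(col) / len(window) for col in zip(*window)])
    return pooled


def upsample_repeat(seq: List[Vector], factor: int, target_len: int) -> List[Vector]:
    """Repeat each reduced vector `factor` times, then trim to `target_len`."""
    repeated = [list(vec) for vec in seq for _ in range(factor)]
    return repeated[:target_len]


def recover_token_states(
    full_res: List[Vector], reduced: List[Vector], factor: int
) -> List[Vector]:
    """Combine full-length token states with the upsampled reduced sequence.

    Assumption: the combination is a plain element-wise sum.
    """
    upsampled = upsample_repeat(reduced, factor, len(full_res))
    return [[a + b for a, b in zip(tok, ctx)] for tok, ctx in zip(full_res, upsampled)]


# Toy input: 5 tokens with 2-dimensional hidden states.
tokens = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [7.0, 6.0], [9.0, 8.0]]

reduced = mean_pool(tokens, stride=2)          # 5 positions -> 3 positions
recovered = recover_token_states(tokens, reduced, factor=2)  # back to 5 positions
```

After this step there is again one vector per input token, which is what a token-level objective such as masked language modeling needs.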