Specifically, it allows more fine-grained inputs (4 x 4 pixels per patch) to be used, while simultaneously shrinking the sequence length
of the Transformer as it deepens, which reduces the computational cost.
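
To make the effect on sequence length concrete, here is a rough sketch of the arithmetic. Only the 4 x 4 patch size comes from the text above; the 224 x 224 input and the 2x spatial downsampling at each of three later stages are assumptions chosen for illustration, and `sequence_lengths` is a hypothetical helper, not part of any library.

```python
def sequence_lengths(image_size=224, first_patch=4, later_strides=(2, 2, 2)):
    """Return the token count at each stage of a pyramid-style Transformer.

    image_size and later_strides are illustrative assumptions; only the
    4 x 4 first-stage patch size is taken from the description.
    """
    # Tokens per side after the initial 4 x 4 patch embedding.
    side = image_size // first_patch
    lengths = [side * side]
    for stride in later_strides:
        # Each later stage merges stride x stride neighbouring tokens.
        side //= stride
        lengths.append(side * side)
    return lengths


print(sequence_lengths())  # [3136, 784, 196, 49]
```

Under these assumptions the first stage sees 3136 tokens, far more than the 196 a single-scale model with 16 x 16 patches would produce on the same image (`sequence_lengths(224, 16, ())` returns `[196]`), but the count drops by 4x per stage, so the deeper and more expensive layers operate on much shorter sequences.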