Furthermore, we use reversible residual
layers instead of standard residuals, which allows activations to be stored only once during training instead of
N times, where N is the number of layers.
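
To illustrate why reversibility removes the need to store per-layer activations, the following is a minimal sketch, assuming the common two-stream coupling y1 = x1 + F(x2), y2 = x2 + G(y1). The sub-layer functions `f` and `g`, the helper names, and the toy inputs are hypothetical stand-ins chosen for illustration and are not taken from the text. Each layer's inputs are recomputed from its outputs, so only the final activations have to be kept for the backward pass.

```python
import math


def make_layer(seed):
    """Build a pair of toy sub-layer functions (f, g) on lists of floats.

    These are hypothetical placeholders; any deterministic functions would do.
    """
    def f(v):
        return [math.tanh(seed * x) for x in v]

    def g(v):
        return [math.sin(x + seed) for x in v]

    return f, g


def forward(x1, x2, f, g):
    """Reversible residual step: y1 = x1 + f(x2), y2 = x2 + g(y1)."""
    y1 = [a + b for a, b in zip(x1, f(x2))]
    y2 = [a + b for a, b in zip(x2, g(y1))]
    return y1, y2


def inverse(y1, y2, f, g):
    """Recover the inputs of a step from its outputs alone."""
    x2 = [a - b for a, b in zip(y2, g(y1))]
    x1 = [a - b for a, b in zip(y1, f(x2))]
    return x1, x2


def run_forward(x1, x2, layers):
    """Run all layers, keeping only the most recent pair of activations."""
    for f, g in layers:
        x1, x2 = forward(x1, x2, f, g)
    return x1, x2


def reconstruct_inputs(y1, y2, layers):
    """Walk the layers in reverse, rebuilding each layer's inputs in turn."""
    for f, g in reversed(layers):
        y1, y2 = inverse(y1, y2, f, g)
    return y1, y2


layers = [make_layer(0.5 + i) for i in range(4)]  # N = 4 toy layers
x1, x2 = [0.1, -0.2, 0.3], [0.4, 0.5, -0.6]

y1, y2 = run_forward(x1, x2, layers)
r1, r2 = reconstruct_inputs(y1, y2, layers)

# Largest reconstruction error; nonzero only due to floating-point rounding.
max_err = max(abs(a - b) for a, b in zip(r1 + r2, x1 + x2))
```

In a real training loop, the reconstructed inputs of each layer would be used to compute that layer's gradients before stepping back to the previous one, trading some recomputation for memory that no longer grows with N.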