Furthermore, we use reversible residual layers instead of the standard residuals,
which allows activations to be stored only once during training instead of
N times, where N is the number of layers.
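
To illustrate why reversibility removes the need to keep per-layer activations, here is a minimal sketch in plain Python. It assumes the usual two-stream formulation (y1 = x1 + F(x2), y2 = x2 + G(y1)). The functions `f` and `g` are hypothetical stand-ins for the attention and feed-forward sublayers, and all names here are illustrative rather than taken from any library.

```python
import math
from typing import List, Tuple

Vector = List[float]


def f(x: Vector) -> Vector:
    # Hypothetical stand-in for the attention sublayer (assumption).
    return [math.tanh(v) for v in x]


def g(x: Vector) -> Vector:
    # Hypothetical stand-in for the feed-forward sublayer (assumption).
    return [0.5 * math.sin(v) for v in x]


def add(a: Vector, b: Vector) -> Vector:
    return [p + q for p, q in zip(a, b)]


def sub(a: Vector, b: Vector) -> Vector:
    return [p - q for p, q in zip(a, b)]


def rev_forward(x1: Vector, x2: Vector) -> Tuple[Vector, Vector]:
    # Reversible residual block: each stream is updated from the other.
    y1 = add(x1, f(x2))
    y2 = add(x2, g(y1))
    return y1, y2


def rev_inverse(y1: Vector, y2: Vector) -> Tuple[Vector, Vector]:
    # Undo the block using only its outputs; no stored activations needed.
    x2 = sub(y2, g(y1))
    x1 = sub(y1, f(x2))
    return x1, x2


def run_stack(x1: Vector, x2: Vector, n_layers: int) -> Tuple[Vector, Vector]:
    # Forward pass that keeps only the current pair of activations.
    for _ in range(n_layers):
        x1, x2 = rev_forward(x1, x2)
    return x1, x2


def recover_inputs(y1: Vector, y2: Vector, n_layers: int) -> Tuple[Vector, Vector]:
    # Walk back through the layers, recomputing each layer's inputs.
    for _ in range(n_layers):
        y1, y2 = rev_inverse(y1, y2)
    return y1, y2
```

Calling `run_stack` and then `recover_inputs` with the same `n_layers` returns the original inputs up to floating-point rounding. During backpropagation, each layer's inputs can therefore be recomputed from its outputs instead of being cached for all N layers.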