arXiv:1806.01603

Layer rotation: a surprisingly powerful indicator of generalization in deep networks?

Published on Jun 5, 2018

AI-generated summary

Layer rotation, measured by the cosine distance between each layer's weights and their initialization, is a consistent indicator of generalization performance in deep networks.

Abstract

Our work presents extensive empirical evidence that layer rotation, i.e. the evolution across training of the cosine distance between each layer's weight vector and its initialization, constitutes an impressively consistent indicator of generalization performance. In particular, larger cosine distances between final and initial weights of each layer consistently translate into better generalization performance of the final model. Interestingly, this relation admits a network-independent optimum: training procedures during which all layers' weights reach a cosine distance of 1 from their initialization consistently outperform other configurations, by up to 30% test accuracy. Moreover, we show that layer rotations are easily monitored and controlled (helpful for hyperparameter tuning) and potentially provide a unified framework to explain the impact of learning rate tuning, weight decay, learning rate warmups and adaptive gradient methods on generalization and training speed. In an attempt to explain the surprising properties of layer rotation, we show on a 1-layer MLP trained on MNIST that layer rotation correlates with the degree to which features of intermediate layers have been trained.
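
The metric itself is easy to reproduce. Below is a minimal monitoring sketch, assuming PyTorch; the helper name layer_rotation and the snapshotting convention are illustrative assumptions, not the authors' published code.

```python
import torch

def layer_rotation(model, init_weights):
    """Per-layer cosine distance 1 - cos(w_t, w_0) between each layer's
    current weight vector and its value at initialization
    (illustrative helper, not the authors' code)."""
    distances = {}
    for name, param in model.named_parameters():
        if param.dim() > 1:  # track weight matrices/kernels, skip biases
            w_t = param.detach().flatten()
            w_0 = init_weights[name]
            cos = torch.dot(w_t, w_0) / (w_t.norm() * w_0.norm() + 1e-12)
            distances[name] = 1.0 - cos.item()  # 0: unchanged, 1: orthogonal to init
    return distances

# Usage sketch: snapshot the weights once at initialization, then log
# layer_rotation(model, init_weights) after each epoch to trace how far
# every layer has rotated away from its starting point.
# init_weights = {n: p.detach().flatten().clone()
#                 for n, p in model.named_parameters() if p.dim() > 1}
```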
