Usage tips
ImageGPT is almost exactly the same as GPT-2, with two exceptions: it uses a different activation
function (namely "quick gelu"), and its layer normalization layers do not mean-center the inputs.
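
The two differences can be illustrated with a minimal pure-Python sketch. It is not the library's implementation: the function names are hypothetical, the constant 1.702 is the commonly used form of the "quick gelu" approximation, the `eps` value is an assumed default, and the learned scale parameter of the normalization layer is omitted for brevity.

```python
import math


def quick_gelu(x: float) -> float:
    """Quick GELU approximation: x * sigmoid(1.702 * x).

    The 1.702 constant is the commonly used value (assumption, not
    taken from this document).
    """
    return x / (1.0 + math.exp(-1.702 * x))


def standard_layer_norm(xs, eps=1e-5):
    """Conventional layer norm: subtract the mean, then divide by the std."""
    mean = sum(xs) / len(xs)
    var = sum((v - mean) ** 2 for v in xs) / len(xs)
    scale = 1.0 / math.sqrt(var + eps)
    return [(v - mean) * scale for v in xs]


def layer_norm_no_centering(xs, eps=1e-5):
    """Layer norm without mean centering (hypothetical helper).

    The inputs are only rescaled by their root mean square; the mean
    is never subtracted, so the output keeps a non-zero mean.
    """
    mean_sq = sum(v * v for v in xs) / len(xs)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale for v in xs]


if __name__ == "__main__":
    hidden = [1.0, 2.0, 3.0]
    print("quick_gelu:      ", [quick_gelu(v) for v in hidden])
    print("standard LN:     ", standard_layer_norm(hidden))
    print("no-centering LN: ", layer_norm_no_centering(hidden))
```

With the standard variant the outputs sum to roughly zero, while the non-centering variant only rescales the inputs, so a positive input vector stays positive.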