Usage tips
ImageGPT is almost exactly the same as GPT-2, with two exceptions: it uses a different activation
function (namely "quick gelu"), and its layer normalization layers do not mean-center the inputs.
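The two differences can be illustrated with a minimal pure-Python sketch. It is not the library's implementation: the function names, the `eps` default, and the weight-only scaling are assumptions made for illustration. "Quick gelu" is taken here to be the common sigmoid-based approximation `x * sigmoid(1.702 * x)`, and the non-centering normalization is assumed to divide by the root mean square of the inputs without subtracting the mean.

```python
import math


def quick_gelu(x: float) -> float:
    """Sigmoid-based GELU approximation: x * sigmoid(1.702 * x)."""
    return x / (1.0 + math.exp(-1.702 * x))


def layer_norm_no_centering(values, weight=None, eps=1e-5):
    """Scale inputs by their root mean square; the mean is NOT subtracted.

    `weight` and `eps` are illustrative assumptions, not values from the text.
    """
    mean_square = sum(v * v for v in values) / len(values)
    scale = 1.0 / math.sqrt(mean_square + eps)
    if weight is None:
        weight = [1.0] * len(values)
    return [v * scale * w for v, w in zip(values, weight)]


def standard_layer_norm(values, eps=1e-5):
    """Conventional layer norm for comparison: subtract the mean, then scale."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(variance + eps) for v in values]


if __name__ == "__main__":
    inputs = [1.0, 2.0, 3.0]
    print("quick_gelu:", [quick_gelu(v) for v in inputs])
    print("no centering:", layer_norm_no_centering(inputs))
    print("standard:", standard_layer_norm(inputs))
```

With all-positive inputs, the non-centering variant keeps every output positive, whereas the standard variant produces outputs that sum to approximately zero.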