We train a family of large language models, called CodeGen, on natural language and programming language data.