The format is: Salesforce/codegen-{size}-{data}, where size: 350M, 2B, 6B, 16B data: nl: Pre-trained on the Pile multi: Initialized with nl, then further pre-trained on multiple programming languages data mono: Initialized with multi, then further pre-trained on Python data For example, Salesforce/codegen-350M-mono offers a 350 million-parameter checkpoint pre-trained sequentially on the Pile, multiple programming languages, and Python.