The abstract from the Phi-1 paper is the following:

> We introduce phi-1, a new large language model for code, with significantly smaller size than competing models: phi-1 is a Transformer-based model with 1.3B parameters, trained for 4 days on 8 A100s, using a selection of “textbook quality” data from the web (6B tokens) and synthetically generated textbooks and exercises with GPT-3.5 (1B tokens).
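
For readers who want to try the model directly, below is a minimal sketch of loading phi-1 and sampling a code completion with the Hugging Face `transformers` library. The checkpoint id `microsoft/phi-1`, the prompt, and the decoding settings are assumptions for illustration, not something taken from the abstract; it also assumes a recent `transformers` release (with native phi support) and `torch` installed.

```python
# Minimal sketch: load phi-1 and complete a short Python prompt.
# Assumes the checkpoint is published on the Hugging Face Hub as "microsoft/phi-1".
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/phi-1"  # assumed Hub id, not stated in the abstract
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# phi-1 is trained as a code model, so a bare function signature is a natural prompt.
prompt = "def is_prime(n: int) -> bool:"
inputs = tokenizer(prompt, return_tensors="pt")

# Greedy decoding of a short continuation.
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```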