All models are trained on sequences of 16k tokens and show improvements on inputs with up to 100k tokens.