We carefully characterize the trade-offs in terms of parameter count,
training FLOPs, and inference speed, and show that byte-level models are competitive with their token-level
counterparts.