Since byte or character sequences are longer than token
sequences, past work on token-free models has often introduced new model architectures designed to amortize the cost of
operating directly on raw text.
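
To make the length difference concrete, the sketch below counts the same string as UTF-8 bytes, as Unicode characters, and as whitespace-separated words. The helper `sequence_lengths` is hypothetical, and whitespace splitting is only a crude stand-in for a real subword tokenizer, which would typically produce somewhat more units than words but still far fewer than bytes.

```python
def sequence_lengths(text: str) -> dict:
    """Return the length of `text` under three segmentations (illustrative helper)."""
    return {
        "bytes": len(text.encode("utf-8")),  # one unit per UTF-8 byte
        "chars": len(text),                  # one unit per Unicode code point
        "words": len(text.split()),          # assumption: whitespace split as a rough token proxy
    }


ascii_lengths = sequence_lengths("Token-free models operate directly on raw text.")
# Escapes are used so the accented letters are single precomposed code points.
accented_lengths = sequence_lengths("na\u00efve caf\u00e9")

print(ascii_lengths)
print(accented_lengths)
```

For plain ASCII text the byte and character counts coincide, while non-ASCII characters take more than one byte each, so byte-level sequences grow further for many languages.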