For example, if your model weights are stored as 32-bit floating points and quantized to 16-bit floating points, the model size is halved, making it easier to store and reducing memory usage.
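As a minimal sketch of this idea (assuming PyTorch, with a synthetic tensor standing in for real model weights), casting from `torch.float32` to `torch.float16` halves the number of bytes needed:

```py
import torch

# A synthetic "weight" tensor of 1M elements stored in 32-bit floating point.
weights_fp32 = torch.randn(1_000_000, dtype=torch.float32)

# Casting to 16-bit floating point halves the memory footprint.
weights_fp16 = weights_fp32.to(torch.float16)

# Bytes = bytes per element * number of elements.
print(weights_fp32.element_size() * weights_fp32.nelement())  # 4000000 bytes
print(weights_fp16.element_size() * weights_fp16.nelement())  # 2000000 bytes
```

The same arithmetic scales to a full model: a 7B-parameter model needs roughly 28 GB in float32 but only about 14 GB in float16.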