Figures: forward peak memory per batch size, generate peak memory per batch size, generate throughput per batch size, forward latency per batch size.
The benchmarks indicate that AWQ quantization is the fastest for inference and text generation, and that it has the lowest peak memory for text generation.
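The metrics named in the figures (latency, throughput, and peak memory, each per batch size) can be collected with a small harness along the following lines. This is a minimal sketch under stated assumptions, not the code that produced the benchmarks: `benchmark` and `fake_generate` are hypothetical names, the workload is a stand-in for a real model call, and `tracemalloc` only tracks Python heap allocations. For a model running on a GPU, one would typically read `torch.cuda.max_memory_allocated()` after `torch.cuda.reset_peak_memory_stats()` instead.

```python
import time
import tracemalloc
from typing import Callable, Dict, List


def benchmark(
    step: Callable[[int], int],
    batch_sizes: List[int],
    repeats: int = 3,
) -> List[Dict[str, float]]:
    """Measure latency, throughput and peak Python-heap memory per batch size.

    `step(batch_size)` is assumed to run one forward/generate call and to
    return the number of tokens it produced.
    """
    rows = []
    for batch_size in batch_sizes:
        tracemalloc.start()  # tracks Python allocations only, not GPU memory
        tokens = 0
        start = time.perf_counter()
        for _ in range(repeats):
            tokens += step(batch_size)
        # Guard against a zero reading on very coarse timers.
        elapsed = max(time.perf_counter() - start, 1e-9)
        _, peak_bytes = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        rows.append(
            {
                "batch_size": batch_size,
                "latency_s": elapsed / repeats,          # mean time per call
                "throughput_tok_per_s": tokens / elapsed,
                "peak_memory_mb": peak_bytes / 1e6,
            }
        )
    return rows


def fake_generate(batch_size: int, new_tokens: int = 64) -> int:
    """Hypothetical stand-in for a model's generate call."""
    buffer = [[0.0] * new_tokens for _ in range(batch_size)]
    return len(buffer) * new_tokens


if __name__ == "__main__":
    for row in benchmark(fake_generate, [1, 4, 16]):
        print(row)
```

Tracing allocations adds overhead, so in a real comparison the timing and memory measurements would usually be taken in separate runs, with a few warm-up calls before timing starts.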