Below is an expected speedup diagram comparing pure inference time between the native implementation in Transformers using the `facebook/opt-350m` checkpoint and the Flash Attention 2 version of the model, for two different sequence lengths.
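
For reference, here is a minimal sketch of how such a comparison could be set up: the same checkpoint is loaded once with the default eager attention and once with Flash Attention 2 via the `attn_implementation` argument of `from_pretrained`. The prompt, token count, and timing loop are illustrative assumptions, not the exact benchmark behind the diagram:

```python
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "facebook/opt-350m"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

def time_generation(attn_implementation: str, prompt: str) -> float:
    # Flash Attention 2 requires half-precision weights and a CUDA device.
    model = AutoModelForCausalLM.from_pretrained(
        checkpoint,
        torch_dtype=torch.float16,
        attn_implementation=attn_implementation,
    ).to("cuda")
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
    torch.cuda.synchronize()
    start = time.perf_counter()
    model.generate(**inputs, max_new_tokens=64)
    torch.cuda.synchronize()
    return time.perf_counter() - start

prompt = "Hello, my name is"
print("eager:", time_generation("eager", prompt))
print("flash_attention_2:", time_generation("flash_attention_2", prompt))
```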