shubhrapandit committed on
Commit 9d89e98 · verified · 1 Parent(s): 1dbd82f

Update README.md

Files changed (1)
  1. README.md +17 -8
README.md CHANGED
@@ -221,11 +221,11 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
221
  <th>Model</th>
222
  <th>Average Cost Reduction</th>
223
  <th>Latency (s)</th>
224
- <th>QPD</th>
225
  <th>Latency (s)</th>
226
- <th>QPD</th>
227
  <th>Latency (s)</th>
228
- <th>QPD</th>
229
  </tr>
230
  </thead>
231
  <tbody style="text-align: center">
@@ -293,7 +293,10 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
293
  </tr>
294
  </tbody>
295
  </table>
296
-
 
 
 
297
 
298
 
299
  ### Multi-stream asynchronous performance (measured with vLLM version 0.7.2)
@@ -313,11 +316,11 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
313
  <th>Model</th>
314
  <th>Average Cost Reduction</th>
315
  <th>Maximum throughput (QPS)</th>
316
- <th>QPD</th>
317
  <th>Maximum throughput (QPS)</th>
318
- <th>QPD</th>
319
  <th>Maximum throughput (QPS)</th>
320
- <th>QPD</th>
321
  </tr>
322
  </thead>
323
  <tbody style="text-align: center">
@@ -384,4 +387,10 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
384
  <td>4573</td>
385
  </tr>
386
  </tbody>
387
- </table>
 
 
 
 
 
 
 
221
  <th>Model</th>
222
  <th>Average Cost Reduction</th>
223
  <th>Latency (s)</th>
224
+ <th>Queries Per Dollar</th>
225
  <th>Latency (s)</th>
226
+ <th>Queries Per Dollar</th>
227
  <th>Latency (s)</th>
228
+ <th>Queries Per Dollar</th>
229
  </tr>
230
  </thead>
231
  <tbody style="text-align: center">
 
293
  </tr>
294
  </tbody>
295
  </table>
296
+
297
+ **Use case profiles:** Image Size (WxH) / prompt tokens / generation tokens
298
+
299
+ **QPD:** Queries per dollar, based on the on-demand cost at [Lambda Labs](https://lambdalabs.com/service/gpu-cloud) (observed on 2/18/2025).
300
 
301
 
302
  ### Multi-stream asynchronous performance (measured with vLLM version 0.7.2)
 
316
  <th>Model</th>
317
  <th>Average Cost Reduction</th>
318
  <th>Maximum throughput (QPS)</th>
319
+ <th>Queries Per Dollar</th>
320
  <th>Maximum throughput (QPS)</th>
321
+ <th>Queries Per Dollar</th>
322
  <th>Maximum throughput (QPS)</th>
323
+ <th>Queries Per Dollar</th>
324
  </tr>
325
  </thead>
326
  <tbody style="text-align: center">
 
387
  <td>4573</td>
388
  </tr>
389
  </tbody>
390
+ </table>
391
+
392
+ **Use case profiles:** Image Size (WxH) / prompt tokens / generation tokens
393
+
394
+ **QPS:** Queries per second.
395
+
396
+ **QPD:** Queries per dollar, based on the on-demand cost at [Lambda Labs](https://lambdalabs.com/service/gpu-cloud) (observed on 2/18/2025).
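
The footnotes define QPD as queries per dollar at an on-demand hourly price, but the README does not give the exact formula. The sketch below shows one plausible way to relate the table columns. It assumes a GPU billed by the hour, that multi-stream QPD is sustained throughput divided by cost, and that single-stream QPD treats queries as running back to back. The function names and the prices in the usage lines are hypothetical and are not taken from the benchmark.

```python
SECONDS_PER_HOUR = 3600.0


def qpd_from_throughput(qps: float, hourly_cost_usd: float) -> float:
    """Queries per dollar for a server sustaining `qps` queries per second.

    Assumption: cost is a flat on-demand hourly rate in USD.
    """
    if qps < 0 or hourly_cost_usd <= 0:
        raise ValueError("qps must be >= 0 and hourly_cost_usd must be > 0")
    return qps * SECONDS_PER_HOUR / hourly_cost_usd


def qpd_from_latency(latency_s: float, hourly_cost_usd: float) -> float:
    """Queries per dollar when queries run one at a time, back to back.

    Assumption: one query finishes every `latency_s` seconds with no idle gaps.
    """
    if latency_s <= 0 or hourly_cost_usd <= 0:
        raise ValueError("latency_s and hourly_cost_usd must be > 0")
    return SECONDS_PER_HOUR / (latency_s * hourly_cost_usd)


# Hypothetical inputs: 2 queries/s (or 2 s per query) on a $1.50/hour GPU.
multi_stream_qpd = qpd_from_throughput(2.0, 1.5)
single_stream_qpd = qpd_from_latency(2.0, 1.5)
```

Under these assumptions, halving latency or doubling throughput at a fixed hourly price doubles QPD, which is how the cost-reduction column can be read alongside the latency and QPS columns.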