Commit 72343dc (verified) by shubhrapandit · 1 Parent(s): 804f00c

Update README.md

Files changed (1)
  1. README.md +15 -6
README.md CHANGED
@@ -222,11 +222,11 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
  <th>Model</th>
  <th>Average Cost Reduction</th>
  <th>Latency (s)</th>
- <th>QPD</th>
  <th>Latency (s)</th>
- <th>QPD</th>
  <th>Latency (s)</th>
- <th>QPD</th>
  </tr>
  </thead>
  <tbody>
@@ -299,6 +299,9 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
  </tbody>
  </table>

  ### Multi-stream asynchronous performance (measured with vLLM version 0.7.2)
@@ -317,11 +320,11 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
  <th>Model</th>
  <th>Average Cost Reduction</th>
  <th>Maximum throughput (QPS)</th>
- <th>QPD</th>
  <th>Maximum throughput (QPS)</th>
- <th>QPD</th>
  <th>Maximum throughput (QPS)</th>
- <th>QPD</th>
  </tr>
  </thead>
  <tbody style="text-align: center">
@@ -393,3 +396,9 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
  </tr>
  </tbody>
  </table>

  <th>Model</th>
  <th>Average Cost Reduction</th>
  <th>Latency (s)</th>
+ <th>Queries Per Dollar</th>
  <th>Latency (s)</th>
+ <th>Queries Per Dollar</th>
  <th>Latency (s)</th>
+ <th>Queries Per Dollar</th>
  </tr>
  </thead>
  <tbody>
 
  </tbody>
  </table>

+ **Use case profiles: Image Size (WxH) / prompt tokens / generation tokens
+
+ **QPD: Queries per dollar, based on on-demand cost at [Lambda Labs](https://lambdalabs.com/service/gpu-cloud) (observed on 2/18/2025).

  ### Multi-stream asynchronous performance (measured with vLLM version 0.7.2)
 
  <th>Model</th>
  <th>Average Cost Reduction</th>
  <th>Maximum throughput (QPS)</th>
+ <th>Queries Per Dollar</th>
  <th>Maximum throughput (QPS)</th>
+ <th>Queries Per Dollar</th>
  <th>Maximum throughput (QPS)</th>
+ <th>Queries Per Dollar</th>
  </tr>
  </thead>
  <tbody style="text-align: center">
 
  </tr>
  </tbody>
  </table>
+
+ **Use case profiles: Image Size (WxH) / prompt tokens / generation tokens
+
+ **QPS: Queries per second.
+
+ **QPD: Queries per dollar, based on on-demand cost at [Lambda Labs](https://lambdalabs.com/service/gpu-cloud) (observed on 2/18/2025).
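
The QPD footnote above names the cost basis but does not spell out the arithmetic. A minimal sketch of one plausible derivation, assuming QPD is the number of queries served in one hour divided by the hourly on-demand GPU price. The function names and the example throughput, latency, and price are hypothetical placeholders, not figures from the README or from Lambda Labs.

```python
def queries_per_dollar(qps: float, hourly_cost_usd: float) -> float:
    """Queries served per dollar at a sustained throughput of `qps`.

    Assumption (not confirmed by the README): cost is billed per hour,
    so QPD = (QPS * 3600 seconds) / hourly price in USD.
    """
    if qps < 0:
        raise ValueError("qps must be non-negative")
    if hourly_cost_usd <= 0:
        raise ValueError("hourly_cost_usd must be positive")
    queries_per_hour = qps * 3600.0
    return queries_per_hour / hourly_cost_usd


def queries_per_dollar_from_latency(latency_s: float, hourly_cost_usd: float) -> float:
    """Single-stream variant: one query at a time, so throughput is 1 / latency."""
    if latency_s <= 0:
        raise ValueError("latency_s must be positive")
    return queries_per_dollar(1.0 / latency_s, hourly_cost_usd)


if __name__ == "__main__":
    # Hypothetical inputs: 2 queries/s on a GPU billed at $1.50 per hour.
    print(queries_per_dollar(2.0, 1.5))
    # Hypothetical single-stream case: 0.5 s per query at the same price.
    print(queries_per_dollar_from_latency(0.5, 1.5))
```

Under this assumption the single-stream and multi-stream tables share one formula; only the source of the throughput figure differs (inverse latency versus measured maximum QPS).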