RedHatAI/Qwen2-VL-72B-Instruct-quantized.w4a16


shubhrapandit committed on Feb 25

Commit e9c8f58 · verified · 1 Parent(s): 2586522

Update README.md

Files changed (1)

README.md +3 -3

@@ -334,7 +334,7 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
   </thead>
   <tbody>
     <tr>
-      <th rowspan="3" valign="top">A100x4</th>
+      <td>A100x4</td>
       <td>Qwen/Qwen2-VL-72B-Instruct</td>
       <td></td>
       <td>0.3</td>
@@ -378,8 +378,8 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
       <td>377</td>
     </tr>
     <tr>
-      <td>neuralmagic/Qwen2-VL-72B-Instruct-FP8-Dynamic</td>
       <td>H100x2</td>
+      <td>neuralmagic/Qwen2-VL-72B-Instruct-FP8-Dynamic</td>
       <td>1.70</td>
       <td>0.8</td>
       <td>236</td>
@@ -389,8 +389,8 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
       <td>669</td>
     </tr>
     <tr>
-      <td>neuralmagic/Qwen2-VL-72B-Instruct-quantized.w4a16</td>
       <td>H100x1</td>
+      <td>neuralmagic/Qwen2-VL-72B-Instruct-quantized.w4a16</td>
       <td>2.35</td>
       <td>1.3</td>
       <td>350</td>