shubhrapandit committed
Commit 1fb421c · verified · 1 Parent(s): f809972

Update README.md

Files changed (1): README.md (+43, -1)
README.md CHANGED
@@ -186,18 +186,60 @@ oneshot(
 
 ## Evaluation
 
-The model was evaluated on OpenLLM Leaderboard [V1](https://huggingface.co/spaces/open-llm-leaderboard-old/open_llm_leaderboard), OpenLLM Leaderboard [V2](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/) and on [HumanEval](https://github.com/neuralmagic/evalplus), using the following commands:
+The model was evaluated with [mistral-evals](https://github.com/neuralmagic/mistral-evals) for vision-related tasks and with [lm_evaluation_harness](https://github.com/neuralmagic/lm-evaluation-harness) for select text-based benchmarks, using the following commands:
 
 <details>
 <summary>Evaluation Commands</summary>
+
+### Vision Tasks
+- vqav2
+- docvqa
+- mathvista
+- mmmu
+- chartqa
+
+```
+vllm serve neuralmagic/pixtral-12b-quantized.w8a8 --tensor_parallel_size 1 --max_model_len 25000 --trust_remote_code --max_num_seqs 8 --gpu_memory_utilization 0.9 --dtype float16 --limit_mm_per_prompt image=7
 
+python -m eval.run eval_vllm \
+  --model_name neuralmagic/pixtral-12b-quantized.w8a8 \
+  --url http://0.0.0.0:8000 \
+  --output_dir ~/tmp \
+  --eval_name <vision_task_name>
 ```
+
+### Text-based Tasks
+#### MMLU
+
+```
+lm_eval \
+  --model vllm \
+  --model_args pretrained="<model_name>",dtype=auto,add_bos_token=True,max_model_len=4096,tensor_parallel_size=<n>,gpu_memory_utilization=0.8,enable_chunked_prefill=True,trust_remote_code=True \
+  --tasks mmlu \
+  --num_fewshot 5 \
+  --batch_size auto \
+  --output_path output_dir
+
 ```
 
+#### MGSM
+
+```
+lm_eval \
+  --model vllm \
+  --model_args pretrained="<model_name>",dtype=auto,max_model_len=4096,max_gen_toks=2048,max_num_seqs=128,tensor_parallel_size=<n>,gpu_memory_utilization=0.9 \
+  --tasks mgsm_cot_native \
+  --num_fewshot 0 \
+  --batch_size auto \
+  --output_path output_dir
+
+```
 </details>
 
+
 ### Accuracy
 
+
 ## Inference Performance
 
 
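
For reference, the vision evaluations in the diff above are run one task at a time via `--eval_name`. Below is a minimal sketch of looping the same `eval.run` command over the five listed tasks; it assumes the `vllm serve neuralmagic/pixtral-12b-quantized.w8a8 ...` command from the diff is already serving on port 8000 and that mistral-evals is installed.

```
# Sketch only: reuses the eval.run invocation from the diff,
# substituting each listed vision task for <vision_task_name>.
for task in vqav2 docvqa mathvista mmmu chartqa; do
  python -m eval.run eval_vllm \
    --model_name neuralmagic/pixtral-12b-quantized.w8a8 \
    --url http://0.0.0.0:8000 \
    --output_dir ~/tmp \
    --eval_name "${task}"
done
```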
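The text-based commands leave `<model_name>` and `<n>` as placeholders. A hedged example of the MMLU invocation, assuming this repository's model id (taken from the serve command above) and a single GPU; adjust both for your setup:

```
# Sketch only: the MMLU command from the diff with illustrative values.
# "<model_name>" -> neuralmagic/pixtral-12b-quantized.w8a8 (assumed)
# "<n>"          -> 1 (single GPU; raise tensor_parallel_size for more GPUs)
lm_eval \
  --model vllm \
  --model_args pretrained="neuralmagic/pixtral-12b-quantized.w8a8",dtype=auto,add_bos_token=True,max_model_len=4096,tensor_parallel_size=1,gpu_memory_utilization=0.8,enable_chunked_prefill=True,trust_remote_code=True \
  --tasks mmlu \
  --num_fewshot 5 \
  --batch_size auto \
  --output_path output_dir
```

The same two substitutions apply to the MGSM command.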