hf (pretrained=../,trust_remote_code=True), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 16

|  Tasks  |Version|  Filter   |n-shot|Metric|   |Value |   |Stderr|
|---------|------:|-----------|-----:|------|---|-----:|---|-----:|
|humaneval|      1|create_test|     0|pass@1|   |0.3232|±  |0.0366|
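
As a sanity check on the reported numbers, the sketch below recomputes pass@1 and its standard error from per-problem pass/fail outcomes. It rests on assumptions that do not appear in the output above: that the run covered the full 164-problem HumanEval set, that 53 problems passed (the count implied by 0.3232 × 164), and that the standard error is the sample standard deviation (n - 1 denominator) divided by √n. The helper name `pass_at_1_with_stderr` is hypothetical.

```python
import math


def pass_at_1_with_stderr(passed: int, total: int) -> tuple[float, float]:
    """Return (pass@1, standard error) for binary per-problem outcomes.

    Assumes one sample per problem, so pass@1 is the mean of 0/1 scores,
    and the standard error is sample_stddev / sqrt(total) with an
    n - 1 denominator in the variance.
    """
    scores = [1.0] * passed + [0.0] * (total - passed)
    mean = sum(scores) / total
    variance = sum((s - mean) ** 2 for s in scores) / (total - 1)
    return mean, math.sqrt(variance / total)


# Assumed values: 164 HumanEval problems, 53 of them solved.
pass_at_1, stderr = pass_at_1_with_stderr(passed=53, total=164)
print(f"pass@1 = {pass_at_1:.4f} ± {stderr:.4f}")  # pass@1 = 0.3232 ± 0.0366
```

Under these assumptions the result matches the table row, which suggests the ± column is a plain standard error of the mean over problems rather than a bootstrap estimate.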