Qwen2.5-2X32B-CoderInstruct-OlympicCoder-87B-V1.1
This repo contains the full-precision source code, in "safetensors" format, which can be used to generate GGUF, GPTQ, EXL2, AWQ, HQQ and other quantized formats. The source code can also be used directly.
The monster coder in a MOE (Mixture of Experts) 2x32B (with shared expert) configuration.
The two best coders in one, stronger than the sum of their parts.
Both models code together.
Info about each model is below, followed by settings/info on using this MOE model.
Qwen2.5-Coder-32B-Instruct
Introduction
Qwen2.5-Coder is the latest series of code-specific Qwen large language models (formerly known as CodeQwen). As of now, Qwen2.5-Coder covers six mainstream model sizes (0.5, 1.5, 3, 7, 14, and 32 billion parameters) to meet the needs of different developers. Qwen2.5-Coder brings the following improvements over CodeQwen1.5:
- Significant improvements in code generation, code reasoning and code fixing. Building on the strong Qwen2.5, we scaled the training tokens up to 5.5 trillion, including source code, text-code grounding, synthetic data, etc. Qwen2.5-Coder-32B has become the current state-of-the-art open-source code LLM, with coding abilities matching those of GPT-4o.
- A more comprehensive foundation for real-world applications such as Code Agents: not only enhancing coding capabilities but also maintaining strengths in mathematics and general competencies.
- Long-context support up to 128K tokens.
This repo contains the instruction-tuned 32B Qwen2.5-Coder model, which has the following features:
- Type: Causal Language Models
- Training Stage: Pretraining & Post-training
- Architecture: transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias
- Number of Parameters: 32.5B
- Number of Parameters (Non-Embedding): 31.0B
- Number of Layers: 64
- Number of Attention Heads (GQA): 40 for Q and 8 for KV
- Context Length: Full 131,072 tokens
- Please refer to the Qwen2.5-Coder documentation for detailed instructions on how to deploy Qwen2.5 for handling long texts.
For more details, please refer to our blog, [GitHub](https://github.com/QwenLM/Qwen2.5-Coder),
and see also:
https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct
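For quick reference, here is a minimal sketch of running the original Qwen2.5-Coder-32B-Instruct with transformers and its built-in ChatML chat template (the prompt is illustrative only):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "Qwen/Qwen2.5-Coder-32B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype="auto", device_map="auto")

# apply_chat_template renders the ChatML format the model was tuned on
messages = [{"role": "user", "content": "Write a quicksort function in C++."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```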
Model Card for OlympicCoder-32B
OlympicCoder-32B is a code model that achieves very strong performance on competitive coding benchmarks such as LiveCodeBench and the 2024 International Olympiad in Informatics.
- Repository: https://github.com/huggingface/open-r1
- Blog post: https://huggingface.co/blog/open-r1/update-3
Model description
- Model type: A 32B parameter model fine-tuned on a decontaminated version of the codeforces dataset.
- Language(s) (NLP): Primarily English
- License: apache-2.0
- Finetuned from model: Qwen/Qwen2.5-Coder-32B-Instruct
Evaluation
We compare the performance of OlympicCoder models on two main benchmarks for competitive coding:
- IOI'2024: 6 very challenging problems from the 2024 International Olympiad in Informatics. Models are allowed up to 50 submissions per problem.
- LiveCodeBench: Python programming problems sourced from platforms like CodeForces and LeetCode. We use the `v4_v5` subset of `livecodebench/code_generation_lite`, which corresponds to 268 problems. We use `lighteval` to evaluate models on LiveCodeBench using the sampling parameters described here.
The OlympicCoder models were post-trained exclusively on C++ solutions generated by DeepSeek-R1. As a result, performance on LiveCodeBench should be considered partially out-of-domain, since that benchmark expects models to output solutions in Python.
For more info on this model, including benchmarks, see:
https://huggingface.co/open-r1/OlympicCoder-32B
Qwen2.5-2X32B-CoderInstruct-OlympicCoder-87B-V1.1
Model Settings / info:
Max context: 32k.
Super special thanks to Qwen and Open-R1 for making such fantastic models.
Suggested Settings:
- Temp .5 to .7 (or lower)
- top_k: 20, top_p: .8, min_p: .05 (top_p/min_p can also be .95 and .05)
- Rep pen: 1.1 (can be lower; lower may generate better code, specifically 1.02, 1.03 and 1.05; these settings are applied in the sketch after this list)
- Jinja template (embedded) or ChatML template.
- A system prompt is not required (tests were run with a blank system prompt).
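As a concrete illustration, here is a minimal sketch applying the suggested settings with llama-cpp-python (the GGUF filename and prompt are hypothetical; substitute your own quant):

```python
from llama_cpp import Llama

# Hypothetical local quant filename; substitute the GGUF you downloaded.
llm = Llama(
    model_path="Qwen2.5-2X32B-Coder-OlympicCoder-87B-Q4_K_M.gguf",
    n_ctx=32768,  # max context for this model
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Write a Python LRU cache class."}],
    temperature=0.6,      # temp .5 to .7 (or lower)
    top_k=20,
    top_p=0.8,
    min_p=0.05,
    repeat_penalty=1.05,  # lower rep pen may generate better code
    max_tokens=2048,
)
print(out["choices"][0]["message"]["content"])
```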
System Prompt:
If you want the model to code in specific ways, or in specific languages, I suggest creating a system prompt with these instructions.
This will cut down prompt size and focus the model.
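For example, a hedged starting point (the wording below is illustrative, not an official prompt for this model):

```
You are an expert C++ programmer. Always answer with complete, compilable
C++17 code. Include all required headers, avoid external libraries, and
comment any non-obvious logic.
```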
Activated Experts:
The model default is set to 2 activated experts. It will also run with one expert activated.
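For loading the full-precision source with transformers, here is a minimal sketch of changing the default, assuming the MOE uses a Qwen2-MoE-style config with a `num_experts_per_tok` field (repo id taken from this card's title):

```python
from transformers import AutoConfig, AutoModelForCausalLM

repo = "DavidAU/Qwen2.5-2X32B-CoderInstruct-OlympicCoder-87B-V1.1"

config = AutoConfig.from_pretrained(repo)
config.num_experts_per_tok = 1  # default is 2 activated experts

model = AutoModelForCausalLM.from_pretrained(
    repo, config=config, torch_dtype="auto", device_map="auto"
)
```

For GGUF quants, see the "change active experts" document linked further below.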
Generation:
Due to the model config, I suggest a minimum of 2 generations if both experts are activated (default), or 2-4 generations if one expert is activated.
This will give you a large selection of varied code to choose from.
I also suggest lowering rep pen from 1.1 and getting at least 2 generations at the lower setting(s).
These generation suggestions can produce stronger, more compact code, and in some cases faster code too.
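A minimal sketch of this multi-generation workflow, reusing the `llm` object from the sampling example above (the prompt and rep pen sweep values are illustrative):

```python
# Sweep rep pen downward and collect at least 2 generations per setting.
candidates = []
for rep_pen in (1.1, 1.05, 1.02):
    for _ in range(2):
        out = llm.create_chat_completion(
            messages=[{"role": "user", "content": "Write a C++ LRU cache."}],
            temperature=0.6, top_k=20, top_p=0.8, min_p=0.05,
            repeat_penalty=rep_pen,
        )
        candidates.append((rep_pen, out["choices"][0]["message"]["content"]))

# Review the candidates and keep the strongest / most compact implementation.
```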
For more information / other Qwen/Mistral Coders / additional settings see:
[ https://huggingface.co/DavidAU/Qwen2.5-MOE-2x-4x-6x-8x__7B__Power-CODER__19B-30B-42B-53B-gguf ]
[model card pending updates]
For settings, parameters and other details also see:
https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct
and/or
https://huggingface.co/open-r1/OlympicCoder-32B
Help, Adjustments, Samplers, Parameters and More
CHANGE THE NUMBER OF ACTIVE EXPERTS:
See this document:
https://huggingface.co/DavidAU/How-To-Set-and-Manage-MOE-Mix-of-Experts-Model-Activation-of-Experts
Settings: CHAT / ROLEPLAY and/or SMOOTHER operation of this model:
In "KoboldCpp" or "oobabooga/text-generation-webui" or "Silly Tavern" ;
Set the "Smoothing_factor" to 1.5
: in KoboldCpp -> Settings->Samplers->Advanced-> "Smooth_F"
: in text-generation-webui -> parameters -> lower right.
: In Silly Tavern this is called: "Smoothing"
NOTE: For "text-generation-webui"
-> if using GGUFs you need to use "llama_HF" (which involves downloading some config files from the SOURCE version of this model)
Source versions (and config files) of my models are here:
OTHER OPTIONS:
Increase rep pen to between 1.1 and 1.15 (you don't need to do this if you use "smoothing_factor").
If the interface/program you are using to run AI models supports "Quadratic Sampling" ("smoothing"), just make the adjustment as noted.
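For intuition, here is a minimal sketch of the quadratic ("smoothing") transform, assuming the common llama.cpp-style formulation (individual backends may differ in detail):

```python
import numpy as np

def smooth_logits(logits: np.ndarray, smoothing_factor: float = 1.5) -> np.ndarray:
    """Map each logit onto a downward parabola centered on the max logit.

    A gap d below the max becomes smoothing_factor * d**2, so gaps smaller
    than 1/smoothing_factor shrink (top tokens even out) while larger gaps
    grow (the tail is suppressed).
    """
    max_logit = logits.max()
    return -smoothing_factor * (logits - max_logit) ** 2 + max_logit
```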
Highest Quality Settings / Optimal Operation Guide / Parameters and Samplers
This a "Class 1" model:
For all settings used for this model (including specifics for its "class"), example generation(s), and the advanced settings guide (which often addresses model issue(s)), including methods to improve model performance for all use case(s) as well as chat, roleplay and other use case(s), please see:
There you can also see all parameters used for generation, in addition to advanced parameters and samplers, to get the most out of this model.