chrisociepa committed
Commit 1d21024 · verified · Parent(s): 3235166

Update README.md

Files changed (1):
  1. README.md +6 -6

README.md CHANGED
@@ -9,15 +9,15 @@ tags:
 - 8bit
 inference: false
 pipeline_tag: text-generation
-base_model: speakleash/Bielik-11B-v2.5-Instruct
+base_model: speakleash/Bielik-11B-v2.6-Instruct
 ---
 <p align="center">
 <img src="https://huggingface.co/speakleash/Bielik-7B-Instruct-v0.1-GGUF/raw/main/speakleash_cyfronet.png">
 </p>
 
-# Bielik-11B-v2.5-Instruct-FP8-Dynamic
+# Bielik-11B-v2.6-Instruct-FP8-Dynamic
 
-This model was obtained by quantizing the weights and activations of [Bielik-11B-v2.5-Instruct](https://huggingface.co/speakleash/Bielik-11B-v2.5-Instruct) to FP8 data type, ready for inference with vLLM >= 0.5.0 or SGLang.
+This model was obtained by quantizing the weights and activations of [Bielik-11B-v2.6-Instruct](https://huggingface.co/speakleash/Bielik-11B-v2.6-Instruct) to FP8 data type, ready for inference with vLLM >= 0.5.0 or SGLang.
 AutoFP8 is used for quantization. This optimization reduces the number of bits per parameter from 16 to 8, reducing the disk size and GPU memory requirements by approximately 50%.
 Only the weights and activations of the linear operators within transformers blocks are quantized. Symmetric per-tensor quantization is applied, in which a single linear scaling maps the FP8 representations of the quantized weights and activations.
 
@@ -33,7 +33,7 @@ This model can be deployed efficiently using the [vLLM](https://docs.vllm.ai/en/
 from vllm import LLM, SamplingParams
 from transformers import AutoTokenizer
 
-model_id = "speakleash/Bielik-11B-v2.5-Instruct-FP8-Dynamic"
+model_id = "speakleash/Bielik-11B-v2.6-Instruct-FP8-Dynamic"
 
 sampling_params = SamplingParams(temperature=0.2, top_p=0.95, max_tokens=4096)
 
@@ -61,7 +61,7 @@ vLLM aslo supports OpenAI-compatible serving. See the [documentation](https://do
 Launch a server of SGLang Runtime:
 
 ```
-python -m sglang.launch_server --model-path speakleash/Bielik-11B-v2.5-Instruct-FP8-Dynamic --port 30000
+python -m sglang.launch_server --model-path speakleash/Bielik-11B-v2.6-Instruct-FP8-Dynamic --port 30000
 ```
 
 Then you can send http request or use OpenAI Compatible API.
@@ -89,7 +89,7 @@ print(response)
 * **Developed by:** [SpeakLeash](https://speakleash.org/) & [ACK Cyfronet AGH](https://www.cyfronet.pl/)
 * **Language:** Polish
 * **Model type:** causal decoder-only
-* **Quant from:** [Bielik-11B-v2.5-Instruct](https://huggingface.co/speakleash/Bielik-11B-v2.5-Instruct)
+* **Quant from:** [Bielik-11B-v2.6-Instruct](https://huggingface.co/speakleash/Bielik-11B-v2.6-Instruct)
 * **Finetuned from:** [Bielik-11B-v2](https://huggingface.co/speakleash/Bielik-11B-v2)
 * **License:** Apache 2.0 and [Terms of Use](https://bielik.ai/terms/)
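
The card's description of symmetric per-tensor quantization, where a single linear scale maps a whole weight tensor into the FP8 range, can be sketched as follows. This is a minimal NumPy illustration, not AutoFP8's actual implementation: the FP8 cast itself (with its nonuniform mantissa rounding) is omitted, and only the single linear scaling is shown; 448 is the largest finite float8_e4m3 value.

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite value representable in float8_e4m3


def quantize_per_tensor(w: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric per-tensor quantization: one scale for the whole tensor.

    The scale is chosen so the largest-magnitude element lands exactly at
    the edge of the FP8 dynamic range; everything else is clipped into it.
    """
    scale = float(np.abs(w).max()) / FP8_E4M3_MAX
    q = np.clip(w / scale, -FP8_E4M3_MAX, FP8_E4M3_MAX)  # stored as FP8 in practice
    return q, scale


def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original tensor from FP8 + scale."""
    return q * scale
```

Because the quantized tensor and its scale are kept together, inference kernels can apply the scale on the fly, which is what allows the 16-bit weights to be stored in 8 bits at roughly half the memory footprint.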