JustinLin610 committed
Commit 3dc6528 · verified · 1 Parent(s): 255ac3a

Update README.md

Files changed (1)
  1. README.md +8 -2
README.md CHANGED
@@ -63,9 +63,11 @@ To run Qwen2, you can use `llama-cli` (the previous `main`) or `llama-server` (t
 We recommend using the `llama-server` as it is simple and compatible with OpenAI API. For example:
 
 ```bash
-./llama-server -m qwen2-57b-a14b-instruct-q5_0.gguf
+./llama-server -m qwen2-57b-a14b-instruct-q5_0.gguf -ngl 28 -fa
 ```
 
+(Note: `-ngl 28` refers to offloading all layers to GPUs, and `-fa` refers to the use of flash attention.)
+
 Then it is easy to access the deployed service with OpenAI API:
 
 ```python
@@ -89,7 +91,11 @@ print(completion.choices[0].message.content)
 If you choose to use `llama-cli`, pay attention to the removal of `-cml` for the ChatML template. Instead you should use `--in-prefix` and `--in-suffix` to tackle this problem.
 
 ```bash
-./llama-cli -m qwen2-57b-a14b-instruct-q5_0.gguf -n 512 -co -i -if -f prompts/chat-with-qwen.txt --in-prefix "<|im_start|>user\n" --in-suffix "<|im_end|>\n<|im_start|>assistant\n"
+./llama-cli -m qwen2-57b-a14b-instruct-q5_0.gguf \
+-n 512 -co -i -if -f prompts/chat-with-qwen.txt \
+--in-prefix "<|im_start|>user\n" \
+--in-suffix "<|im_end|>\n<|im_start|>assistant\n" \
+-ngl 28 -fa
 ```
 
 ## Citation
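
The Python block in the first hunk appears only as elided context, but the second hunk header (`print(completion.choices[0].message.content)`) indicates it queries the deployed server through the OpenAI client. A minimal sketch of such a call, assuming `llama-server`'s default listen address (`http://localhost:8080`) and its OpenAI-compatible `/v1` route; the API key placeholder and model name here are illustrative, not part of the diff:

```python
from openai import OpenAI

# Assumption: llama-server is listening on its default port 8080 and
# exposes the OpenAI-compatible REST API under /v1.
client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="sk-no-key-required",  # placeholder; the key is not checked by default
)

completion = client.chat.completions.create(
    model="qwen2-57b-a14b-instruct",  # illustrative; the server serves the loaded GGUF
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me something about large language models."},
    ],
)
print(completion.choices[0].message.content)
```

The server accepts any `api_key` value here unless it was started with authentication enabled.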
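As a companion to the `llama-cli` change: the removed `-cml` flag used to apply the ChatML template automatically, and the `--in-prefix`/`--in-suffix` values now rebuild the same turn structure by hand. A hypothetical illustration of the prompt layout those two flags wrap around each interactive input (`user_message` is a made-up example, not from the diff):

```python
# Hypothetical input typed at the interactive llama-cli prompt.
user_message = "Tell me something about large language models."

# llama-cli emits each interactive turn as: in-prefix + input + in-suffix,
# which reproduces the ChatML template that -cml used to apply.
chatml_turn = (
    "<|im_start|>user\n"                     # value of --in-prefix
    + user_message
    + "<|im_end|>\n<|im_start|>assistant\n"  # value of --in-suffix
)
print(chatml_turn)
```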