Update README.md
Browse files
README.md
CHANGED
@@ -66,7 +66,7 @@ We recommend using the `llama-server` as it is simple and compatible with OpenAI
|
|
66 |
./llama-server -m qwen2-57b-a14b-instruct-q5_0.gguf -ngl 28 -fa
|
67 |
```
|
68 |
|
69 |
-
(Note: `-ngl 28` refers to offloading
|
70 |
|
71 |
Then it is easy to access the deployed service with OpenAI API:
|
72 |
|
|
|
66 |
./llama-server -m qwen2-57b-a14b-instruct-q5_0.gguf -ngl 28 -fa
|
67 |
```
|
68 |
|
69 |
+
(Note: `-ngl 28` refers to offloading 28 layers to GPUs, and `-fa` refers to the use of flash attention.)
|
70 |
|
71 |
Then it is easy to access the deployed service with OpenAI API:
|
72 |
|