JustinLin610 commited on
Commit
a8f8260
·
verified ·
1 Parent(s): 3dc6528

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +1 -1
README.md CHANGED
@@ -66,7 +66,7 @@ We recommend using the `llama-server` as it is simple and compatible with OpenAI
66
  ./llama-server -m qwen2-57b-a14b-instruct-q5_0.gguf -ngl 28 -fa
67
  ```
68
 
69
- (Note: `-ngl 28` refers to offloading all layers to GPUs, and `-fa` refers to the use of flash attention.)
70
 
71
  Then it is easy to access the deployed service with OpenAI API:
72
 
 
66
  ./llama-server -m qwen2-57b-a14b-instruct-q5_0.gguf -ngl 28 -fa
67
  ```
68
 
69
+ (Note: `-ngl 28` refers to offloading 28 layers to GPUs, and `-fa` refers to the use of flash attention.)
70
 
71
  Then it is easy to access the deployed service with OpenAI API:
72