Qwen
/

Qwen2-57B-A14B-Instruct-GGUF

Text Generation

Model card Files Files and versions Community

JustinLin610 commited on Jun 17, 2024

Commit

a8f8260

·

verified ·

1 Parent(s): 3dc6528

Update README.md

Files changed (1) hide show

README.md +1 -1

README.md CHANGED Viewed

@@ -66,7 +66,7 @@ We recommend using the `llama-server` as it is simple and compatible with OpenAI
 ./llama-server -m qwen2-57b-a14b-instruct-q5_0.gguf -ngl 28 -fa
 ```
-(Note: `-ngl 28` refers to offloading all layers to GPUs, and `-fa` refers to the use of flash attention.)
 Then it is easy to access the deployed service with OpenAI API:

 ./llama-server -m qwen2-57b-a14b-instruct-q5_0.gguf -ngl 28 -fa
 ```
+(Note: `-ngl 28` refers to offloading 28 layers to GPUs, and `-fa` refers to the use of flash attention.)
 Then it is easy to access the deployed service with OpenAI API: