update readme
README.md
CHANGED
@@ -144,15 +144,23 @@ Compared to `jina-reranker-v2-base-multilingual`, `jina-reranker-m0` significant
 pip install transformers >= 4.47.3
 ```
 
+If you run it on a GPU that supports FlashAttention-2 (as of 2024.9.12, this covers Ampere, Ada, or Hopper GPUs, e.g., A100, RTX 3090, RTX 4090, H100), also install flash-attn:
+
+```bash
+pip install flash-attn --no-build-isolation
+```
+
 And then use the following code snippet to load the model:
 
 ```python
 from transformers import AutoModel
 
+# comment out the flash_attention_2 line if you don't have a compatible GPU
 model = AutoModel.from_pretrained(
     'jinaai/jina-reranker-m0',
     torch_dtype="auto",
     trust_remote_code=True,
+    attn_implementation="flash_attention_2"
 )
 
 model.to('cuda') # or 'cpu' if no GPU is available
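
For readers applying this change locally, here is a minimal sketch of the fallback the new comment describes: it passes `attn_implementation="flash_attention_2"` only when the `flash_attn` package is importable, and otherwise loads the model with the default attention implementation. The availability check via `importlib.util.find_spec` is an illustration, not part of the README.

```python
import importlib.util

from transformers import AutoModel

# Enable FlashAttention-2 only if the flash-attn package is installed.
# Note: this checks installation, not GPU compatibility, so the option may
# still need to be dropped on unsupported hardware (illustrative check,
# not part of the README).
attn_kwargs = (
    {"attn_implementation": "flash_attention_2"}
    if importlib.util.find_spec("flash_attn") is not None
    else {}
)

model = AutoModel.from_pretrained(
    'jinaai/jina-reranker-m0',
    torch_dtype="auto",
    trust_remote_code=True,
    **attn_kwargs,
)

model.to('cuda')  # or 'cpu' if no GPU is available
```

Gating on the package rather than hard-coding the argument keeps one snippet usable both on FlashAttention-capable GPUs and on machines without flash-attn installed.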