`torch.float16`). To load and run a model using Flash Attention 2, refer to the snippet below:

```python
import torch
from transformers import PhiForCausalLM, AutoTokenizer

# define the model and tokenizer and push the model and tokens to the GPU
model = PhiForCausalLM.from_pretrained("microsoft/phi-1_5", torch_dtype=torch.float16, attn_implementation="flash_attention_2").to("cuda")
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-1_5")
```