```python
from transformers import BertConfig, BertModel
import torch

config = BertConfig(
    vocab_size_or_config_json_file=32000,
    hidden_size=768,
    num_hidden_layers=12,
    num_attention_heads=12,
    intermediate_size=3072,
    torchscript=True,
)

# Instantiating the model
model = BertModel(config)

# The model needs to be in evaluation mode
model.eval()

# If you are instantiating the model with `from_pretrained` you can also easily set the TorchScript flag
model = BertModel.from_pretrained("google-bert/bert-base-uncased", torchscript=True)

# Creating the trace (tokens_tensor and segments_tensors are the dummy inputs prepared earlier)
traced_model = torch.jit.trace(model, [tokens_tensor, segments_tensors])
torch.jit.save(traced_model, "traced_bert.pt")
```
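`torch.jit.trace` records the operations executed for the given example inputs and freezes them into a graph. The mechanics can be sketched end-to-end with a small stand-in module (a hypothetical `TinyEncoder`, not a Transformers class) that mimics BERT's two-tensor call signature, without downloading any weights:

```python
import torch
from torch import nn


class TinyEncoder(nn.Module):
    """Stand-in module with a BERT-like (tokens, segments) signature."""

    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(100, 16)

    def forward(self, tokens_tensor, segments_tensors):
        return self.embed(tokens_tensor) + self.embed(segments_tensors)


model = TinyEncoder().eval()  # evaluation mode, as above
tokens_tensor = torch.randint(0, 100, (1, 7))
segments_tensors = torch.zeros(1, 7, dtype=torch.long)

# Trace with example inputs; the traced module replays the recorded graph
traced_model = torch.jit.trace(model, [tokens_tensor, segments_tensors])

# In eval mode (no dropout), the traced output matches eager execution
assert torch.allclose(
    traced_model(tokens_tensor, segments_tensors),
    model(tokens_tensor, segments_tensors),
)
```

Because tracing only records the operations seen for these particular inputs, data-dependent control flow would be baked in; that is why the example inputs should be representative of what the model will see at inference time.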
## Loading a model

Now you can load the previously saved BertModel, `traced_bert.pt`, from disk and use it on the previously initialized `dummy_input`:

```python
loaded_model = torch.jit.load("traced_bert.pt")
loaded_model.eval()

all_encoder_layers, pooled_output = loaded_model(*dummy_input)
```
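The save/load roundtrip can be exercised with the same kind of small stand-in module; this sketch writes the traced model to an in-memory buffer (saving to a file path such as `"traced_bert.pt"` works identically) and checks the loaded copy against the original:

```python
import io

import torch
from torch import nn

model = nn.Sequential(nn.Linear(8, 8), nn.Tanh()).eval()
dummy_input = (torch.randn(2, 8),)

traced = torch.jit.trace(model, dummy_input)

# torch.jit.save/load accept file paths or file-like objects
buffer = io.BytesIO()
torch.jit.save(traced, buffer)
buffer.seek(0)

loaded = torch.jit.load(buffer)
loaded.eval()

# Unpack the dummy input exactly as in loaded_model(*dummy_input) above
assert torch.allclose(loaded(*dummy_input), model(*dummy_input))
```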
## Using a traced model for inference

Use the traced model for inference by calling its `__call__` dunder method:

```python
traced_model(tokens_tensor, segments_tensors)
```
## Deploy Hugging Face TorchScript models to AWS with the Neuron SDK

AWS introduced the Amazon EC2 Inf1 instance family for low-cost, high-performance machine learning inference in the cloud.