```python
from transformers import BertConfig, BertModel
import torch

# Initializing the model with the torchscript flag
config = BertConfig(
    vocab_size=32000,
    hidden_size=768,
    num_hidden_layers=12,
    num_attention_heads=12,
    intermediate_size=3072,
    torchscript=True,
)

# Instantiating the model
model = BertModel(config)

# The model needs to be in evaluation mode
model.eval()

# If you are instantiating the model with from_pretrained you can also easily set the TorchScript flag
model = BertModel.from_pretrained("google-bert/bert-base-uncased", torchscript=True)

# Creating the trace with the dummy inputs prepared earlier
traced_model = torch.jit.trace(model, [tokens_tensor, segments_tensors])
torch.jit.save(traced_model, "traced_bert.pt")
```
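The trace call above assumes `tokens_tensor` and `segments_tensors` are the dummy input tensors prepared earlier in this guide. If you are jumping in at this section, here is a minimal sketch of how such inputs can be built (the example sentence pair is arbitrary):

```python
from transformers import BertTokenizer

# Build illustrative dummy inputs; any text of a representative length works,
# since tracing only records the operations executed for these shapes.
tokenizer = BertTokenizer.from_pretrained("google-bert/bert-base-uncased")
encoding = tokenizer("Who was Jim Henson?", "Jim Henson was a puppeteer", return_tensors="pt")

tokens_tensor = encoding["input_ids"]
segments_tensors = encoding["token_type_ids"]

# The loading example below unpacks these as *dummy_input
dummy_input = [tokens_tensor, segments_tensors]
```

Keep in mind that a traced model is specialized to the input shapes used during tracing, so the dummy inputs should match the sizes you expect at inference time.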
## Loading a model
Now you can load the previously saved `BertModel`, `traced_bert.pt`, from disk and use it on the previously initialised `dummy_input`:
```python
loaded_model = torch.jit.load("traced_bert.pt")
loaded_model.eval()

all_encoder_layers, pooled_output = loaded_model(*dummy_input)
```
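Because the model was exported with `torchscript=True`, it returns plain tuples instead of `ModelOutput` objects, which is why the two outputs can be unpacked directly as above.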
## Using a traced model for inference
Use the traced model for inference by using its `__call__` dunder method:

```python
traced_model(tokens_tensor, segments_tensors)
```
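As an optional sanity check, you can verify with `torch.allclose` that the traced model reproduces the eager model's outputs on the same inputs:

```python
# Illustrative check: the traced graph should match the eager outputs.
with torch.no_grad():
    eager_outputs = model(tokens_tensor, segments_tensors)
    traced_outputs = traced_model(tokens_tensor, segments_tensors)

# With torchscript=True both calls return tuples; compare the first element.
print(torch.allclose(eager_outputs[0], traced_outputs[0], atol=1e-5))
```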
## Deploy Hugging Face TorchScript models to AWS with the Neuron SDK
AWS introduced the [Amazon EC2 Inf1](https://aws.amazon.com/ec2/instance-types/inf1/) instance family for low-cost, high-performance machine learning inference in the cloud.