|
|
|
Export to ONNX |
|
Deploying 🤗 Transformers models in production environments often requires, or can benefit from, exporting the models to a serialized format that can be loaded and executed on specialized runtimes and hardware.
|
🤗 Optimum is an extension of Transformers that enables exporting models from PyTorch or TensorFlow to serialized formats such as ONNX and TFLite through its exporters module. 🤗 Optimum also provides a set of performance optimization tools to train and run models on targeted hardware with maximum efficiency.
|
This guide demonstrates how you can export 🤗 Transformers models to ONNX with 🤗 Optimum. For the guide on exporting models to TFLite, please refer to the Export to TFLite page.
|
Export to ONNX |
|
ONNX (Open Neural Network eXchange) is an open standard that defines a common set of operators and a common file format to represent deep learning models in a wide variety of frameworks, including PyTorch and TensorFlow. When a model is exported to the ONNX format, these operators are used to construct a computational graph (often called an intermediate representation) which represents the flow of data through the neural network.
|
By exposing a graph with standardized operators and data types, ONNX makes it easy to switch between frameworks. For example, a model trained in PyTorch can be exported to ONNX format and then imported in TensorFlow (and vice versa).
|
Once exported to ONNX format, a model can be: |
|
- optimized for inference via techniques such as graph optimization and quantization. |
|
- run with ONNX Runtime via ORTModelForXXX classes, which follow the same AutoModel API you are used to in 🤗 Transformers.
|
- run with optimized inference pipelines, which have the same API as the [pipeline] function in 🤗 Transformers (see the sketch below).
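
As a quick illustration of the last two points, an exported model can be loaded with an ORTModelForXXX class and dropped into a pipeline. The sketch below is illustrative and assumes a local distilbert_base_uncased_squad_onnx/ directory like the one produced by the export steps later in this guide:

```python
from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForQuestionAnswering

# Load an already-exported ONNX model with ONNX Runtime
# (the directory name matches the export example later in this guide)
model = ORTModelForQuestionAnswering.from_pretrained("distilbert_base_uncased_squad_onnx")
tokenizer = AutoTokenizer.from_pretrained("distilbert_base_uncased_squad_onnx")

# The ORT model plugs into the same pipeline API as a regular 🤗 Transformers model
onnx_qa = pipeline("question-answering", model=model, tokenizer=tokenizer)
print(onnx_qa(question="What am I using?", context="Using DistilBERT with ONNX Runtime!"))
```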
|
🤗 Optimum provides support for the ONNX export by leveraging configuration objects. These configuration objects come ready-made for a number of model architectures, and are designed to be easily extendable to other architectures. For the list of ready-made configurations, please refer to 🤗 Optimum documentation.
|
There are two ways to export a 🤗 Transformers model to ONNX; here we show both:
|
|
|
- export with 🤗 Optimum via the CLI.
- export with 🤗 Optimum via optimum.onnxruntime.
|
|
|
Exporting a 🤗 Transformers model to ONNX with CLI |
|
To export a 🤗 Transformers model to ONNX, first install an extra dependency: |
|
|
|
```bash
pip install optimum[exporters]
```
|
To check out all available arguments, refer to the 🤗 Optimum docs, or view the help on the command line:
|
|
|
```bash
optimum-cli export onnx --help
```
|
To export a model's checkpoint from the 🤗 Hub, for example, distilbert/distilbert-base-uncased-distilled-squad, run the following command: |
|
|
|
```bash
optimum-cli export onnx --model distilbert/distilbert-base-uncased-distilled-squad distilbert_base_uncased_squad_onnx/
```
|
You should see the logs indicating progress and showing where the resulting model.onnx is saved, like this: |
|
|
|
```
Validating ONNX model distilbert_base_uncased_squad_onnx/model.onnx
    -[✓] ONNX model output names match reference model (start_logits, end_logits)
    - Validating ONNX Model output "start_logits":
        -[✓] (2, 16) matches (2, 16)
        -[✓] all values close (atol: 0.0001)
    - Validating ONNX Model output "end_logits":
        -[✓] (2, 16) matches (2, 16)
        -[✓] all values close (atol: 0.0001)
The ONNX export succeeded and the exported model was saved at: distilbert_base_uncased_squad_onnx
```
|
The example above illustrates exporting a checkpoint from the 🤗 Hub. When exporting a local model, first make sure that you saved both the model's weights and tokenizer files in the same directory (local_path). When using the CLI, pass the local_path to the --model argument instead of the checkpoint name on the 🤗 Hub and provide the --task argument. You can review the list of supported tasks in the 🤗 Optimum documentation. If the --task argument is not provided, it will default to the model architecture without any task-specific head.
|
|
|
```bash
optimum-cli export onnx --model local_path --task question-answering distilbert_base_uncased_squad_onnx/
```
|
The resulting model.onnx file can then be run on one of the many accelerators that support the ONNX standard. For example, we can load and run the model with ONNX Runtime as follows:
|
```python
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForQuestionAnswering

tokenizer = AutoTokenizer.from_pretrained("distilbert_base_uncased_squad_onnx")
model = ORTModelForQuestionAnswering.from_pretrained("distilbert_base_uncased_squad_onnx")
inputs = tokenizer("What am I using?", "Using DistilBERT with ONNX Runtime!", return_tensors="pt")
outputs = model(**inputs)
```
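
From here, the answer can be decoded from the logits the same way as with a regular 🤗 Transformers question-answering model. A minimal, naive sketch (assuming the tokenizer, inputs and outputs from the snippet above):

```python
import torch

# Pick the most likely start/end token positions and decode that span (naive post-processing)
start = int(torch.argmax(outputs.start_logits))
end = int(torch.argmax(outputs.end_logits))
answer = tokenizer.decode(inputs["input_ids"][0, start : end + 1])
print(answer)
```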
|
|
|
The process is identical for TensorFlow checkpoints on the Hub. For instance, here's how you would export a pure TensorFlow checkpoint from the Keras organization:
|
|
|
```bash
optimum-cli export onnx --model keras-io/transformers-qa distilbert_base_cased_squad_onnx/
```
|
Exporting a 🤗 Transformers model to ONNX with optimum.onnxruntime |
|
As an alternative to the CLI, you can export a 🤗 Transformers model to ONNX programmatically like so:
|
```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer

model_checkpoint = "distilbert_base_uncased_squad"
save_directory = "onnx/"

# Load a model from transformers and export it to ONNX
ort_model = ORTModelForSequenceClassification.from_pretrained(model_checkpoint, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)

# Save the onnx model and tokenizer
ort_model.save_pretrained(save_directory)
tokenizer.save_pretrained(save_directory)
```
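
With the ONNX model saved, you could go a step further and apply one of the optimizations mentioned earlier, for example dynamic quantization with ONNX Runtime. The sketch below is illustrative only: the quantization configuration, the AVX512-VNNI target, and the output directory are assumptions, not requirements of the export workflow.

```python
from optimum.onnxruntime import ORTModelForSequenceClassification, ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

# Reload the exported model from the save directory used above
ort_model = ORTModelForSequenceClassification.from_pretrained("onnx/")

# Dynamic (weight-only) quantization; the AVX512-VNNI target is an illustrative choice
dqconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False)

# Quantize the model and write the result to a separate directory
quantizer = ORTQuantizer.from_pretrained(ort_model)
quantizer.quantize(save_dir="onnx_quantized/", quantization_config=dqconfig)
```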
|
|
|
Exporting a model for an unsupported architecture |
|
If you wish to contribute by adding support for a model that cannot currently be exported, you should first check if it is supported in optimum.exporters.onnx, and if it is not, contribute to 🤗 Optimum directly.
|
Exporting a model with transformers.onnx |
|
|
|
transformers.onnx is no longer maintained; please export models with 🤗 Optimum as described above. This section will be removed in future versions.
|
|
|
To export a 🤗 Transformers model to ONNX with transformers.onnx, install extra dependencies:
|
|
|
```bash
pip install transformers[onnx]
```
|
Use the transformers.onnx package as a Python module to export a checkpoint using a ready-made configuration:
|
|
|
```bash
python -m transformers.onnx --model=distilbert/distilbert-base-uncased onnx/
```
|
This exports an ONNX graph of the checkpoint defined by the --model argument. Pass any checkpoint on the 🤗 Hub or one that's stored locally. |
|
The resulting model.onnx file can then be run on one of the many accelerators that support the ONNX standard. For example, load and run the model with ONNX Runtime as follows:
|
```python
from transformers import AutoTokenizer
from onnxruntime import InferenceSession

tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased")
session = InferenceSession("onnx/model.onnx")

# ONNX Runtime expects NumPy arrays as input
inputs = tokenizer("Using DistilBERT with ONNX Runtime!", return_tensors="np")
outputs = session.run(output_names=["last_hidden_state"], input_feed=dict(inputs))
```
|
|
|
The required output names (like ["last_hidden_state"]) can be obtained by taking a look at the ONNX configuration of each model. For example, for DistilBERT we have:
|
```python
from transformers.models.distilbert import DistilBertConfig, DistilBertOnnxConfig

config = DistilBertConfig()
onnx_config = DistilBertOnnxConfig(config)
print(list(onnx_config.outputs.keys()))
# ["last_hidden_state"]
```
|
|
|
The process is identical for TensorFlow checkpoints on the Hub. For example, export a pure TensorFlow checkpoint like so: |
|
|
|
```bash
python -m transformers.onnx --model=keras-io/transformers-qa onnx/
```
|
To export a model that's stored locally, save the model's weights and tokenizer files in the same directory (e.g. local-pt-checkpoint), then export it to ONNX by pointing the --model argument of the transformers.onnx package to the desired directory:
|
|
|
```bash
python -m transformers.onnx --model=local-pt-checkpoint onnx/
```