from transformers.integrations import HfDeepSpeedConfig
from transformers import AutoModel, AutoConfig
import deepspeed
ds_config = {}  # deepspeed config object or path to the file
must run before instantiating the model to detect zero 3
dschf = HfDeepSpeedConfig(ds_config)  # keep this object alive
config = AutoConfig.from_pretrained("openai-community/gpt2")
model = AutoModel.from_config(config)
engine = deepspeed.initialize(model=model, config_params=ds_config, )

Non-Trainer ZeRO Inference
To run ZeRO Inference without the [Trainer] in cases where you can’t fit a model onto a single GPU, try using additional GPUs or/and offloading to CPU memory.