from transformers.integrations import HfDeepSpeedConfig from transformers import AutoModel, AutoConfig import deepspeed ds_config = {} # deepspeed config object or path to the file must run before instantiating the model to detect zero 3 dschf = HfDeepSpeedConfig(ds_config) # keep this object alive config = AutoConfig.from_pretrained("openai-community/gpt2") model = AutoModel.from_config(config) engine = deepspeed.initialize(model=model, config_params=ds_config, ) Non-Trainer ZeRO Inference To run ZeRO Inference without the [Trainer] in cases where you can’t fit a model onto a single GPU, try using additional GPUs or/and offloading to CPU memory.