model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Initialise DeepSpeed ZeRO and store only the engine object
ds_engine = deepspeed.initialize(model=model, config_params=ds_config)[0]
ds_engine.module.eval()  # inference

# DeepSpeed ZeRO can process unrelated inputs on each GPU.