# The HfDeepSpeedConfig line below has to run before the model is loaded with
# AutoModelForSeq2SeqLM.from_pretrained(model_name). Otherwise the model will first be loaded
# normally and only partitioned at forward time, which is less efficient and may fail when
# CPU RAM is scarce.
dschf = HfDeepSpeedConfig(ds_config)  # keep this object alive
# Now a model can be loaded.