Issue about fine-tuning this model

#8
by nassairgnigni - opened

Hi, I want to fine-tune this model to adapt it to my data. I have followed all the instructions in the official documentation, but when I try to load the pretrained .nemo model, it does not work.
Here is the command (slightly modified):

!python /content/NeMo/examples/speaker_tasks/diarization/neural_diarizer/sortformer_diar_train.py \
    --config-path='../conf/neural_diarizer' \
    --config-name='sortformer_diarizer_hybrid_loss_4spk-v1.yaml' \
    trainer.devices=1 \
    model.train_ds.manifest_filepath='/content/gdrive/MyDrive//train_manifest.json' \
    model.validation_ds.manifest_filepath='/content/gdrive/MyDrive//test_manifest.json' \
    exp_manager.name='sample_train_finetune' \
    exp_manager.exp_dir='./sortformer_diar_train_finetune' \
    exp_manager.resume_from_checkpoint="/content/gdrive/MyDrive/***/diar_sortformer_4spk-v1.nemo"
    #exp_manager.resume_if_exists=True  # May be useful if you restart an interrupted training run

Here is the error I get:

Traceback (most recent call last):
File "/content/NeMo/examples/speaker_tasks/diarization/neural_diarizer/sortformer_diar_train.py", line 58, in main
trainer.fit(sortformer_model)
File "/usr/local/lib/python3.11/dist-packages/lightning/pytorch/trainer/trainer.py", line 538, in fit
call._call_and_handle_interrupt(
File "/usr/local/lib/python3.11/dist-packages/lightning/pytorch/trainer/call.py", line 46, in _call_and_handle_interrupt
return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/lightning/pytorch/strategies/launchers/subprocess_script.py", line 105, in launch
return function(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/lightning/pytorch/trainer/trainer.py", line 574, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "/usr/local/lib/python3.11/dist-packages/lightning/pytorch/trainer/trainer.py", line 950, in _run
self._checkpoint_connector._restore_modules_and_callbacks(ckpt_path)
File "/usr/local/lib/python3.11/dist-packages/lightning/pytorch/trainer/connectors/checkpoint_connector.py", line 397, in _restore_modules_and_callbacks
self.resume_start(checkpoint_path)
File "/usr/local/lib/python3.11/dist-packages/lightning/pytorch/trainer/connectors/checkpoint_connector.py", line 79, in resume_start
loaded_checkpoint = self.trainer.strategy.load_checkpoint(checkpoint_path)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/lightning/pytorch/strategies/strategy.py", line 367, in load_checkpoint
return self.checkpoint_io.load_checkpoint(checkpoint_path)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/lightning/fabric/plugins/io/torch_io.py", line 83, in load_checkpoint
return pl_load(path, map_location=map_location)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/lightning/fabric/utilities/cloud_io.py", line 60, in _load
return torch.load(
^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1495, in load
return _legacy_load(
^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1727, in _legacy_load
return legacy_load(f)
^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1601, in legacy_load
tar.extract("storages", path=tmpdir)
File "/usr/lib/python3.11/tarfile.py", line 2325, in extract
tarinfo = self._get_extract_tarinfo(member, filter_function, path)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/tarfile.py", line 2332, in _get_extract_tarinfo
tarinfo = self.getmember(member)
^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/tarfile.py", line 2015, in getmember
raise KeyError("filename %r not found" % name)
KeyError: "filename 'storages' not found"

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

filename 'storages' not found... where am I supposed to get it from?

Could you please share a simple notebook so we can easily fine-tune this model on our own data?
My email: foupouagnignimowoum@gmail.com

Thanks

NVIDIA org

Hi @nassairgnigni, sorry for the late reply.
Please try running with
+init_from_nemo_model.model.path="/path/to/diar_sortformer_4spk-v1.nemo"
instead of
exp_manager.resume_from_checkpoint="/path/to/diar_sortformer_4spk-v1.nemo".
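
For example, the adjusted command might look like the following sketch (the manifest and .nemo paths are placeholders; replace them with your own):

!python /content/NeMo/examples/speaker_tasks/diarization/neural_diarizer/sortformer_diar_train.py \
    --config-path='../conf/neural_diarizer' \
    --config-name='sortformer_diarizer_hybrid_loss_4spk-v1.yaml' \
    trainer.devices=1 \
    model.train_ds.manifest_filepath='/path/to/train_manifest.json' \
    model.validation_ds.manifest_filepath='/path/to/test_manifest.json' \
    exp_manager.name='sample_train_finetune' \
    exp_manager.exp_dir='./sortformer_diar_train_finetune' \
    +init_from_nemo_model.model.path='/path/to/diar_sortformer_4spk-v1.nemo'

The + prefix adds the new init_from_nemo_model entry to the Hydra config, so the pretrained weights should be loaded into the model at initialization instead of being handed to Lightning as a trainer checkpoint. A .nemo file is a tar archive rather than a Lightning .ckpt, which is why the torch.load call in the traceback fails with the 'storages' KeyError.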
