alibabasglab committed on
Commit b817bce · verified · 1 Parent(s): 4a76c86

Upload 6 files

checkpoints/.DS_Store ADDED
Binary file (6.15 kB).
 
checkpoints/log_VoxCeleb2_lip_dprnn_2spk/config.yaml ADDED
@@ -0,0 +1,42 @@
+ ## Config file
+
+ # Log
+ seed: 777
+ use_cuda: 1 # 1 for True, 0 for False
+
+ # dataset
+ speaker_no: 2
+ mix_lst_path: ./data/VoxCeleb2/mixture_data_list_2mix.csv
+ audio_direc: /mnt/nas_sg/wulanchabu/zexu.pan/datasets/VoxCeleb2/audio_clean/
+ reference_direc: /mnt/nas_sg/wulanchabu/zexu.pan/datasets/VoxCeleb2/orig/
+ audio_sr: 16000
+ ref_sr: 25
+
+ # dataloader
+ num_workers: 4
+ batch_size: 8 # 2-GPU training with a total effective batch size of 16
+ accu_grad: 0
+ effec_batch_size: 4 # per GPU, only used if accu_grad is set to 1, must be a multiple of batch_size
+ max_length: 6 # truncate the utterances in the dataloader, in seconds
+
+ # network settings
+ init_from: None # 'None' or a log name 'log_2024-07-22(18:12:13)'
+ causal: 0 # 1 for True, 0 for False
+ network_reference:
+ cue: lip # lip or speech or gesture or EEG
+ backbone: resnet18 # resnet18 or shufflenetV2 or blazenet64
+ emb_size: 256 # resnet18:256
+ network_audio:
+ backbone: av_dprnn
+ N: 256
+ L: 40
+ B: 64
+ H: 128
+ K: 100
+ R: 6
+
+ # optimizer
+ loss_type: sisdr # "snr", "sisdr", "hybrid"
+ init_learning_rate: 0.001
+ max_epoch: 150
+ clip_grad_norm: 5
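The batch-size comment in the config can be sanity-checked numerically: with `batch_size: 8` per GPU and 2 GPUs, each optimizer step sees 16 samples. A minimal sketch of that arithmetic (the function name is illustrative, not from this repo's code):

```python
def global_batch_size(per_gpu_batch: int, world_size: int) -> int:
    """Samples contributing to one optimizer step across all data-parallel ranks."""
    return per_gpu_batch * world_size

# Values from the config: batch_size=8, 2-GPU training (world_size=2).
print(global_batch_size(8, 2))  # -> 16, matching the "effective batch size of 16" comment
```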
checkpoints/log_VoxCeleb2_lip_dprnn_2spk/last_best_checkpoint.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:98e529110aada7576f4ac360ee1d4c338dc63fdd805d52716129754f26b4f1b4
+ size 94590482
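Note that the `.pt` files in this commit are Git LFS pointer files, not the weights themselves; the actual ~94 MB checkpoint must be fetched (e.g. with `git lfs pull`) before it can be loaded. A small sketch of how such a pointer file can be parsed, using the pointer contents above (illustrative, not part of the repo):

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its key/value fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")  # each line is "<key> <value>"
        fields[key] = value
    return fields

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:98e529110aada7576f4ac360ee1d4c338dc63fdd805d52716129754f26b4f1b4
size 94590482"""

info = parse_lfs_pointer(pointer)
print(int(info["size"]))  # 94590482 bytes, i.e. roughly 94 MB on disk
```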
checkpoints/log_VoxCeleb2_lip_dprnn_2spk/last_checkpoint.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a4732a58efa78cc0aca08dcb9ad9ba121dfb32b9bb8886575faea19ece92afda
+ size 94585962
checkpoints/log_VoxCeleb2_lip_dprnn_2spk/log_2024-10-16(15:27:35).txt ADDED
@@ -0,0 +1,611 @@
+ ## Config file
+
+ # Log
+ seed: 777
+ use_cuda: 1 # 1 for True, 0 for False
+
+ # dataset
+ speaker_no: 2
+ mix_lst_path: ./data/VoxCeleb2/mixture_data_list_2mix.csv
+ audio_direc: /mnt/nas_sg/wulanchabu/zexu.pan/datasets/VoxCeleb2/audio_clean/
+ reference_direc: /mnt/nas_sg/wulanchabu/zexu.pan/datasets/VoxCeleb2/orig/
+ audio_sr: 16000
+ ref_sr: 25
+
+ # dataloader
+ num_workers: 4
+ batch_size: 8 # 2-GPU training with a total effective batch size of 16
+ accu_grad: 0
+ effec_batch_size: 4 # per GPU, only used if accu_grad is set to 1, must be multiple times of batch size
+ max_length: 6 # truncate the utterances in dataloader, in seconds
+
+ # network settings
+ init_from: None # 'None' or a log name 'log_2024-07-22(18:12:13)'
+ causal: 0 # 1 for True, 0 for False
+ network_reference:
+ cue: lip # lip or speech or gesture or EEG
+ backbone: resnet18 # resnet18 or shufflenetV2 or blazenet64
+ emb_size: 256 # resnet18:256
+ network_audio:
+ backbone: dprnn
+ N: 256
+ L: 40
+ B: 64
+ H: 128
+ K: 100
+ R: 6
+
+ # optimizer
+ loss_type: sisdr # "snr", "sisdr", "hybrid"
+ init_learning_rate: 0.001
+ max_epoch: 150
+ clip_grad_norm: 5
+ W1016 15:28:05.450172 140099065915200 torch/distributed/run.py:779]
+ W1016 15:28:05.450172 140099065915200 torch/distributed/run.py:779] *****************************************
+ W1016 15:28:05.450172 140099065915200 torch/distributed/run.py:779] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
+ W1016 15:28:05.450172 140099065915200 torch/distributed/run.py:779] *****************************************
+ started on checkpoints/log_2024-10-16(15:27:34)
+
+ namespace(accu_grad=0, audio_direc='/mnt/nas_sg/wulanchabu/zexu.pan/datasets/VoxCeleb2/audio_clean/', audio_sr=16000, batch_size=8, causal=0, checkpoint_dir='checkpoints/log_2024-10-16(15:27:34)', clip_grad_norm=5.0, config=[<yamlargparse.Path object at 0x7f335406e640>], device=device(type='cuda'), distributed=True, effec_batch_size=4, init_from='None', init_learning_rate=0.001, local_rank=0, loss_type='sisdr', lr_warmup=0, max_epoch=150, max_length=6, mix_lst_path='./data/VoxCeleb2/mixture_data_list_2mix.csv', network_audio=namespace(B=64, H=128, K=100, L=40, N=256, R=6, backbone='dprnn'), network_reference=namespace(backbone='resnet18', cue='lip', emb_size=256), num_workers=4, ref_sr=25, reference_direc='/mnt/nas_sg/wulanchabu/zexu.pan/datasets/VoxCeleb2/orig/', seed=777, speaker_no=2, train_from_last_checkpoint=0, use_cuda=1, world_size=2)
+ network_wrapper(
+ (sep_network): Dprnn(
+ (encoder): Encoder(
+ (conv1d_U): Conv1d(1, 256, kernel_size=(40,), stride=(20,), bias=False)
+ )
+ (separator): rnn(
+ (layer_norm): GroupNorm(1, 256, eps=1e-08, affine=True)
+ (bottleneck_conv1x1): Conv1d(256, 64, kernel_size=(1,), stride=(1,), bias=False)
+ (dual_rnn): ModuleList(
+ (0-5): 6 x Dual_RNN_Block(
+ (intra_rnn): LSTM(64, 128, batch_first=True, bidirectional=True)
+ (inter_rnn): LSTM(64, 128, batch_first=True, bidirectional=True)
+ (intra_norm): GroupNorm(1, 64, eps=1e-08, affine=True)
+ (inter_norm): GroupNorm(1, 64, eps=1e-08, affine=True)
+ (intra_linear): Linear(in_features=256, out_features=64, bias=True)
+ (inter_linear): Linear(in_features=256, out_features=64, bias=True)
+ )
+ )
+ (prelu): PReLU(num_parameters=1)
+ (mask_conv1x1): Conv1d(64, 256, kernel_size=(1,), stride=(1,), bias=False)
+ (av_conv): Conv1d(320, 64, kernel_size=(1,), stride=(1,), bias=False)
+ )
+ (decoder): Decoder(
+ (basis_signals): Linear(in_features=256, out_features=40, bias=False)
+ )
+ )
+ (ref_encoder): Visual_encoder(
+ (v_frontend): VisualFrontend(
+ (frontend3D): Sequential(
+ (0): Conv3d(1, 64, kernel_size=(5, 7, 7), stride=(1, 2, 2), padding=(2, 3, 3), bias=False)
+ (1): SyncBatchNorm(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
+ (2): ReLU()
+ (3): MaxPool3d(kernel_size=(1, 3, 3), stride=(1, 2, 2), padding=(0, 1, 1), dilation=1, ceil_mode=False)
+ )
+ (resnet): ResNet(
+ (layer1): ResNetLayer(
+ (conv1a): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
+ (bn1a): SyncBatchNorm(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
+ (conv2a): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
+ (downsample): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
+ (outbna): SyncBatchNorm(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
+ (conv1b): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
+ (bn1b): SyncBatchNorm(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
+ (conv2b): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
+ (outbnb): SyncBatchNorm(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
+ )
+ (layer2): ResNetLayer(
+ (conv1a): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
+ (bn1a): SyncBatchNorm(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
+ (conv2a): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
+ (downsample): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
+ (outbna): SyncBatchNorm(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
+ (conv1b): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
+ (bn1b): SyncBatchNorm(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
+ (conv2b): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
+ (outbnb): SyncBatchNorm(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
+ )
+ (layer3): ResNetLayer(
+ (conv1a): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
+ (bn1a): SyncBatchNorm(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
+ (conv2a): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
+ (downsample): Conv2d(128, 256, kernel_size=(1, 1), stride=(2, 2), bias=False)
+ (outbna): SyncBatchNorm(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
+ (conv1b): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
+ (bn1b): SyncBatchNorm(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
+ (conv2b): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
+ (outbnb): SyncBatchNorm(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
+ )
+ (layer4): ResNetLayer(
+ (conv1a): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
+ (bn1a): SyncBatchNorm(512, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
+ (conv2a): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
+ (downsample): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
+ (outbna): SyncBatchNorm(512, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
+ (conv1b): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
+ (bn1b): SyncBatchNorm(512, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
+ (conv2b): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
+ (outbnb): SyncBatchNorm(512, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
+ )
+ (avgpool): AvgPool2d(kernel_size=(4, 4), stride=(1, 1), padding=0)
+ )
+ )
+ (v_ds): Conv1d(512, 256, kernel_size=(1,), stride=(1,), bias=False)
+ (visual_conv): Sequential(
+ (0): VisualConv1D(
+ (relu_0): ReLU()
+ (norm_0): SyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
+ (conv1x1): Conv1d(256, 512, kernel_size=(1,), stride=(1,), bias=False)
+ (relu): ReLU()
+ (norm_1): SyncBatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
+ (dsconv): Conv1d(512, 512, kernel_size=(3,), stride=(1,), padding=(1,), groups=512)
+ (prelu): PReLU(num_parameters=1)
+ (norm_2): SyncBatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
+ (pw_conv): Conv1d(512, 256, kernel_size=(1,), stride=(1,), bias=False)
+ )
+ (1): VisualConv1D(
+ (relu_0): ReLU()
+ (norm_0): SyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
+ (conv1x1): Conv1d(256, 512, kernel_size=(1,), stride=(1,), bias=False)
+ (relu): ReLU()
+ (norm_1): SyncBatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
+ (dsconv): Conv1d(512, 512, kernel_size=(3,), stride=(1,), padding=(1,), groups=512)
+ (prelu): PReLU(num_parameters=1)
+ (norm_2): SyncBatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
+ (pw_conv): Conv1d(512, 256, kernel_size=(1,), stride=(1,), bias=False)
+ )
+ (2): VisualConv1D(
+ (relu_0): ReLU()
+ (norm_0): SyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
+ (conv1x1): Conv1d(256, 512, kernel_size=(1,), stride=(1,), bias=False)
+ (relu): ReLU()
+ (norm_1): SyncBatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
+ (dsconv): Conv1d(512, 512, kernel_size=(3,), stride=(1,), padding=(1,), groups=512)
+ (prelu): PReLU(num_parameters=1)
+ (norm_2): SyncBatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
+ (pw_conv): Conv1d(512, 256, kernel_size=(1,), stride=(1,), bias=False)
+ )
+ (3): VisualConv1D(
+ (relu_0): ReLU()
+ (norm_0): SyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
+ (conv1x1): Conv1d(256, 512, kernel_size=(1,), stride=(1,), bias=False)
+ (relu): ReLU()
+ (norm_1): SyncBatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
+ (dsconv): Conv1d(512, 512, kernel_size=(3,), stride=(1,), padding=(1,), groups=512)
+ (prelu): PReLU(num_parameters=1)
+ (norm_2): SyncBatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
+ (pw_conv): Conv1d(512, 256, kernel_size=(1,), stride=(1,), bias=False)
+ )
+ (4): VisualConv1D(
+ (relu_0): ReLU()
+ (norm_0): SyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
+ (conv1x1): Conv1d(256, 512, kernel_size=(1,), stride=(1,), bias=False)
+ (relu): ReLU()
+ (norm_1): SyncBatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
+ (dsconv): Conv1d(512, 512, kernel_size=(3,), stride=(1,), padding=(1,), groups=512)
+ (prelu): PReLU(num_parameters=1)
+ (norm_2): SyncBatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
+ (pw_conv): Conv1d(512, 256, kernel_size=(1,), stride=(1,), bias=False)
+ )
+ )
+ )
+ )
+
+ Total number of parameters: 15306950
+
+ Total number of trainable parameters: 4121862
+
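The gap between total parameters (15,306,950) and trainable parameters (4,121,862) suggests most of the network is frozen with `requires_grad=False`, presumably the pretrained visual frontend; that attribution is an inference from the log, not stated in it. The counting itself is simple, as in this torch-free sketch where a stand-in `Param` mimics a tensor's `numel()` and `requires_grad` (all numbers below are illustrative, not the real layer sizes):

```python
from dataclasses import dataclass

@dataclass
class Param:
    numel: int            # number of elements in the tensor
    requires_grad: bool   # False for frozen (e.g. pretrained) weights

def count_parameters(params):
    """Return (total, trainable) element counts, as training scripts typically log."""
    total = sum(p.numel for p in params)
    trainable = sum(p.numel for p in params if p.requires_grad)
    return total, trainable

# Toy model: a large frozen frontend plus smaller trainable separator weights.
params = [Param(11_000_000, False), Param(4_000_000, True), Param(121_862, True)]
print(count_parameters(params))  # -> (15121862, 4121862)
```

With real PyTorch modules the same logic is `sum(p.numel() for p in model.parameters())`, filtered on `p.requires_grad` for the trainable count.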
+ dlc1xpmyvbppmvru-master-0:29:29 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth
+ dlc1xpmyvbppmvru-master-0:29:29 [0] NCCL INFO Bootstrap : Using eth0:22.3.234.0<0>
+ dlc1xpmyvbppmvru-master-0:29:29 [0] NCCL INFO Plugin name set by env to libnccl-net-none.so
+ dlc1xpmyvbppmvru-master-0:29:29 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net-none.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net-none.so), using internal implementation
+ dlc1xpmyvbppmvru-master-0:29:29 [0] NCCL INFO cudaDriverVersion 11040
+ dlc1xpmyvbppmvru-master-0:30:30 [1] NCCL INFO cudaDriverVersion 11040
+ NCCL version 2.20.5+cuda11.8
+ dlc1xpmyvbppmvru-master-0:30:30 [1] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth
+ dlc1xpmyvbppmvru-master-0:30:30 [1] NCCL INFO Bootstrap : Using eth0:22.3.234.0<0>
+ dlc1xpmyvbppmvru-master-0:30:30 [1] NCCL INFO Plugin name set by env to libnccl-net-none.so
+ dlc1xpmyvbppmvru-master-0:30:30 [1] NCCL INFO NET/Plugin : dlerror=libnccl-net-none.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net-none.so), using internal implementation
+ dlc1xpmyvbppmvru-master-0:30:47 [1] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth
+ dlc1xpmyvbppmvru-master-0:29:46 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth
+ dlc1xpmyvbppmvru-master-0:30:47 [1] NCCL INFO NCCL_IB_HCA set to mlx5
+ dlc1xpmyvbppmvru-master-0:29:46 [0] NCCL INFO NCCL_IB_HCA set to mlx5
+ libibverbs: Warning: couldn't load driver 'libhfi1verbs-rdmav25.so': libhfi1verbs-rdmav25.so: cannot open shared object file: No such file or directory
+ libibverbs: Warning: couldn't load driver 'libhfi1verbs-rdmav25.so': libhfi1verbs-rdmav25.so: cannot open shared object file: No such file or directory
+ libibverbs: Warning: couldn't load driver 'librxe-rdmav25.so': librxe-rdmav25.so: cannot open shared object file: No such file or directory
+ libibverbs: Warning: couldn't load driver 'librxe-rdmav25.so': librxe-rdmav25.so: cannot open shared object file: No such file or directory
+ libibverbs: Warning: couldn't load driver 'libmthca-rdmav25.so': libmthca-rdmav25.so: cannot open shared object file: No such file or directory
+ libibverbs: Warning: couldn't load driver 'libmthca-rdmav25.so': libmthca-rdmav25.so: cannot open shared object file: No such file or directory
+ libibverbs: Warning: couldn't load driver 'libvmw_pvrdma-rdmav25.so': libvmw_pvrdma-rdmav25.so: cannot open shared object file: No such file or directory
+ libibverbs: Warning: couldn't load driver 'libvmw_pvrdma-rdmav25.so': libvmw_pvrdma-rdmav25.so: cannot open shared object file: No such file or directory
+ libibverbs: Warning: couldn't load driver 'libhns-rdmav25.so': libhns-rdmav25.so: cannot open shared object file: No such file or directory
+ libibverbs: Warning: couldn't load driver 'libhns-rdmav25.so': libhns-rdmav25.so: cannot open shared object file: No such file or directory
+ libibverbs: Warning: couldn't load driver 'libipathverbs-rdmav25.so': libipathverbs-rdmav25.so: cannot open shared object file: No such file or directory
+ libibverbs: Warning: couldn't load driver 'libipathverbs-rdmav25.so': libipathverbs-rdmav25.so: cannot open shared object file: No such file or directory
+ libibverbs: Warning: couldn't load driver 'libsiw-rdmav25.so': libsiw-rdmav25.so: cannot open shared object file: No such file or directory
+ libibverbs: Warning: couldn't load driver 'libsiw-rdmav25.so': libsiw-rdmav25.so: cannot open shared object file: No such file or directory
+ libibverbs: Warning: couldn't load driver 'libbnxt_re-rdmav25.so': libbnxt_re-rdmav25.so: cannot open shared object file: No such file or directory
+ libibverbs: Warning: couldn't load driver 'libbnxt_re-rdmav25.so': libbnxt_re-rdmav25.so: cannot open shared object file: No such file or directory
+ libibverbs: Warning: couldn't load driver 'libocrdma-rdmav25.so': libocrdma-rdmav25.so: cannot open shared object file: No such file or directory
+ libibverbs: Warning: couldn't load driver 'libocrdma-rdmav25.so': libocrdma-rdmav25.so: cannot open shared object file: No such file or directory
+ libibverbs: Warning: couldn't load driver 'libmlx4-rdmav25.so': libmlx4-rdmav25.so: cannot open shared object file: No such file or directory
+ libibverbs: Warning: couldn't load driver 'libmlx4-rdmav25.so': libmlx4-rdmav25.so: cannot open shared object file: No such file or directory
+ libibverbs: Warning: couldn't load driver 'libqedr-rdmav25.so': libqedr-rdmav25.so: cannot open shared object file: No such file or directory
+ libibverbs: Warning: couldn't load driver 'libqedr-rdmav25.so': libqedr-rdmav25.so: cannot open shared object file: No such file or directory
+ libibverbs: Warning: couldn't load driver 'libcxgb4-rdmav25.so': libcxgb4-rdmav25.so: cannot open shared object file: No such file or directory
+ libibverbs: Warning: couldn't load driver 'libcxgb4-rdmav25.so': libcxgb4-rdmav25.so: cannot open shared object file: No such file or directory
+ libibverbs: Warning: couldn't load driver 'libi40iw-rdmav25.so': libi40iw-rdmav25.so: cannot open shared object file: No such file or directory
+ libibverbs: Warning: couldn't load driver 'libi40iw-rdmav25.so': libi40iw-rdmav25.so: cannot open shared object file: No such file or directory
+ libibverbs: Warning: couldn't load driver 'libefa-rdmav25.so': libefa-rdmav25.so: cannot open shared object file: No such file or directory
+ libibverbs: Warning: couldn't load driver 'libefa-rdmav25.so': libefa-rdmav25.so: cannot open shared object file: No such file or directory
+ dlc1xpmyvbppmvru-master-0:30:47 [1] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [RO]; OOB eth0:22.3.234.0<0>
+ dlc1xpmyvbppmvru-master-0:30:47 [1] NCCL INFO Using non-device net plugin version 0
+ dlc1xpmyvbppmvru-master-0:30:47 [1] NCCL INFO Using network IB
+ dlc1xpmyvbppmvru-master-0:29:46 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [RO]; OOB eth0:22.3.234.0<0>
+ dlc1xpmyvbppmvru-master-0:29:46 [0] NCCL INFO Using non-device net plugin version 0
+ dlc1xpmyvbppmvru-master-0:29:46 [0] NCCL INFO Using network IB
+ dlc1xpmyvbppmvru-master-0:29:46 [0] NCCL INFO comm 0x9335780 rank 0 nranks 2 cudaDev 0 nvmlDev 0 busId 10 commId 0x910afc8f76702aa2 - Init START
+ dlc1xpmyvbppmvru-master-0:30:47 [1] NCCL INFO comm 0x8a4edc0 rank 1 nranks 2 cudaDev 1 nvmlDev 1 busId 20 commId 0x910afc8f76702aa2 - Init START
+ dlc1xpmyvbppmvru-master-0:29:46 [0] NCCL INFO Setting affinity for GPU 0 to 0fff
+ dlc1xpmyvbppmvru-master-0:30:47 [1] NCCL INFO Setting affinity for GPU 1 to 0fff
+ dlc1xpmyvbppmvru-master-0:29:46 [0] NCCL INFO comm 0x9335780 rank 0 nRanks 2 nNodes 1 localRanks 2 localRank 0 MNNVL 0
+ dlc1xpmyvbppmvru-master-0:30:47 [1] NCCL INFO comm 0x8a4edc0 rank 1 nRanks 2 nNodes 1 localRanks 2 localRank 1 MNNVL 0
+ dlc1xpmyvbppmvru-master-0:29:46 [0] NCCL INFO NCCL_MIN_NCHANNELS set by environment to 4.
+ dlc1xpmyvbppmvru-master-0:30:47 [1] NCCL INFO NCCL_MIN_NCHANNELS set by environment to 4.
+ dlc1xpmyvbppmvru-master-0:29:46 [0] NCCL INFO Channel 00/04 : 0 1
+ dlc1xpmyvbppmvru-master-0:29:46 [0] NCCL INFO Channel 01/04 : 0 1
+ dlc1xpmyvbppmvru-master-0:29:46 [0] NCCL INFO Channel 02/04 : 0 1
+ dlc1xpmyvbppmvru-master-0:30:47 [1] NCCL INFO Trees [0] -1/-1/-1->1->0 [1] 0/-1/-1->1->-1 [2] -1/-1/-1->1->0 [3] 0/-1/-1->1->-1
+ dlc1xpmyvbppmvru-master-0:29:46 [0] NCCL INFO Channel 03/04 : 0 1
+ dlc1xpmyvbppmvru-master-0:30:47 [1] NCCL INFO P2P Chunksize set to 524288
+ dlc1xpmyvbppmvru-master-0:29:46 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] -1/-1/-1->0->1 [2] 1/-1/-1->0->-1 [3] -1/-1/-1->0->1
+ dlc1xpmyvbppmvru-master-0:29:46 [0] NCCL INFO P2P Chunksize set to 524288
+ dlc1xpmyvbppmvru-master-0:29:46 [0] NCCL INFO Channel 00/0 : 0[0] -> 1[1] via P2P/IPC/read
+ dlc1xpmyvbppmvru-master-0:30:47 [1] NCCL INFO Channel 00/0 : 1[1] -> 0[0] via P2P/IPC/read
+ dlc1xpmyvbppmvru-master-0:29:46 [0] NCCL INFO Channel 01/0 : 0[0] -> 1[1] via P2P/IPC/read
+ dlc1xpmyvbppmvru-master-0:30:47 [1] NCCL INFO Channel 01/0 : 1[1] -> 0[0] via P2P/IPC/read
+ dlc1xpmyvbppmvru-master-0:29:46 [0] NCCL INFO Channel 02/0 : 0[0] -> 1[1] via P2P/IPC/read
+ dlc1xpmyvbppmvru-master-0:30:47 [1] NCCL INFO Channel 02/0 : 1[1] -> 0[0] via P2P/IPC/read
+ dlc1xpmyvbppmvru-master-0:29:46 [0] NCCL INFO Channel 03/0 : 0[0] -> 1[1] via P2P/IPC/read
+ dlc1xpmyvbppmvru-master-0:30:47 [1] NCCL INFO Channel 03/0 : 1[1] -> 0[0] via P2P/IPC/read
+ dlc1xpmyvbppmvru-master-0:29:46 [0] NCCL INFO Connected all rings
+ dlc1xpmyvbppmvru-master-0:29:46 [0] NCCL INFO Connected all trees
+ dlc1xpmyvbppmvru-master-0:30:47 [1] NCCL INFO Connected all rings
+ dlc1xpmyvbppmvru-master-0:30:47 [1] NCCL INFO Connected all trees
+ dlc1xpmyvbppmvru-master-0:30:47 [1] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 512 | 512
+ dlc1xpmyvbppmvru-master-0:30:47 [1] NCCL INFO 4 coll channels, 0 collnet channels, 0 nvls channels, 4 p2p channels, 4 p2p channels per peer
+ dlc1xpmyvbppmvru-master-0:29:46 [0] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 512 | 512
+ dlc1xpmyvbppmvru-master-0:29:46 [0] NCCL INFO 4 coll channels, 0 collnet channels, 0 nvls channels, 4 p2p channels, 4 p2p channels per peer
+ dlc1xpmyvbppmvru-master-0:29:46 [0] NCCL INFO comm 0x9335780 rank 0 nranks 2 cudaDev 0 nvmlDev 0 busId 10 commId 0x910afc8f76702aa2 - Init COMPLETE
+ dlc1xpmyvbppmvru-master-0:30:47 [1] NCCL INFO comm 0x8a4edc0 rank 1 nranks 2 cudaDev 1 nvmlDev 1 busId 20 commId 0x910afc8f76702aa2 - Init COMPLETE
+ Start new training from scratch
+ [rank0]:[W1016 15:29:26.567341744 reducer.cpp:1400] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
+ [rank1]:[W1016 15:29:26.567396715 reducer.cpp:1400] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
+ Train Summary | End of Epoch 1 | Time 1136.03s | Train Loss -1.143
+ Valid Summary | End of Epoch 1 | Time 186.54s | Valid Loss -1.961
+ Test Summary | End of Epoch 1 | Time 89.69s | Test Loss -1.949
+ Fund new best model, dict saved
+ Train Summary | End of Epoch 2 | Time 532.39s | Train Loss -2.563
+ Valid Summary | End of Epoch 2 | Time 62.43s | Valid Loss -3.072
+ Test Summary | End of Epoch 2 | Time 36.68s | Test Loss -3.038
+ Fund new best model, dict saved
+ Train Summary | End of Epoch 3 | Time 522.50s | Train Loss -3.681
+ Valid Summary | End of Epoch 3 | Time 61.41s | Valid Loss -3.867
+ Test Summary | End of Epoch 3 | Time 34.41s | Test Loss -3.704
+ Fund new best model, dict saved
+ Train Summary | End of Epoch 4 | Time 521.11s | Train Loss -4.505
+ Valid Summary | End of Epoch 4 | Time 60.64s | Valid Loss -4.717
+ Test Summary | End of Epoch 4 | Time 34.68s | Test Loss -4.602
+ Fund new best model, dict saved
+ Train Summary | End of Epoch 5 | Time 520.90s | Train Loss -5.112
+ Valid Summary | End of Epoch 5 | Time 60.13s | Valid Loss -5.280
+ Test Summary | End of Epoch 5 | Time 34.57s | Test Loss -5.177
+ Fund new best model, dict saved
+ Train Summary | End of Epoch 6 | Time 527.34s | Train Loss -5.567
+ Valid Summary | End of Epoch 6 | Time 60.35s | Valid Loss -5.635
+ Test Summary | End of Epoch 6 | Time 35.99s | Test Loss -5.481
+ Fund new best model, dict saved
+ Train Summary | End of Epoch 7 | Time 524.58s | Train Loss -5.973
+ Valid Summary | End of Epoch 7 | Time 64.03s | Valid Loss -6.066
+ Test Summary | End of Epoch 7 | Time 35.53s | Test Loss -5.906
+ Fund new best model, dict saved
+ Train Summary | End of Epoch 8 | Time 521.69s | Train Loss -6.393
+ Valid Summary | End of Epoch 8 | Time 60.44s | Valid Loss -6.381
+ Test Summary | End of Epoch 8 | Time 34.78s | Test Loss -6.205
+ Fund new best model, dict saved
+ Train Summary | End of Epoch 9 | Time 524.26s | Train Loss -6.815
+ Valid Summary | End of Epoch 9 | Time 60.49s | Valid Loss -6.840
+ Test Summary | End of Epoch 9 | Time 34.27s | Test Loss -6.577
+ Fund new best model, dict saved
+ Train Summary | End of Epoch 10 | Time 527.01s | Train Loss -7.251
+ Valid Summary | End of Epoch 10 | Time 60.70s | Valid Loss -7.199
+ Test Summary | End of Epoch 10 | Time 34.01s | Test Loss -6.979
+ Fund new best model, dict saved
+ Train Summary | End of Epoch 11 | Time 519.57s | Train Loss -7.656
+ Valid Summary | End of Epoch 11 | Time 59.87s | Valid Loss -7.626
+ Test Summary | End of Epoch 11 | Time 34.53s | Test Loss -7.344
+ Fund new best model, dict saved
+ Train Summary | End of Epoch 12 | Time 518.91s | Train Loss -8.007
+ Valid Summary | End of Epoch 12 | Time 59.72s | Valid Loss -7.927
+ Test Summary | End of Epoch 12 | Time 34.57s | Test Loss -7.680
+ Fund new best model, dict saved
+ Train Summary | End of Epoch 13 | Time 519.78s | Train Loss -8.310
+ Valid Summary | End of Epoch 13 | Time 59.64s | Valid Loss -8.179
+ Test Summary | End of Epoch 13 | Time 34.45s | Test Loss -7.865
+ Fund new best model, dict saved
+ Train Summary | End of Epoch 14 | Time 518.22s | Train Loss -8.636
+ Valid Summary | End of Epoch 14 | Time 60.08s | Valid Loss -8.400
+ Test Summary | End of Epoch 14 | Time 34.13s | Test Loss -8.087
+ Fund new best model, dict saved
+ Train Summary | End of Epoch 15 | Time 519.43s | Train Loss -8.894
+ Valid Summary | End of Epoch 15 | Time 60.15s | Valid Loss -8.782
+ Test Summary | End of Epoch 15 | Time 34.21s | Test Loss -8.444
+ Fund new best model, dict saved
+ Train Summary | End of Epoch 16 | Time 519.03s | Train Loss -9.191
+ Valid Summary | End of Epoch 16 | Time 61.51s | Valid Loss -8.883
+ Test Summary | End of Epoch 16 | Time 33.95s | Test Loss -8.592
+ Fund new best model, dict saved
+ Train Summary | End of Epoch 17 | Time 518.84s | Train Loss -9.422
+ Valid Summary | End of Epoch 17 | Time 59.83s | Valid Loss -9.109
+ Test Summary | End of Epoch 17 | Time 34.50s | Test Loss -8.798
+ Fund new best model, dict saved
+ Train Summary | End of Epoch 18 | Time 518.31s | Train Loss -9.619
+ Valid Summary | End of Epoch 18 | Time 59.61s | Valid Loss -9.177
+ Test Summary | End of Epoch 18 | Time 34.34s | Test Loss -8.867
+ Fund new best model, dict saved
+ Train Summary | End of Epoch 19 | Time 517.49s | Train Loss -9.836
+ Valid Summary | End of Epoch 19 | Time 60.48s | Valid Loss -9.359
+ Test Summary | End of Epoch 19 | Time 34.82s | Test Loss -9.041
+ Fund new best model, dict saved
+ Train Summary | End of Epoch 20 | Time 520.49s | Train Loss -9.985
+ Valid Summary | End of Epoch 20 | Time 60.91s | Valid Loss -9.508
+ Test Summary | End of Epoch 20 | Time 34.80s | Test Loss -9.154
+ Fund new best model, dict saved
+ Train Summary | End of Epoch 21 | Time 521.76s | Train Loss -10.156
+ Valid Summary | End of Epoch 21 | Time 61.59s | Valid Loss -9.648
+ Test Summary | End of Epoch 21 | Time 35.30s | Test Loss -9.361
+ Fund new best model, dict saved
+ Train Summary | End of Epoch 22 | Time 520.25s | Train Loss -10.321
+ Valid Summary | End of Epoch 22 | Time 60.19s | Valid Loss -9.891
+ Test Summary | End of Epoch 22 | Time 34.65s | Test Loss -9.584
+ Fund new best model, dict saved
+ Train Summary | End of Epoch 23 | Time 518.41s | Train Loss -10.437
+ Valid Summary | End of Epoch 23 | Time 60.06s | Valid Loss -9.806
+ Test Summary | End of Epoch 23 | Time 34.87s | Test Loss -9.465
+ Train Summary | End of Epoch 24 | Time 518.00s | Train Loss -10.558
+ Valid Summary | End of Epoch 24 | Time 60.04s | Valid Loss -10.033
+ Test Summary | End of Epoch 24 | Time 34.30s | Test Loss -9.654
+ Fund new best model, dict saved
+ Train Summary | End of Epoch 25 | Time 518.19s | Train Loss -10.674
+ Valid Summary | End of Epoch 25 | Time 60.68s | Valid Loss -10.149
+ Test Summary | End of Epoch 25 | Time 34.28s | Test Loss -9.742
+ Fund new best model, dict saved
+ Train Summary | End of Epoch 26 | Time 517.83s | Train Loss -10.793
+ Valid Summary | End of Epoch 26 | Time 59.94s | Valid Loss -10.301
+ Test Summary | End of Epoch 26 | Time 34.41s | Test Loss -9.923
+ Fund new best model, dict saved
+ Train Summary | End of Epoch 27 | Time 519.54s | Train Loss -10.893
+ Valid Summary | End of Epoch 27 | Time 59.71s | Valid Loss -10.337
+ Test Summary | End of Epoch 27 | Time 34.16s | Test Loss -9.964
+ Fund new best model, dict saved
+ Train Summary | End of Epoch 28 | Time 517.22s | Train Loss -10.998
+ Valid Summary | End of Epoch 28 | Time 60.20s | Valid Loss -10.414
+ Test Summary | End of Epoch 28 | Time 35.08s | Test Loss -9.978
+ Fund new best model, dict saved
+ Train Summary | End of Epoch 29 | Time 519.76s | Train Loss -11.092
+ Valid Summary | End of Epoch 29 | Time 59.35s | Valid Loss -10.409
+ Test Summary | End of Epoch 29 | Time 34.43s | Test Loss -10.057
+ Train Summary | End of Epoch 30 | Time 519.13s | Train Loss -11.168
+ Valid Summary | End of Epoch 30 | Time 59.80s | Valid Loss -10.482
+ Test Summary | End of Epoch 30 | Time 34.34s | Test Loss -10.156
+ Fund new best model, dict saved
+ Train Summary | End of Epoch 31 | Time 519.57s | Train Loss -11.234
+ Valid Summary | End of Epoch 31 | Time 59.49s | Valid Loss -10.528
+ Test Summary | End of Epoch 31 | Time 34.12s | Test Loss -10.135
+ Fund new best model, dict saved
+ Train Summary | End of Epoch 32 | Time 519.21s | Train Loss -11.322
+ Valid Summary | End of Epoch 32 | Time 60.14s | Valid Loss -10.618
+ Test Summary | End of Epoch 32 | Time 34.59s | Test Loss -10.253
+ Fund new best model, dict saved
+ Train Summary | End of Epoch 33 | Time 518.09s | Train Loss -11.399
411
+ Valid Summary | End of Epoch 33 | Time 59.79s | Valid Loss -10.616
412
+ Test Summary | End of Epoch 33 | Time 34.67s | Test Loss -10.181
413
+ Train Summary | End of Epoch 34 | Time 519.15s | Train Loss -11.461
414
+ Valid Summary | End of Epoch 34 | Time 59.91s | Valid Loss -10.738
415
+ Test Summary | End of Epoch 34 | Time 33.96s | Test Loss -10.300
416
+ Fund new best model, dict saved
417
+ Train Summary | End of Epoch 35 | Time 519.16s | Train Loss -11.516
418
+ Valid Summary | End of Epoch 35 | Time 61.33s | Valid Loss -10.684
419
+ Test Summary | End of Epoch 35 | Time 34.44s | Test Loss -10.158
420
+ Train Summary | End of Epoch 36 | Time 519.31s | Train Loss -11.607
421
+ Valid Summary | End of Epoch 36 | Time 59.58s | Valid Loss -10.844
422
+ Test Summary | End of Epoch 36 | Time 34.49s | Test Loss -10.436
423
+ Fund new best model, dict saved
424
+ Train Summary | End of Epoch 37 | Time 518.54s | Train Loss -11.652
425
+ Valid Summary | End of Epoch 37 | Time 59.47s | Valid Loss -10.818
426
+ Test Summary | End of Epoch 37 | Time 34.26s | Test Loss -10.365
427
+ Train Summary | End of Epoch 38 | Time 518.83s | Train Loss -11.721
428
+ Valid Summary | End of Epoch 38 | Time 61.51s | Valid Loss -10.763
429
+ Test Summary | End of Epoch 38 | Time 34.26s | Test Loss -10.297
430
+ Train Summary | End of Epoch 39 | Time 519.14s | Train Loss -11.780
431
+ Valid Summary | End of Epoch 39 | Time 60.13s | Valid Loss -10.801
432
+ Test Summary | End of Epoch 39 | Time 34.26s | Test Loss -10.421
433
+ Train Summary | End of Epoch 40 | Time 518.87s | Train Loss -11.832
434
+ Valid Summary | End of Epoch 40 | Time 59.66s | Valid Loss -10.950
435
+ Test Summary | End of Epoch 40 | Time 34.57s | Test Loss -10.544
436
+ Fund new best model, dict saved
437
+ Train Summary | End of Epoch 41 | Time 518.29s | Train Loss -11.874
438
+ Valid Summary | End of Epoch 41 | Time 60.20s | Valid Loss -11.077
439
+ Test Summary | End of Epoch 41 | Time 34.59s | Test Loss -10.605
440
+ Fund new best model, dict saved
441
+ Train Summary | End of Epoch 42 | Time 518.75s | Train Loss -11.927
442
+ Valid Summary | End of Epoch 42 | Time 60.51s | Valid Loss -11.021
443
+ Test Summary | End of Epoch 42 | Time 34.21s | Test Loss -10.710
444
+ Train Summary | End of Epoch 43 | Time 518.21s | Train Loss -11.972
445
+ Valid Summary | End of Epoch 43 | Time 60.24s | Valid Loss -11.148
446
+ Test Summary | End of Epoch 43 | Time 34.35s | Test Loss -10.754
447
+ Fund new best model, dict saved
448
+ Train Summary | End of Epoch 44 | Time 517.93s | Train Loss -12.025
449
+ Valid Summary | End of Epoch 44 | Time 59.79s | Valid Loss -11.016
450
+ Test Summary | End of Epoch 44 | Time 35.07s | Test Loss -10.649
451
+ Train Summary | End of Epoch 45 | Time 517.34s | Train Loss -12.058
452
+ Valid Summary | End of Epoch 45 | Time 59.79s | Valid Loss -11.158
453
+ Test Summary | End of Epoch 45 | Time 34.52s | Test Loss -10.659
454
+ Fund new best model, dict saved
455
+ Train Summary | End of Epoch 46 | Time 519.56s | Train Loss -12.102
456
+ Valid Summary | End of Epoch 46 | Time 60.24s | Valid Loss -11.202
457
+ Test Summary | End of Epoch 46 | Time 33.99s | Test Loss -10.825
458
+ Fund new best model, dict saved
459
+ Train Summary | End of Epoch 47 | Time 518.26s | Train Loss -12.128
460
+ Valid Summary | End of Epoch 47 | Time 59.93s | Valid Loss -11.294
461
+ Test Summary | End of Epoch 47 | Time 34.05s | Test Loss -10.846
462
+ Fund new best model, dict saved
463
+ Train Summary | End of Epoch 48 | Time 518.20s | Train Loss -12.189
464
+ Valid Summary | End of Epoch 48 | Time 59.72s | Valid Loss -11.100
465
+ Test Summary | End of Epoch 48 | Time 34.81s | Test Loss -10.742
466
+ Train Summary | End of Epoch 49 | Time 520.08s | Train Loss -12.211
467
+ Valid Summary | End of Epoch 49 | Time 60.13s | Valid Loss -11.270
468
+ Test Summary | End of Epoch 49 | Time 34.61s | Test Loss -10.813
469
+ Train Summary | End of Epoch 50 | Time 518.98s | Train Loss -12.259
470
+ Valid Summary | End of Epoch 50 | Time 60.14s | Valid Loss -11.058
471
+ Test Summary | End of Epoch 50 | Time 34.37s | Test Loss -10.562
472
+ Train Summary | End of Epoch 51 | Time 518.45s | Train Loss -12.284
473
+ Valid Summary | End of Epoch 51 | Time 59.56s | Valid Loss -11.306
474
+ Test Summary | End of Epoch 51 | Time 34.70s | Test Loss -10.906
475
+ Fund new best model, dict saved
476
+ Train Summary | End of Epoch 52 | Time 519.03s | Train Loss -12.322
477
+ Valid Summary | End of Epoch 52 | Time 60.26s | Valid Loss -11.384
478
+ Test Summary | End of Epoch 52 | Time 34.40s | Test Loss -10.926
479
+ Fund new best model, dict saved
480
+ Train Summary | End of Epoch 53 | Time 518.84s | Train Loss -12.363
481
+ Valid Summary | End of Epoch 53 | Time 60.39s | Valid Loss -11.297
482
+ Test Summary | End of Epoch 53 | Time 34.45s | Test Loss -10.880
483
+ Train Summary | End of Epoch 54 | Time 518.32s | Train Loss -12.377
484
+ Valid Summary | End of Epoch 54 | Time 59.67s | Valid Loss -11.256
485
+ Test Summary | End of Epoch 54 | Time 34.42s | Test Loss -10.960
486
+ Train Summary | End of Epoch 55 | Time 518.59s | Train Loss -12.419
487
+ Valid Summary | End of Epoch 55 | Time 59.80s | Valid Loss -10.645
488
+ Test Summary | End of Epoch 55 | Time 34.77s | Test Loss -10.457
489
+ Train Summary | End of Epoch 56 | Time 517.86s | Train Loss -12.440
490
+ Valid Summary | End of Epoch 56 | Time 59.75s | Valid Loss -11.398
491
+ Test Summary | End of Epoch 56 | Time 34.60s | Test Loss -11.014
492
+ Fund new best model, dict saved
493
+ Train Summary | End of Epoch 57 | Time 518.30s | Train Loss -12.484
494
+ Valid Summary | End of Epoch 57 | Time 59.82s | Valid Loss -11.353
495
+ Test Summary | End of Epoch 57 | Time 34.41s | Test Loss -11.011
496
+ Train Summary | End of Epoch 58 | Time 520.20s | Train Loss -12.513
497
+ Valid Summary | End of Epoch 58 | Time 60.28s | Valid Loss -11.232
498
+ Test Summary | End of Epoch 58 | Time 34.11s | Test Loss -10.877
499
+ Train Summary | End of Epoch 59 | Time 518.46s | Train Loss -12.524
500
+ Valid Summary | End of Epoch 59 | Time 59.65s | Valid Loss -11.421
501
+ Test Summary | End of Epoch 59 | Time 34.41s | Test Loss -11.073
502
+ Fund new best model, dict saved
503
+ Train Summary | End of Epoch 60 | Time 518.92s | Train Loss -12.554
504
+ Valid Summary | End of Epoch 60 | Time 59.60s | Valid Loss -11.445
505
+ Test Summary | End of Epoch 60 | Time 34.88s | Test Loss -11.093
506
+ Fund new best model, dict saved
507
+ Train Summary | End of Epoch 61 | Time 517.26s | Train Loss -12.583
508
+ Valid Summary | End of Epoch 61 | Time 59.43s | Valid Loss -11.472
509
+ Test Summary | End of Epoch 61 | Time 35.05s | Test Loss -11.012
510
+ Fund new best model, dict saved
511
+ Train Summary | End of Epoch 62 | Time 517.27s | Train Loss -12.624
512
+ Valid Summary | End of Epoch 62 | Time 59.78s | Valid Loss -11.460
513
+ Test Summary | End of Epoch 62 | Time 34.77s | Test Loss -11.070
514
+ Train Summary | End of Epoch 63 | Time 518.49s | Train Loss -12.637
515
+ Valid Summary | End of Epoch 63 | Time 60.43s | Valid Loss -11.498
516
+ Test Summary | End of Epoch 63 | Time 34.49s | Test Loss -11.066
517
+ Fund new best model, dict saved
518
+ Train Summary | End of Epoch 64 | Time 518.74s | Train Loss -12.660
519
+ Valid Summary | End of Epoch 64 | Time 59.84s | Valid Loss -11.363
520
+ Test Summary | End of Epoch 64 | Time 34.64s | Test Loss -10.865
521
+ Train Summary | End of Epoch 65 | Time 517.86s | Train Loss -12.674
522
+ Valid Summary | End of Epoch 65 | Time 59.65s | Valid Loss -10.896
523
+ Test Summary | End of Epoch 65 | Time 34.13s | Test Loss -10.534
524
+ Train Summary | End of Epoch 66 | Time 517.95s | Train Loss -12.704
525
+ Valid Summary | End of Epoch 66 | Time 59.73s | Valid Loss -11.279
526
+ Test Summary | End of Epoch 66 | Time 34.65s | Test Loss -10.885
527
+ Train Summary | End of Epoch 67 | Time 518.19s | Train Loss -12.721
528
+ Valid Summary | End of Epoch 67 | Time 60.00s | Valid Loss -11.364
529
+ Test Summary | End of Epoch 67 | Time 34.22s | Test Loss -10.883
530
+ Train Summary | End of Epoch 68 | Time 517.83s | Train Loss -12.744
531
+ Valid Summary | End of Epoch 68 | Time 60.40s | Valid Loss -11.619
532
+ Test Summary | End of Epoch 68 | Time 34.24s | Test Loss -11.204
533
+ Fund new best model, dict saved
534
+ Train Summary | End of Epoch 69 | Time 519.78s | Train Loss -12.776
535
+ Valid Summary | End of Epoch 69 | Time 60.07s | Valid Loss -11.411
536
+ Test Summary | End of Epoch 69 | Time 34.55s | Test Loss -10.848
537
+ Train Summary | End of Epoch 70 | Time 518.22s | Train Loss -12.801
538
+ Valid Summary | End of Epoch 70 | Time 59.45s | Valid Loss -11.016
539
+ Test Summary | End of Epoch 70 | Time 34.57s | Test Loss -10.476
540
+ Train Summary | End of Epoch 71 | Time 518.52s | Train Loss -12.804
541
+ Valid Summary | End of Epoch 71 | Time 60.45s | Valid Loss -11.440
542
+ Test Summary | End of Epoch 71 | Time 34.61s | Test Loss -11.051
543
+ Train Summary | End of Epoch 72 | Time 519.23s | Train Loss -12.837
544
+ Valid Summary | End of Epoch 72 | Time 60.35s | Valid Loss -11.442
545
+ Test Summary | End of Epoch 72 | Time 34.71s | Test Loss -10.937
546
+ Train Summary | End of Epoch 73 | Time 520.03s | Train Loss -12.856
547
+ Valid Summary | End of Epoch 73 | Time 59.99s | Valid Loss -11.348
548
+ Test Summary | End of Epoch 73 | Time 34.85s | Test Loss -10.732
549
+ reload weights and optimizer from last best checkpoint
550
+ Learning rate adjusted to: 0.000500
551
+ Train Summary | End of Epoch 74 | Time 518.69s | Train Loss -13.019
552
+ Valid Summary | End of Epoch 74 | Time 60.37s | Valid Loss -11.562
553
+ Test Summary | End of Epoch 74 | Time 34.66s | Test Loss -11.118
554
+ Train Summary | End of Epoch 75 | Time 519.74s | Train Loss -13.069
555
+ Valid Summary | End of Epoch 75 | Time 60.23s | Valid Loss -11.680
556
+ Test Summary | End of Epoch 75 | Time 34.85s | Test Loss -11.168
557
+ Fund new best model, dict saved
558
+ Train Summary | End of Epoch 76 | Time 521.19s | Train Loss -13.105
559
+ Valid Summary | End of Epoch 76 | Time 60.98s | Valid Loss -11.147
560
+ Test Summary | End of Epoch 76 | Time 34.75s | Test Loss -10.639
561
+ Train Summary | End of Epoch 77 | Time 521.28s | Train Loss -13.131
562
+ Valid Summary | End of Epoch 77 | Time 60.25s | Valid Loss -11.306
563
+ Test Summary | End of Epoch 77 | Time 34.47s | Test Loss -10.749
564
+ Train Summary | End of Epoch 78 | Time 521.05s | Train Loss -13.152
565
+ Valid Summary | End of Epoch 78 | Time 59.92s | Valid Loss -11.515
566
+ Test Summary | End of Epoch 78 | Time 34.87s | Test Loss -11.080
567
+ Train Summary | End of Epoch 79 | Time 519.71s | Train Loss -13.175
568
+ Valid Summary | End of Epoch 79 | Time 60.42s | Valid Loss -11.498
569
+ Test Summary | End of Epoch 79 | Time 34.83s | Test Loss -11.020
570
+ Train Summary | End of Epoch 80 | Time 521.39s | Train Loss -13.193
571
+ Valid Summary | End of Epoch 80 | Time 59.97s | Valid Loss -11.811
572
+ Test Summary | End of Epoch 80 | Time 34.58s | Test Loss -11.380
573
+ Fund new best model, dict saved
574
+ Train Summary | End of Epoch 81 | Time 518.44s | Train Loss -13.211
575
+ Valid Summary | End of Epoch 81 | Time 59.77s | Valid Loss -11.735
576
+ Test Summary | End of Epoch 81 | Time 34.51s | Test Loss -11.326
577
+ Train Summary | End of Epoch 82 | Time 520.08s | Train Loss -13.234
578
+ Valid Summary | End of Epoch 82 | Time 60.43s | Valid Loss -11.280
579
+ Test Summary | End of Epoch 82 | Time 34.64s | Test Loss -10.932
580
+ Train Summary | End of Epoch 83 | Time 519.69s | Train Loss -13.246
581
+ Valid Summary | End of Epoch 83 | Time 60.50s | Valid Loss -11.337
582
+ Test Summary | End of Epoch 83 | Time 35.30s | Test Loss -10.917
583
+ Train Summary | End of Epoch 84 | Time 519.87s | Train Loss -13.262
584
+ Valid Summary | End of Epoch 84 | Time 60.09s | Valid Loss -11.391
585
+ Test Summary | End of Epoch 84 | Time 34.72s | Test Loss -11.022
586
+ Train Summary | End of Epoch 85 | Time 521.15s | Train Loss -13.277
587
+ Valid Summary | End of Epoch 85 | Time 60.43s | Valid Loss -11.435
588
+ Test Summary | End of Epoch 85 | Time 34.72s | Test Loss -11.165
589
+ reload weights and optimizer from last best checkpoint
590
+ Learning rate adjusted to: 0.000250
591
+ Train Summary | End of Epoch 86 | Time 518.64s | Train Loss -13.314
592
+ Valid Summary | End of Epoch 86 | Time 59.52s | Valid Loss -11.558
593
+ Test Summary | End of Epoch 86 | Time 34.45s | Test Loss -11.134
594
+ Train Summary | End of Epoch 87 | Time 518.60s | Train Loss -13.337
595
+ Valid Summary | End of Epoch 87 | Time 59.34s | Valid Loss -11.595
596
+ Test Summary | End of Epoch 87 | Time 34.35s | Test Loss -11.278
597
+ Train Summary | End of Epoch 88 | Time 518.53s | Train Loss -13.356
598
+ Valid Summary | End of Epoch 88 | Time 59.97s | Valid Loss -11.571
599
+ Test Summary | End of Epoch 88 | Time 34.59s | Test Loss -11.095
600
+ Train Summary | End of Epoch 89 | Time 518.29s | Train Loss -13.369
601
+ Valid Summary | End of Epoch 89 | Time 59.97s | Valid Loss -11.070
602
+ Test Summary | End of Epoch 89 | Time 33.97s | Test Loss -10.587
603
+ Train Summary | End of Epoch 90 | Time 518.67s | Train Loss -13.385
604
+ Valid Summary | End of Epoch 90 | Time 59.42s | Valid Loss -11.527
605
+ Test Summary | End of Epoch 90 | Time 33.97s | Test Loss -11.091
606
+ No imporvement for 10 epochs, early stopping.
607
+ Start evaluation
608
+ Avg SISNR:i tensor([11.4575], device='cuda:0')
609
+ Avg SNRi: 11.7811785187395
610
+ Avg PESQi: 0.8883232574065526
611
+ Avg STOIi: 0.23720669982369647
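
The evaluation above reports improvement metrics: each "i" suffix means the score of the separated output minus the score of the unprocessed mixture, both measured against the clean reference. As a minimal sketch (not the repository's evaluation code), SI-SDR and its improvement can be computed like this, where `si_sdr` and `si_sdr_improvement` are illustrative helper names:

```python
import numpy as np

def si_sdr(est, ref, eps=1e-8):
    """Scale-invariant SDR in dB, using the common zero-mean convention."""
    est = est - est.mean()
    ref = ref - ref.mean()
    # Project the estimate onto the reference to get the scaled target.
    alpha = np.dot(est, ref) / (np.dot(ref, ref) + eps)
    target = alpha * ref
    noise = est - target
    return 10 * np.log10((target ** 2).sum() / ((noise ** 2).sum() + eps))

def si_sdr_improvement(est, mix, ref):
    """SI-SDRi: gain of the separated estimate over the raw mixture."""
    return si_sdr(est, ref) - si_sdr(mix, ref)
```

A positive SI-SDRi (11.46 dB here) means the model's output is that much closer to the clean target than the mixture it started from.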
checkpoints/log_VoxCeleb2_lip_dprnn_2spk/tensorboard/events.out.tfevents.1729063739.dlc1xpmyvbppmvru-master-0.29.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f9b0414019fdd7e35c60240876a2de76cd7ff8182096987efd75d6f749c38b77
+ size 13260
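
The training log follows a recognizable plateau policy: on each new validation best the model dict is saved; after a run of epochs without a new best, the best weights and optimizer are reloaded and the learning rate is halved (0.001 → 0.0005 → 0.00025); after 10 epochs without improvement, training stops early. A minimal sketch of that control flow, under the assumption (read off the log, not confirmed from the code) that halving happens after 5 non-improving epochs with a patience of 10:

```python
class PlateauPolicy:
    """Sketch of the schedule visible in the log: halve LR and reload the
    best checkpoint on a plateau; early-stop after `patience` bad epochs.
    `halve_after=5` is inferred from the log, not from the training code."""

    def __init__(self, init_lr=0.001, patience=10, halve_after=5):
        self.lr = init_lr
        self.best = float("inf")
        self.bad_epochs = 0
        self.patience = patience
        self.halve_after = halve_after

    def step(self, valid_loss):
        """Return 'new_best', 'halve', 'stop', or 'continue' for this epoch."""
        if valid_loss < self.best:
            self.best = valid_loss
            self.bad_epochs = 0
            return "new_best"          # caller saves the model dict
        self.bad_epochs += 1
        if self.bad_epochs >= self.patience:
            return "stop"              # "No improvement for 10 epochs"
        if self.bad_epochs % self.halve_after == 0:
            self.lr /= 2               # caller also reloads best checkpoint
            return "halve"
        return "continue"
```

Reloading the best weights before lowering the learning rate means the fine-tuning at the reduced rate always starts from the strongest model seen so far, rather than from a drifted one.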