AutoDisProxyT-COLA / log_bs32_lr3e-05_20221124_034416_123214.txt
------------> log file ==runs2/cola/OUTPUT_ID/log_bs32_lr3e-05_20221124_034416_123214.txt
Namespace(aug_train=False, data_dir='/home.local/jianwei/datasets/nlp/glue_data/CoLA', do_eval=False, early_stop=False, early_stop_metric='accuracy', eval_step=120, gradient_accumulation_steps=1, learning_rate=3e-05, local_rank=0, lr_scheduler_type=<SchedulerType.CONSTANT_WITH_WARMUP: 'constant_with_warmup'>, max_length=128, max_train_steps=None, model_name_or_path='/home.local/jianwei/workspace/archive/SparseOptimizer/output/Layer_7_12_Hid_160_768_Head_10_12_IMRatio_3.5', num_train_epochs=30, num_warmup_steps=0, output_dir='runs2/cola/OUTPUT_ID', pad_to_max_length=False, per_device_eval_batch_size=32, per_device_train_batch_size=32, print_step=5, save_last=False, seed=None, task_name='cola', train_file=None, use_slow_tokenizer=False, validation_file=None, weight_decay=0.0)
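The Namespace dump above records the run's full hyperparameter set (CoLA, batch size 32, lr 3e-5, 30 epochs, constant-with-warmup schedule). A minimal sketch of how such a namespace is typically produced with `argparse` — the flag names below mirror a few fields from the dump, but the actual training script's parser is not shown in this log:

```python
import argparse

# Hypothetical parser covering a few of the logged fields; the real
# script defines many more flags (scheduler type, warmup steps, etc.).
parser = argparse.ArgumentParser(description="GLUE fine-tuning (sketch)")
parser.add_argument("--task_name", type=str, default="cola")
parser.add_argument("--learning_rate", type=float, default=3e-5)
parser.add_argument("--per_device_train_batch_size", type=int, default=32)
parser.add_argument("--num_train_epochs", type=int, default=30)
parser.add_argument("--max_length", type=int, default=128)

# Parsing an empty argv keeps the defaults, reproducing the logged values.
args = parser.parse_args([])
```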
Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Mixed precision type: fp16
Sample 3305 of the training set: (tensor([ 101, 2058, 1996, 3481, 2045, 18360, 1037, 2312, 5210, 1012,
102, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0]), tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]), tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]), tensor(1)).
Sample 4580 of the training set: (tensor([ 101, 7525, 2097, 3191, 2115, 4311, 1010, 2021, 7157, 2097, 2025, 1012,
102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0]), tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]), tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]), tensor(1)).
Sample 1031 of the training set: (tensor([ 101, 2040, 7164, 2505, 2008, 2040, 2758, 1029, 102, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0]), tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]), tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]), tensor(1)).
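Each logged sample is a 4-tuple of (input_ids, attention_mask, token_type_ids, label): the WordPiece ids start with 101 ([CLS]) and close the real sentence with 102 ([SEP]) before zero-padding, the attention mask is 1 over real tokens and 0 over padding, and token_type_ids are all 0 for single-sentence CoLA inputs. A sketch of just the padding step with a hypothetical helper — the script behind this log presumably delegates this to a HuggingFace tokenizer:

```python
def pad_features(input_ids, max_length=64, pad_id=0):
    """Pad token ids to max_length and build the matching attention
    mask (1 = real token, 0 = padding) and all-zero token_type_ids,
    as seen in the sample dumps above."""
    n_real = len(input_ids)
    padded = input_ids + [pad_id] * (max_length - n_real)
    attention_mask = [1] * n_real + [0] * (max_length - n_real)
    token_type_ids = [0] * max_length  # single-sentence input
    return padded, attention_mask, token_type_ids

# The 11 real tokens of sample 3305 from the log above.
ids = [101, 2058, 1996, 3481, 2045, 18360, 1037, 2312, 5210, 1012, 102]
padded, mask, types = pad_features(ids)
```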
***** Running training *****
Num examples = 8551
Num Epochs = 30
Instantaneous batch size per device = 32
Total train batch size (w. parallel, distributed & accumulation) = 32
Gradient Accumulation steps = 1
Total optimization steps = 8040
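The step count follows from the numbers above: 8,551 examples at batch size 32 give ceil(8551/32) = 268 updates per epoch, and 268 × 30 epochs = 8,040 total optimization steps. This also explains why the dev evaluations below land at steps 268, 536, 804, and so on: each falls on an epoch boundary. The arithmetic as a sketch:

```python
import math

num_examples = 8551           # "Num examples" above
batch_size = 32               # per-device batch, no accumulation
num_epochs = 30               # "Num Epochs" above

# The last batch of each epoch is partial, hence the ceiling division.
steps_per_epoch = math.ceil(num_examples / batch_size)
total_steps = steps_per_epoch * num_epochs
```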
000005/008040, loss: 0.652222, avg_loss: 0.651953
000010/008040, loss: 0.661713, avg_loss: 0.653055
000015/008040, loss: 0.677567, avg_loss: 0.654401
000020/008040, loss: 0.650467, avg_loss: 0.655538
000025/008040, loss: 0.677872, avg_loss: 0.658810
000030/008040, loss: 0.661591, avg_loss: 0.658843
000035/008040, loss: 0.645645, avg_loss: 0.659464
000040/008040, loss: 0.652420, avg_loss: 0.659061
000045/008040, loss: 0.687469, avg_loss: 0.657230
000050/008040, loss: 0.626709, avg_loss: 0.656326
000055/008040, loss: 0.644302, avg_loss: 0.655348
000060/008040, loss: 0.648483, avg_loss: 0.655236
000065/008040, loss: 0.688477, avg_loss: 0.655078
000070/008040, loss: 0.695236, avg_loss: 0.656294
000075/008040, loss: 0.659332, avg_loss: 0.655568
000080/008040, loss: 0.658981, avg_loss: 0.654595
000085/008040, loss: 0.667786, avg_loss: 0.654187
000090/008040, loss: 0.658127, avg_loss: 0.653667
000095/008040, loss: 0.671814, avg_loss: 0.652597
000100/008040, loss: 0.633545, avg_loss: 0.652570
000105/008040, loss: 0.641556, avg_loss: 0.651439
000110/008040, loss: 0.605942, avg_loss: 0.650307
000115/008040, loss: 0.668716, avg_loss: 0.649737
000120/008040, loss: 0.565681, avg_loss: 0.648114
000125/008040, loss: 0.645332, avg_loss: 0.647180
000130/008040, loss: 0.654228, avg_loss: 0.645764
000135/008040, loss: 0.647469, avg_loss: 0.644160
000140/008040, loss: 0.589897, avg_loss: 0.644381
000145/008040, loss: 0.622665, avg_loss: 0.643773
000150/008040, loss: 0.582314, avg_loss: 0.642514
000155/008040, loss: 0.706223, avg_loss: 0.642192
000160/008040, loss: 0.573463, avg_loss: 0.640978
000165/008040, loss: 0.596039, avg_loss: 0.641289
000170/008040, loss: 0.629677, avg_loss: 0.641707
000175/008040, loss: 0.581818, avg_loss: 0.640926
000180/008040, loss: 0.573257, avg_loss: 0.640330
000185/008040, loss: 0.536911, avg_loss: 0.639325
000190/008040, loss: 0.607376, avg_loss: 0.638997
000195/008040, loss: 0.565323, avg_loss: 0.638167
000200/008040, loss: 0.561142, avg_loss: 0.637244
000205/008040, loss: 0.709908, avg_loss: 0.636707
000210/008040, loss: 0.655846, avg_loss: 0.635711
000215/008040, loss: 0.562042, avg_loss: 0.635813
000220/008040, loss: 0.623016, avg_loss: 0.635337
000225/008040, loss: 0.665192, avg_loss: 0.636041
000230/008040, loss: 0.602562, avg_loss: 0.636022
000235/008040, loss: 0.579742, avg_loss: 0.636067
000240/008040, loss: 0.643562, avg_loss: 0.635816
000245/008040, loss: 0.572937, avg_loss: 0.635337
000250/008040, loss: 0.609734, avg_loss: 0.635397
000255/008040, loss: 0.540230, avg_loss: 0.634626
000260/008040, loss: 0.688667, avg_loss: 0.634769
000265/008040, loss: 0.600624, avg_loss: 0.633670
***** Running dev evaluation *****
Num examples = 1042
Instantaneous batch size per device = 32
epoch 0, step 268/8040: {'matthews_correlation': 0.0}
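A matthews_correlation of exactly 0.0, as seen here for the first several epochs, typically means the classifier predicts a single class for every dev example: MCC's denominator is then zero and the metric is conventionally defined as 0. A self-contained sketch of the metric for binary labels — the script behind this log presumably uses a library metric rather than this hypothetical helper:

```python
import math

def matthews_corrcoef(y_true, y_pred):
    """Matthews correlation for binary labels. Returns 0.0 when the
    metric is undefined, e.g. when every prediction is one class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```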
000270/008040, loss: 0.532043, avg_loss: 0.633611
000275/008040, loss: 0.593201, avg_loss: 0.633069
000280/008040, loss: 0.706932, avg_loss: 0.633117
000285/008040, loss: 0.684525, avg_loss: 0.632790
000290/008040, loss: 0.642426, avg_loss: 0.632020
000295/008040, loss: 0.617699, avg_loss: 0.632309
000300/008040, loss: 0.693405, avg_loss: 0.632864
000305/008040, loss: 0.721481, avg_loss: 0.632628
000310/008040, loss: 0.551292, avg_loss: 0.631563
000315/008040, loss: 0.619102, avg_loss: 0.631161
000320/008040, loss: 0.573761, avg_loss: 0.631119
000325/008040, loss: 0.621773, avg_loss: 0.631340
000330/008040, loss: 0.504166, avg_loss: 0.630779
000335/008040, loss: 0.627258, avg_loss: 0.630988
000340/008040, loss: 0.620651, avg_loss: 0.630650
000345/008040, loss: 0.579193, avg_loss: 0.630047
000350/008040, loss: 0.621361, avg_loss: 0.629630
000355/008040, loss: 0.638191, avg_loss: 0.629452
000360/008040, loss: 0.589127, avg_loss: 0.629559
000365/008040, loss: 0.725616, avg_loss: 0.629725
000370/008040, loss: 0.677101, avg_loss: 0.629617
000375/008040, loss: 0.575119, avg_loss: 0.628965
000380/008040, loss: 0.493011, avg_loss: 0.628739
000385/008040, loss: 0.653969, avg_loss: 0.629037
000390/008040, loss: 0.672195, avg_loss: 0.628962
000395/008040, loss: 0.665962, avg_loss: 0.628864
000400/008040, loss: 0.515625, avg_loss: 0.627842
000405/008040, loss: 0.596878, avg_loss: 0.627075
000410/008040, loss: 0.624397, avg_loss: 0.626773
000415/008040, loss: 0.568840, avg_loss: 0.626794
000420/008040, loss: 0.678520, avg_loss: 0.626516
000425/008040, loss: 0.691116, avg_loss: 0.626167
000430/008040, loss: 0.629356, avg_loss: 0.625845
000435/008040, loss: 0.592529, avg_loss: 0.624969
000440/008040, loss: 0.621323, avg_loss: 0.624545
000445/008040, loss: 0.713966, avg_loss: 0.624795
000450/008040, loss: 0.510971, avg_loss: 0.624669
000455/008040, loss: 0.672722, avg_loss: 0.625094
000460/008040, loss: 0.569397, avg_loss: 0.624232
000465/008040, loss: 0.543617, avg_loss: 0.624290
000470/008040, loss: 0.701065, avg_loss: 0.623677
000475/008040, loss: 0.697563, avg_loss: 0.623886
000480/008040, loss: 0.516510, avg_loss: 0.623712
000485/008040, loss: 0.623947, avg_loss: 0.624315
000490/008040, loss: 0.571724, avg_loss: 0.623997
000495/008040, loss: 0.652824, avg_loss: 0.624104
000500/008040, loss: 0.591492, avg_loss: 0.623351
000505/008040, loss: 0.562141, avg_loss: 0.623008
000510/008040, loss: 0.571037, avg_loss: 0.622772
000515/008040, loss: 0.594940, avg_loss: 0.622726
000520/008040, loss: 0.595879, avg_loss: 0.622236
000525/008040, loss: 0.420456, avg_loss: 0.621296
000530/008040, loss: 0.679199, avg_loss: 0.621323
000535/008040, loss: 0.630341, avg_loss: 0.621167
***** Running dev evaluation *****
Num examples = 1042
Instantaneous batch size per device = 32
epoch 1, step 536/8040: {'matthews_correlation': 0.0}
000540/008040, loss: 0.539192, avg_loss: 0.620370
000545/008040, loss: 0.624496, avg_loss: 0.620560
000550/008040, loss: 0.478798, avg_loss: 0.620446
000555/008040, loss: 0.623642, avg_loss: 0.619969
000560/008040, loss: 0.588112, avg_loss: 0.620092
000565/008040, loss: 0.537552, avg_loss: 0.619577
000570/008040, loss: 0.650581, avg_loss: 0.619168
000575/008040, loss: 0.587677, avg_loss: 0.619309
000580/008040, loss: 0.557770, avg_loss: 0.619064
000585/008040, loss: 0.741516, avg_loss: 0.619524
000590/008040, loss: 0.600594, avg_loss: 0.619864
000595/008040, loss: 0.669243, avg_loss: 0.619429
000600/008040, loss: 0.649200, avg_loss: 0.619334
000605/008040, loss: 0.707039, avg_loss: 0.619096
000610/008040, loss: 0.570152, avg_loss: 0.618888
000615/008040, loss: 0.678078, avg_loss: 0.618893
000620/008040, loss: 0.590279, avg_loss: 0.618822
000625/008040, loss: 0.648178, avg_loss: 0.618851
000630/008040, loss: 0.707413, avg_loss: 0.619119
000635/008040, loss: 0.613609, avg_loss: 0.619032
000640/008040, loss: 0.637474, avg_loss: 0.619302
000645/008040, loss: 0.620033, avg_loss: 0.619061
000650/008040, loss: 0.674812, avg_loss: 0.619388
000655/008040, loss: 0.603241, avg_loss: 0.619122
000660/008040, loss: 0.596001, avg_loss: 0.619185
000665/008040, loss: 0.546028, avg_loss: 0.618988
000670/008040, loss: 0.467194, avg_loss: 0.618476
000675/008040, loss: 0.539108, avg_loss: 0.617887
000680/008040, loss: 0.630531, avg_loss: 0.617587
000685/008040, loss: 0.628906, avg_loss: 0.617714
000690/008040, loss: 0.662460, avg_loss: 0.617408
000695/008040, loss: 0.556480, avg_loss: 0.617490
000700/008040, loss: 0.562729, avg_loss: 0.617541
000705/008040, loss: 0.534187, avg_loss: 0.617240
000710/008040, loss: 0.705315, avg_loss: 0.617217
000715/008040, loss: 0.628326, avg_loss: 0.617076
000720/008040, loss: 0.451012, avg_loss: 0.616425
000725/008040, loss: 0.679764, avg_loss: 0.616494
000730/008040, loss: 0.648232, avg_loss: 0.616603
000735/008040, loss: 0.775688, avg_loss: 0.616731
000740/008040, loss: 0.483841, avg_loss: 0.616351
000745/008040, loss: 0.480392, avg_loss: 0.616453
000750/008040, loss: 0.641754, avg_loss: 0.616477
000755/008040, loss: 0.655449, avg_loss: 0.616491
000760/008040, loss: 0.620735, avg_loss: 0.616589
000765/008040, loss: 0.698372, avg_loss: 0.616510
000770/008040, loss: 0.541710, avg_loss: 0.616465
000775/008040, loss: 0.616829, avg_loss: 0.616582
000780/008040, loss: 0.615852, avg_loss: 0.616320
000785/008040, loss: 0.645004, avg_loss: 0.616204
000790/008040, loss: 0.620392, avg_loss: 0.616373
000795/008040, loss: 0.621986, avg_loss: 0.616532
000800/008040, loss: 0.698029, avg_loss: 0.616433
***** Running dev evaluation *****
Num examples = 1042
Instantaneous batch size per device = 32
epoch 2, step 804/8040: {'matthews_correlation': 0.0}
000805/008040, loss: 0.666588, avg_loss: 0.616280
000810/008040, loss: 0.513000, avg_loss: 0.616144
000815/008040, loss: 0.678246, avg_loss: 0.616261
000820/008040, loss: 0.508728, avg_loss: 0.616257
000825/008040, loss: 0.561485, avg_loss: 0.615815
000830/008040, loss: 0.629829, avg_loss: 0.615827
000835/008040, loss: 0.614128, avg_loss: 0.615832
000840/008040, loss: 0.594833, avg_loss: 0.615989
000845/008040, loss: 0.536339, avg_loss: 0.616183
000850/008040, loss: 0.599487, avg_loss: 0.616021
000855/008040, loss: 0.673401, avg_loss: 0.616031
000860/008040, loss: 0.539940, avg_loss: 0.616041
000865/008040, loss: 0.572212, avg_loss: 0.616227
000870/008040, loss: 0.609131, avg_loss: 0.615873
000875/008040, loss: 0.614037, avg_loss: 0.615787
000880/008040, loss: 0.571541, avg_loss: 0.615553
000885/008040, loss: 0.544708, avg_loss: 0.615402
000890/008040, loss: 0.536331, avg_loss: 0.615272
000895/008040, loss: 0.675529, avg_loss: 0.615234
000900/008040, loss: 0.627602, avg_loss: 0.615170
000905/008040, loss: 0.660126, avg_loss: 0.615088
000910/008040, loss: 0.558937, avg_loss: 0.614803
000915/008040, loss: 0.549660, avg_loss: 0.614641
000920/008040, loss: 0.643845, avg_loss: 0.614743
000925/008040, loss: 0.572258, avg_loss: 0.614782
000930/008040, loss: 0.634689, avg_loss: 0.614739
000935/008040, loss: 0.543579, avg_loss: 0.614672
000940/008040, loss: 0.714607, avg_loss: 0.614516
000945/008040, loss: 0.596260, avg_loss: 0.614343
000950/008040, loss: 0.564911, avg_loss: 0.614114
000955/008040, loss: 0.735931, avg_loss: 0.614013
000960/008040, loss: 0.742020, avg_loss: 0.614185
000965/008040, loss: 0.632996, avg_loss: 0.613984
000970/008040, loss: 0.505898, avg_loss: 0.614050
000975/008040, loss: 0.632904, avg_loss: 0.614360
000980/008040, loss: 0.637787, avg_loss: 0.614284
000985/008040, loss: 0.544106, avg_loss: 0.614300
000990/008040, loss: 0.627823, avg_loss: 0.614128
000995/008040, loss: 0.518829, avg_loss: 0.613865
001000/008040, loss: 0.682663, avg_loss: 0.613936
001005/008040, loss: 0.702003, avg_loss: 0.614055
001010/008040, loss: 0.638351, avg_loss: 0.613856
001015/008040, loss: 0.643425, avg_loss: 0.613794
001020/008040, loss: 0.511559, avg_loss: 0.613882
001025/008040, loss: 0.570694, avg_loss: 0.613526
001030/008040, loss: 0.742371, avg_loss: 0.613564
001035/008040, loss: 0.525169, avg_loss: 0.613386
001040/008040, loss: 0.631721, avg_loss: 0.613472
001045/008040, loss: 0.678436, avg_loss: 0.613299
001050/008040, loss: 0.496040, avg_loss: 0.613325
001055/008040, loss: 0.656860, avg_loss: 0.613305
001060/008040, loss: 0.574501, avg_loss: 0.613204
001065/008040, loss: 0.547646, avg_loss: 0.613304
001070/008040, loss: 0.567757, avg_loss: 0.613248
***** Running dev evaluation *****
Num examples = 1042
Instantaneous batch size per device = 32
epoch 3, step 1072/8040: {'matthews_correlation': 0.0}
001075/008040, loss: 0.516525, avg_loss: 0.613198
001080/008040, loss: 0.646126, avg_loss: 0.613297
001085/008040, loss: 0.559669, avg_loss: 0.613150
001090/008040, loss: 0.466164, avg_loss: 0.612698
001095/008040, loss: 0.651367, avg_loss: 0.612560
001100/008040, loss: 0.654549, avg_loss: 0.612559
001105/008040, loss: 0.541718, avg_loss: 0.612571
001110/008040, loss: 0.508430, avg_loss: 0.612473
001115/008040, loss: 0.573200, avg_loss: 0.612111
001120/008040, loss: 0.557476, avg_loss: 0.611973
001125/008040, loss: 0.652611, avg_loss: 0.611617
001130/008040, loss: 0.526085, avg_loss: 0.611471
001135/008040, loss: 0.444359, avg_loss: 0.610979
001140/008040, loss: 0.628677, avg_loss: 0.610810
001145/008040, loss: 0.503414, avg_loss: 0.610575
001150/008040, loss: 0.473530, avg_loss: 0.610274
001155/008040, loss: 0.545444, avg_loss: 0.609830
001160/008040, loss: 0.505241, avg_loss: 0.609483
001165/008040, loss: 0.495068, avg_loss: 0.609279
001170/008040, loss: 0.588757, avg_loss: 0.609380
001175/008040, loss: 0.567360, avg_loss: 0.609198
001180/008040, loss: 0.648659, avg_loss: 0.609238
001185/008040, loss: 0.637810, avg_loss: 0.609066
001190/008040, loss: 0.824970, avg_loss: 0.608990
001195/008040, loss: 0.519943, avg_loss: 0.608662
001200/008040, loss: 0.601860, avg_loss: 0.608558
001205/008040, loss: 0.583908, avg_loss: 0.608157
001210/008040, loss: 0.725353, avg_loss: 0.608082
001215/008040, loss: 0.490597, avg_loss: 0.607950
001220/008040, loss: 0.425632, avg_loss: 0.607686
001225/008040, loss: 0.596016, avg_loss: 0.607460
001230/008040, loss: 0.577240, avg_loss: 0.607340
001235/008040, loss: 0.586720, avg_loss: 0.607166
001240/008040, loss: 0.395218, avg_loss: 0.606811
001245/008040, loss: 0.633438, avg_loss: 0.606556
001250/008040, loss: 0.680958, avg_loss: 0.606540
001255/008040, loss: 0.628593, avg_loss: 0.606489
001260/008040, loss: 0.440163, avg_loss: 0.606267
001265/008040, loss: 0.556847, avg_loss: 0.606262
001270/008040, loss: 0.504066, avg_loss: 0.605940
001275/008040, loss: 0.572201, avg_loss: 0.605561
001280/008040, loss: 0.718948, avg_loss: 0.605577
001285/008040, loss: 0.485722, avg_loss: 0.605334
001290/008040, loss: 0.612705, avg_loss: 0.605227
001295/008040, loss: 0.666271, avg_loss: 0.604907
001300/008040, loss: 0.501457, avg_loss: 0.604697
001305/008040, loss: 0.642426, avg_loss: 0.604318
001310/008040, loss: 0.627075, avg_loss: 0.604377
001315/008040, loss: 0.406536, avg_loss: 0.603942
001320/008040, loss: 0.498077, avg_loss: 0.603909
001325/008040, loss: 0.622322, avg_loss: 0.604236
001330/008040, loss: 0.563160, avg_loss: 0.604285
001335/008040, loss: 0.545391, avg_loss: 0.604257
001340/008040, loss: 0.478202, avg_loss: 0.604044
***** Running dev evaluation *****
Num examples = 1042
Instantaneous batch size per device = 32
epoch 4, step 1340/8040: {'matthews_correlation': 0.15816318746785782}
001345/008040, loss: 0.497211, avg_loss: 0.603771
001350/008040, loss: 0.504078, avg_loss: 0.603257
001355/008040, loss: 0.563469, avg_loss: 0.603027
001360/008040, loss: 0.436638, avg_loss: 0.602608
001365/008040, loss: 0.503475, avg_loss: 0.602021
001370/008040, loss: 0.512344, avg_loss: 0.601490
001375/008040, loss: 0.479725, avg_loss: 0.601133
001380/008040, loss: 0.547909, avg_loss: 0.600951
001385/008040, loss: 0.400013, avg_loss: 0.600657
001390/008040, loss: 0.542755, avg_loss: 0.600452
001395/008040, loss: 0.376102, avg_loss: 0.600132
001400/008040, loss: 0.548512, avg_loss: 0.599679
001405/008040, loss: 0.676727, avg_loss: 0.599476
001410/008040, loss: 0.657574, avg_loss: 0.599159
001415/008040, loss: 0.496979, avg_loss: 0.598694
001420/008040, loss: 0.572510, avg_loss: 0.598379
001425/008040, loss: 0.535450, avg_loss: 0.598210
001430/008040, loss: 0.394039, avg_loss: 0.597776
001435/008040, loss: 0.513657, avg_loss: 0.597460
001440/008040, loss: 0.511215, avg_loss: 0.597127
001445/008040, loss: 0.339886, avg_loss: 0.596767
001450/008040, loss: 0.510815, avg_loss: 0.596433
001455/008040, loss: 0.654274, avg_loss: 0.596212
001460/008040, loss: 0.349232, avg_loss: 0.595762
001465/008040, loss: 0.493137, avg_loss: 0.595544
001470/008040, loss: 0.592804, avg_loss: 0.595768
001475/008040, loss: 0.455566, avg_loss: 0.595485
001480/008040, loss: 0.570206, avg_loss: 0.595109
001485/008040, loss: 0.385925, avg_loss: 0.594787
001490/008040, loss: 0.431305, avg_loss: 0.594303
001495/008040, loss: 0.571796, avg_loss: 0.594218
001500/008040, loss: 0.523552, avg_loss: 0.594122
001505/008040, loss: 0.777546, avg_loss: 0.594023
001510/008040, loss: 0.629959, avg_loss: 0.593755
001515/008040, loss: 0.650162, avg_loss: 0.593670
001520/008040, loss: 0.509098, avg_loss: 0.593517
001525/008040, loss: 0.462357, avg_loss: 0.593270
001530/008040, loss: 0.643883, avg_loss: 0.593174
001535/008040, loss: 0.417114, avg_loss: 0.592827
001540/008040, loss: 0.470078, avg_loss: 0.592541
001545/008040, loss: 0.504375, avg_loss: 0.592232
001550/008040, loss: 0.536144, avg_loss: 0.592052
001555/008040, loss: 0.408104, avg_loss: 0.591786
001560/008040, loss: 0.476906, avg_loss: 0.591672
001565/008040, loss: 0.560863, avg_loss: 0.591418
001570/008040, loss: 0.519066, avg_loss: 0.591105
001575/008040, loss: 0.594341, avg_loss: 0.590856
001580/008040, loss: 0.466764, avg_loss: 0.590662
001585/008040, loss: 0.450207, avg_loss: 0.590457
001590/008040, loss: 0.435432, avg_loss: 0.590293
001595/008040, loss: 0.556160, avg_loss: 0.589989
001600/008040, loss: 0.611170, avg_loss: 0.589724
001605/008040, loss: 0.402641, avg_loss: 0.589453
***** Running dev evaluation *****
Num examples = 1042
Instantaneous batch size per device = 32
epoch 5, step 1608/8040: {'matthews_correlation': 0.1895854925674006}
001610/008040, loss: 0.432098, avg_loss: 0.588991
001615/008040, loss: 0.498215, avg_loss: 0.588663
001620/008040, loss: 0.663460, avg_loss: 0.588371
001625/008040, loss: 0.360754, avg_loss: 0.587877
001630/008040, loss: 0.387835, avg_loss: 0.587359
001635/008040, loss: 0.324583, avg_loss: 0.586922
001640/008040, loss: 0.535975, avg_loss: 0.586623
001645/008040, loss: 0.592346, avg_loss: 0.586544
001650/008040, loss: 0.525536, avg_loss: 0.586103
001655/008040, loss: 0.575745, avg_loss: 0.585766
001660/008040, loss: 0.449825, avg_loss: 0.585359
001665/008040, loss: 0.420126, avg_loss: 0.585023
001670/008040, loss: 0.639540, avg_loss: 0.584721
001675/008040, loss: 0.344273, avg_loss: 0.584341
001680/008040, loss: 0.540806, avg_loss: 0.583815
001685/008040, loss: 0.384409, avg_loss: 0.583182
001690/008040, loss: 0.440865, avg_loss: 0.583039
001695/008040, loss: 0.555412, avg_loss: 0.582610
001700/008040, loss: 0.407997, avg_loss: 0.582275
001705/008040, loss: 0.383024, avg_loss: 0.581847
001710/008040, loss: 0.537008, avg_loss: 0.581442
001715/008040, loss: 0.355068, avg_loss: 0.581088
001720/008040, loss: 0.338402, avg_loss: 0.580900
001725/008040, loss: 0.566572, avg_loss: 0.580647
001730/008040, loss: 0.548752, avg_loss: 0.580402
001735/008040, loss: 0.535681, avg_loss: 0.580117
001740/008040, loss: 0.614624, avg_loss: 0.579823
001745/008040, loss: 0.530151, avg_loss: 0.579474
001750/008040, loss: 0.489605, avg_loss: 0.579157
001755/008040, loss: 0.584639, avg_loss: 0.578986
001760/008040, loss: 0.501501, avg_loss: 0.578577
001765/008040, loss: 0.511179, avg_loss: 0.578278
001770/008040, loss: 0.400345, avg_loss: 0.577970
001775/008040, loss: 0.562296, avg_loss: 0.577756
001780/008040, loss: 0.460430, avg_loss: 0.577570
001785/008040, loss: 0.420773, avg_loss: 0.577251
001790/008040, loss: 0.476810, avg_loss: 0.576814
001795/008040, loss: 0.473215, avg_loss: 0.576538
001800/008040, loss: 0.324244, avg_loss: 0.576321
001805/008040, loss: 0.466421, avg_loss: 0.575848
001810/008040, loss: 0.515072, avg_loss: 0.575525
001815/008040, loss: 0.348133, avg_loss: 0.575163
001820/008040, loss: 0.405993, avg_loss: 0.574717
001825/008040, loss: 0.724440, avg_loss: 0.574358
001830/008040, loss: 0.404591, avg_loss: 0.573989
001835/008040, loss: 0.386953, avg_loss: 0.573703
001840/008040, loss: 0.297220, avg_loss: 0.573313
001845/008040, loss: 0.549763, avg_loss: 0.573012
001850/008040, loss: 0.439955, avg_loss: 0.572694
001855/008040, loss: 0.379715, avg_loss: 0.572410
001860/008040, loss: 0.301922, avg_loss: 0.571970
001865/008040, loss: 0.442181, avg_loss: 0.571632
001870/008040, loss: 0.403500, avg_loss: 0.571553
001875/008040, loss: 0.479061, avg_loss: 0.571291
***** Running dev evaluation *****
Num examples = 1042
Instantaneous batch size per device = 32
epoch 6, step 1876/8040: {'matthews_correlation': 0.21307686539085852}
001880/008040, loss: 0.669218, avg_loss: 0.571140
001885/008040, loss: 0.542337, avg_loss: 0.570861
001890/008040, loss: 0.589821, avg_loss: 0.570818
001895/008040, loss: 0.476381, avg_loss: 0.570683
001900/008040, loss: 0.404007, avg_loss: 0.570266
001905/008040, loss: 0.356413, avg_loss: 0.569667
001910/008040, loss: 0.477184, avg_loss: 0.569371
001915/008040, loss: 0.490374, avg_loss: 0.568927
001920/008040, loss: 0.415562, avg_loss: 0.568538
001925/008040, loss: 0.388258, avg_loss: 0.568085
001930/008040, loss: 0.456861, avg_loss: 0.567801
001935/008040, loss: 0.384424, avg_loss: 0.567589
001940/008040, loss: 0.353615, avg_loss: 0.567248
001945/008040, loss: 0.391302, avg_loss: 0.566774
001950/008040, loss: 0.465639, avg_loss: 0.566302
001955/008040, loss: 0.230816, avg_loss: 0.565751
001960/008040, loss: 0.518036, avg_loss: 0.565276
001965/008040, loss: 0.410194, avg_loss: 0.564808
001970/008040, loss: 0.349751, avg_loss: 0.564646
001975/008040, loss: 0.334585, avg_loss: 0.564052
001980/008040, loss: 0.534948, avg_loss: 0.563787
001985/008040, loss: 0.362720, avg_loss: 0.563456
001990/008040, loss: 0.448977, avg_loss: 0.563030
001995/008040, loss: 0.355686, avg_loss: 0.562736
002000/008040, loss: 0.439312, avg_loss: 0.562282
002005/008040, loss: 0.560658, avg_loss: 0.561938
002010/008040, loss: 0.257093, avg_loss: 0.561694
002015/008040, loss: 0.427464, avg_loss: 0.561446
002020/008040, loss: 0.501780, avg_loss: 0.561143
002025/008040, loss: 0.395622, avg_loss: 0.560958
002030/008040, loss: 0.593700, avg_loss: 0.560682
002035/008040, loss: 0.260624, avg_loss: 0.560042
002040/008040, loss: 0.331761, avg_loss: 0.559656
002045/008040, loss: 0.454527, avg_loss: 0.559403
002050/008040, loss: 0.379538, avg_loss: 0.559265
002055/008040, loss: 0.376312, avg_loss: 0.558873
002060/008040, loss: 0.522795, avg_loss: 0.558628
002065/008040, loss: 0.457785, avg_loss: 0.558271
002070/008040, loss: 0.561003, avg_loss: 0.557992
002075/008040, loss: 0.269575, avg_loss: 0.557529
002080/008040, loss: 0.476014, avg_loss: 0.557187
002085/008040, loss: 0.425539, avg_loss: 0.556978
002090/008040, loss: 0.350451, avg_loss: 0.556575
002095/008040, loss: 0.275333, avg_loss: 0.556114
002100/008040, loss: 0.463511, avg_loss: 0.556032
002105/008040, loss: 0.313173, avg_loss: 0.555600
002110/008040, loss: 0.555047, avg_loss: 0.555265
002115/008040, loss: 0.554151, avg_loss: 0.555004
002120/008040, loss: 0.383408, avg_loss: 0.554883
002125/008040, loss: 0.488256, avg_loss: 0.554548
002130/008040, loss: 0.357615, avg_loss: 0.554264
002135/008040, loss: 0.157333, avg_loss: 0.553637
002140/008040, loss: 0.288863, avg_loss: 0.553479
***** Running dev evaluation *****
Num examples = 1042
Instantaneous batch size per device = 32
epoch 7, step 2144/8040: {'matthews_correlation': 0.22254373946847703}
002145/008040, loss: 0.303406, avg_loss: 0.553004
002150/008040, loss: 0.306277, avg_loss: 0.552618
002155/008040, loss: 0.338732, avg_loss: 0.552232
002160/008040, loss: 0.427835, avg_loss: 0.551830
002165/008040, loss: 0.414758, avg_loss: 0.551564
002170/008040, loss: 0.576152, avg_loss: 0.551253
002175/008040, loss: 0.407147, avg_loss: 0.550871
002180/008040, loss: 0.399303, avg_loss: 0.550418
002185/008040, loss: 0.417184, avg_loss: 0.550105
002190/008040, loss: 0.373152, avg_loss: 0.549766
002195/008040, loss: 0.357236, avg_loss: 0.549473
002200/008040, loss: 0.488667, avg_loss: 0.549189
002205/008040, loss: 0.430184, avg_loss: 0.548822
002210/008040, loss: 0.509947, avg_loss: 0.548544
002215/008040, loss: 0.248991, avg_loss: 0.548000
002220/008040, loss: 0.524855, avg_loss: 0.547764
002225/008040, loss: 0.307470, avg_loss: 0.547397
002230/008040, loss: 0.317670, avg_loss: 0.546941
002235/008040, loss: 0.257109, avg_loss: 0.546452
002240/008040, loss: 0.437452, avg_loss: 0.546180
002245/008040, loss: 0.416744, avg_loss: 0.545923
002250/008040, loss: 0.301953, avg_loss: 0.545449
002255/008040, loss: 0.505546, avg_loss: 0.545019
002260/008040, loss: 0.438693, avg_loss: 0.544505
002265/008040, loss: 0.269212, avg_loss: 0.544109
002270/008040, loss: 0.436381, avg_loss: 0.543815
002275/008040, loss: 0.455789, avg_loss: 0.543614
002280/008040, loss: 0.498013, avg_loss: 0.543247
002285/008040, loss: 0.329160, avg_loss: 0.542899
002290/008040, loss: 0.281924, avg_loss: 0.542624
002295/008040, loss: 0.366100, avg_loss: 0.542508
002300/008040, loss: 0.371017, avg_loss: 0.542094
002305/008040, loss: 0.395582, avg_loss: 0.541701
002310/008040, loss: 0.396773, avg_loss: 0.541435
002315/008040, loss: 0.382794, avg_loss: 0.541095
002320/008040, loss: 0.306365, avg_loss: 0.540708
002325/008040, loss: 0.434783, avg_loss: 0.540375
002330/008040, loss: 0.357806, avg_loss: 0.539949
002335/008040, loss: 0.380549, avg_loss: 0.539729
002340/008040, loss: 0.216732, avg_loss: 0.539146
002345/008040, loss: 0.385883, avg_loss: 0.538827
002350/008040, loss: 0.594073, avg_loss: 0.538528
002355/008040, loss: 0.413212, avg_loss: 0.538440
002360/008040, loss: 0.210932, avg_loss: 0.538051
002365/008040, loss: 0.336315, avg_loss: 0.537636
002370/008040, loss: 0.447609, avg_loss: 0.537165
002375/008040, loss: 0.580924, avg_loss: 0.537083
002380/008040, loss: 0.482136, avg_loss: 0.536826
002385/008040, loss: 0.290261, avg_loss: 0.536410
002390/008040, loss: 0.567177, avg_loss: 0.536199
002395/008040, loss: 0.543372, avg_loss: 0.535950
002400/008040, loss: 0.344522, avg_loss: 0.535539
002405/008040, loss: 0.219617, avg_loss: 0.535235
002410/008040, loss: 0.449083, avg_loss: 0.534978
***** Running dev evaluation *****
Num examples = 1042
Instantaneous batch size per device = 32
epoch 8, step 2412/8040: {'matthews_correlation': 0.22951168079779777}
002415/008040, loss: 0.347907, avg_loss: 0.534702
002420/008040, loss: 0.380767, avg_loss: 0.534286
002425/008040, loss: 0.322007, avg_loss: 0.533838
002430/008040, loss: 0.468786, avg_loss: 0.533497
002435/008040, loss: 0.309734, avg_loss: 0.533102
002440/008040, loss: 0.197897, avg_loss: 0.532586
002445/008040, loss: 0.345694, avg_loss: 0.532278
002450/008040, loss: 0.559374, avg_loss: 0.531879
002455/008040, loss: 0.311327, avg_loss: 0.531552
002460/008040, loss: 0.300930, avg_loss: 0.531075
002465/008040, loss: 0.339101, avg_loss: 0.530702
002470/008040, loss: 0.473831, avg_loss: 0.530428
002475/008040, loss: 0.467193, avg_loss: 0.530181
002480/008040, loss: 0.500907, avg_loss: 0.529953
002485/008040, loss: 0.457974, avg_loss: 0.529605
002490/008040, loss: 0.343074, avg_loss: 0.529418
002495/008040, loss: 0.444531, avg_loss: 0.529064
002500/008040, loss: 0.330952, avg_loss: 0.528731
002505/008040, loss: 0.258092, avg_loss: 0.528329
002510/008040, loss: 0.373878, avg_loss: 0.527944
002515/008040, loss: 0.298052, avg_loss: 0.527590
002520/008040, loss: 0.290030, avg_loss: 0.527212
002525/008040, loss: 0.235393, avg_loss: 0.526718
002530/008040, loss: 0.293452, avg_loss: 0.526214
002535/008040, loss: 0.193506, avg_loss: 0.525848
002540/008040, loss: 0.344110, avg_loss: 0.525426
002545/008040, loss: 0.457061, avg_loss: 0.525270
002550/008040, loss: 0.443786, avg_loss: 0.524819
002555/008040, loss: 0.245760, avg_loss: 0.524364
002560/008040, loss: 0.319549, avg_loss: 0.524011
002565/008040, loss: 0.259768, avg_loss: 0.523602
002570/008040, loss: 0.575876, avg_loss: 0.523300
002575/008040, loss: 0.411968, avg_loss: 0.522964
002580/008040, loss: 0.563390, avg_loss: 0.522671
002585/008040, loss: 0.227027, avg_loss: 0.522272
002590/008040, loss: 0.354241, avg_loss: 0.521908
002595/008040, loss: 0.265463, avg_loss: 0.521373
002600/008040, loss: 0.427238, avg_loss: 0.521074
002605/008040, loss: 0.287523, avg_loss: 0.520749
002610/008040, loss: 0.325408, avg_loss: 0.520449
002615/008040, loss: 0.423108, avg_loss: 0.520330
002620/008040, loss: 0.443160, avg_loss: 0.520100
002625/008040, loss: 0.299728, avg_loss: 0.519799
002630/008040, loss: 0.535900, avg_loss: 0.519659
002635/008040, loss: 0.479263, avg_loss: 0.519475
002640/008040, loss: 0.471956, avg_loss: 0.519183
002645/008040, loss: 0.267595, avg_loss: 0.518814
002650/008040, loss: 0.322007, avg_loss: 0.518672
002655/008040, loss: 0.332003, avg_loss: 0.518388
002660/008040, loss: 0.439392, avg_loss: 0.518129
002665/008040, loss: 0.322509, avg_loss: 0.517892
002670/008040, loss: 0.328728, avg_loss: 0.517541
002675/008040, loss: 0.224217, avg_loss: 0.517160
002680/008040, loss: 0.213649, avg_loss: 0.516849
***** Running dev evaluation *****
Num examples = 1042
Instantaneous batch size per device = 32
epoch 9, step 2680/8040: {'matthews_correlation': 0.23063296136375847}
002685/008040, loss: 0.357351, avg_loss: 0.516590
002690/008040, loss: 0.191270, avg_loss: 0.516160
002695/008040, loss: 0.370189, avg_loss: 0.515724
002700/008040, loss: 0.336470, avg_loss: 0.515376
002705/008040, loss: 0.268067, avg_loss: 0.515164
002710/008040, loss: 0.179003, avg_loss: 0.514576
002715/008040, loss: 0.427791, avg_loss: 0.514281
002720/008040, loss: 0.361250, avg_loss: 0.513829
002725/008040, loss: 0.189704, avg_loss: 0.513455
002730/008040, loss: 0.273372, avg_loss: 0.513143
002735/008040, loss: 0.239124, avg_loss: 0.512584
002740/008040, loss: 0.378276, avg_loss: 0.512291
002745/008040, loss: 0.646600, avg_loss: 0.512093
002750/008040, loss: 0.267004, avg_loss: 0.511666
002755/008040, loss: 0.422471, avg_loss: 0.511320
002760/008040, loss: 0.209467, avg_loss: 0.510807
002765/008040, loss: 0.346109, avg_loss: 0.510532
002770/008040, loss: 0.233545, avg_loss: 0.510248
002775/008040, loss: 0.466287, avg_loss: 0.509798
002780/008040, loss: 0.403711, avg_loss: 0.509469
002785/008040, loss: 0.327383, avg_loss: 0.509096
002790/008040, loss: 0.404777, avg_loss: 0.508683
002795/008040, loss: 0.542493, avg_loss: 0.508458
002800/008040, loss: 0.367202, avg_loss: 0.508142
002805/008040, loss: 0.368768, avg_loss: 0.507783
002810/008040, loss: 0.312175, avg_loss: 0.507447
002815/008040, loss: 0.302803, avg_loss: 0.507226
002820/008040, loss: 0.401391, avg_loss: 0.507006
002825/008040, loss: 0.244652, avg_loss: 0.506694
002830/008040, loss: 0.370769, avg_loss: 0.506369
002835/008040, loss: 0.182358, avg_loss: 0.505992
002840/008040, loss: 0.266695, avg_loss: 0.505660
002845/008040, loss: 0.334674, avg_loss: 0.505358
002850/008040, loss: 0.348623, avg_loss: 0.505099
002855/008040, loss: 0.187014, avg_loss: 0.504682
002860/008040, loss: 0.259943, avg_loss: 0.504276
002865/008040, loss: 0.382271, avg_loss: 0.503964
002870/008040, loss: 0.570188, avg_loss: 0.503794
002875/008040, loss: 0.218833, avg_loss: 0.503395
002880/008040, loss: 0.356927, avg_loss: 0.502927
002885/008040, loss: 0.173357, avg_loss: 0.502596
002890/008040, loss: 0.360153, avg_loss: 0.502279
002895/008040, loss: 0.295114, avg_loss: 0.501944
002900/008040, loss: 0.210005, avg_loss: 0.501620
002905/008040, loss: 0.300519, avg_loss: 0.501388
002910/008040, loss: 0.207313, avg_loss: 0.501142
002915/008040, loss: 0.333078, avg_loss: 0.500983
002920/008040, loss: 0.285614, avg_loss: 0.500578
002925/008040, loss: 0.373211, avg_loss: 0.500342
002930/008040, loss: 0.308893, avg_loss: 0.500114
002935/008040, loss: 0.350144, avg_loss: 0.499820
002940/008040, loss: 0.439544, avg_loss: 0.499650
002945/008040, loss: 0.320980, avg_loss: 0.499399
***** Running dev evaluation *****
Num examples = 1042
Instantaneous batch size per device = 32
epoch 10, step 2948/8040: {'matthews_correlation': 0.18813850606847293}
002950/008040, loss: 0.588822, avg_loss: 0.499118
002955/008040, loss: 0.375319, avg_loss: 0.498777
002960/008040, loss: 0.218809, avg_loss: 0.498292
002965/008040, loss: 0.346797, avg_loss: 0.497883
002970/008040, loss: 0.256976, avg_loss: 0.497420
002975/008040, loss: 0.267139, avg_loss: 0.496947
002980/008040, loss: 0.096225, avg_loss: 0.496468
002985/008040, loss: 0.540555, avg_loss: 0.496152
002990/008040, loss: 0.360075, avg_loss: 0.495919
002995/008040, loss: 0.439329, avg_loss: 0.495439
003000/008040, loss: 0.407537, avg_loss: 0.495246
003005/008040, loss: 0.148738, avg_loss: 0.494893
003010/008040, loss: 0.212634, avg_loss: 0.494479
003015/008040, loss: 0.413274, avg_loss: 0.494294
003020/008040, loss: 0.505027, avg_loss: 0.493973
003025/008040, loss: 0.353874, avg_loss: 0.493663
003030/008040, loss: 0.191752, avg_loss: 0.493211
003035/008040, loss: 0.262918, avg_loss: 0.492834
003040/008040, loss: 0.251038, avg_loss: 0.492580
003045/008040, loss: 0.291340, avg_loss: 0.492184
003050/008040, loss: 0.387451, avg_loss: 0.491783
003055/008040, loss: 0.277742, avg_loss: 0.491582
003060/008040, loss: 0.147416, avg_loss: 0.491181
003065/008040, loss: 0.383072, avg_loss: 0.490947
003070/008040, loss: 0.332592, avg_loss: 0.490599
003075/008040, loss: 0.376683, avg_loss: 0.490302
003080/008040, loss: 0.316549, avg_loss: 0.490015
003085/008040, loss: 0.248254, avg_loss: 0.489692
003090/008040, loss: 0.446958, avg_loss: 0.489378
003095/008040, loss: 0.534116, avg_loss: 0.489182
003100/008040, loss: 0.186525, avg_loss: 0.488832
003105/008040, loss: 0.367679, avg_loss: 0.488583
003110/008040, loss: 0.263721, avg_loss: 0.488235
003115/008040, loss: 0.160604, avg_loss: 0.487953
003120/008040, loss: 0.351810, avg_loss: 0.487793
003125/008040, loss: 0.282861, avg_loss: 0.487567
003130/008040, loss: 0.291616, avg_loss: 0.487259
003135/008040, loss: 0.215605, avg_loss: 0.486928
003140/008040, loss: 0.497842, avg_loss: 0.486740
003145/008040, loss: 0.141915, avg_loss: 0.486475
003150/008040, loss: 0.493731, avg_loss: 0.486197
003155/008040, loss: 0.337679, avg_loss: 0.486007
003160/008040, loss: 0.423936, avg_loss: 0.485696
003165/008040, loss: 0.325907, avg_loss: 0.485502
003170/008040, loss: 0.373995, avg_loss: 0.485242
003175/008040, loss: 0.221332, avg_loss: 0.484958
003180/008040, loss: 0.245173, avg_loss: 0.484625
003185/008040, loss: 0.243983, avg_loss: 0.484395
003190/008040, loss: 0.279999, avg_loss: 0.484095
003195/008040, loss: 0.289218, avg_loss: 0.483891
003200/008040, loss: 0.619335, avg_loss: 0.483651
003205/008040, loss: 0.231071, avg_loss: 0.483458
003210/008040, loss: 0.285882, avg_loss: 0.483162
003215/008040, loss: 0.306520, avg_loss: 0.482869
***** Running dev evaluation *****
Num examples = 1042
Instantaneous batch size per device = 32
epoch 11, step 3216/8040: {'matthews_correlation': 0.20603205189543294}
003220/008040, loss: 0.297244, avg_loss: 0.482532
003225/008040, loss: 0.182570, avg_loss: 0.482196
003230/008040, loss: 0.242626, avg_loss: 0.481782
003235/008040, loss: 0.463844, avg_loss: 0.481595
003240/008040, loss: 0.132137, avg_loss: 0.481325
003245/008040, loss: 0.392527, avg_loss: 0.481101
003250/008040, loss: 0.270020, avg_loss: 0.480822
003255/008040, loss: 0.137306, avg_loss: 0.480509
003260/008040, loss: 0.391590, avg_loss: 0.480182
003265/008040, loss: 0.262886, avg_loss: 0.479862
003270/008040, loss: 0.058543, avg_loss: 0.479458
003275/008040, loss: 0.404657, avg_loss: 0.479210
003280/008040, loss: 0.276438, avg_loss: 0.478893
003285/008040, loss: 0.190946, avg_loss: 0.478536
003290/008040, loss: 0.490697, avg_loss: 0.478209
003295/008040, loss: 0.276675, avg_loss: 0.478014
003300/008040, loss: 0.402889, avg_loss: 0.477756
003305/008040, loss: 0.233364, avg_loss: 0.477388
003310/008040, loss: 0.231294, avg_loss: 0.477059
003315/008040, loss: 0.109852, avg_loss: 0.476722
003320/008040, loss: 0.320070, avg_loss: 0.476551
003325/008040, loss: 0.244717, avg_loss: 0.476204
003330/008040, loss: 0.181756, avg_loss: 0.475863
003335/008040, loss: 0.222641, avg_loss: 0.475576
003340/008040, loss: 0.121102, avg_loss: 0.475165
003345/008040, loss: 0.265407, avg_loss: 0.474816
003350/008040, loss: 0.322964, avg_loss: 0.474535
003355/008040, loss: 0.237767, avg_loss: 0.474252
003360/008040, loss: 0.343557, avg_loss: 0.473983
003365/008040, loss: 0.257172, avg_loss: 0.473661
003370/008040, loss: 0.260944, avg_loss: 0.473336
003375/008040, loss: 0.292535, avg_loss: 0.473136
003380/008040, loss: 0.228900, avg_loss: 0.472876
003385/008040, loss: 0.133238, avg_loss: 0.472483
003390/008040, loss: 0.090823, avg_loss: 0.472257
003395/008040, loss: 0.353693, avg_loss: 0.471962
003400/008040, loss: 0.349422, avg_loss: 0.471682
003405/008040, loss: 0.257864, avg_loss: 0.471517
003410/008040, loss: 0.252595, avg_loss: 0.471166
003415/008040, loss: 0.247344, avg_loss: 0.470882
003420/008040, loss: 0.263462, avg_loss: 0.470580
003425/008040, loss: 0.343597, avg_loss: 0.470315
003430/008040, loss: 0.544077, avg_loss: 0.470094
003435/008040, loss: 0.303536, avg_loss: 0.469789
003440/008040, loss: 0.327594, avg_loss: 0.469431
003445/008040, loss: 0.325582, avg_loss: 0.469118
003450/008040, loss: 0.270005, avg_loss: 0.468807
003455/008040, loss: 0.373651, avg_loss: 0.468467
003460/008040, loss: 0.296829, avg_loss: 0.468144
003465/008040, loss: 0.382215, avg_loss: 0.467829
003470/008040, loss: 0.412389, avg_loss: 0.467540
003475/008040, loss: 0.434352, avg_loss: 0.467421
003480/008040, loss: 0.624913, avg_loss: 0.467201
***** Running dev evaluation *****
Num examples = 1042
Instantaneous batch size per device = 32
epoch 12, step 3484/8040: {'matthews_correlation': 0.2118432448298745}
003485/008040, loss: 0.174334, avg_loss: 0.467056
003490/008040, loss: 0.165397, avg_loss: 0.466660
003495/008040, loss: 0.265744, avg_loss: 0.466368
003500/008040, loss: 0.194233, avg_loss: 0.466006
003505/008040, loss: 0.293150, avg_loss: 0.465843
003510/008040, loss: 0.190316, avg_loss: 0.465447
003515/008040, loss: 0.186641, avg_loss: 0.465075
003520/008040, loss: 0.197128, avg_loss: 0.464644
003525/008040, loss: 0.258738, avg_loss: 0.464389
003530/008040, loss: 0.500658, avg_loss: 0.464118
003535/008040, loss: 0.181811, avg_loss: 0.463812
003540/008040, loss: 0.255612, avg_loss: 0.463451
003545/008040, loss: 0.301980, avg_loss: 0.463198
003550/008040, loss: 0.262177, avg_loss: 0.462909
003555/008040, loss: 0.150805, avg_loss: 0.462686
003560/008040, loss: 0.106480, avg_loss: 0.462353
003565/008040, loss: 0.353977, avg_loss: 0.462018
003570/008040, loss: 0.095433, avg_loss: 0.461672
003575/008040, loss: 0.218111, avg_loss: 0.461353
003580/008040, loss: 0.149131, avg_loss: 0.461044
003585/008040, loss: 0.267326, avg_loss: 0.460682
003590/008040, loss: 0.132778, avg_loss: 0.460332
003595/008040, loss: 0.084128, avg_loss: 0.460019
003600/008040, loss: 0.240429, avg_loss: 0.459736
003605/008040, loss: 0.087250, avg_loss: 0.459388
003610/008040, loss: 0.350120, avg_loss: 0.459246
003615/008040, loss: 0.268835, avg_loss: 0.458947
003620/008040, loss: 0.269842, avg_loss: 0.458634
003625/008040, loss: 0.156989, avg_loss: 0.458435
003630/008040, loss: 0.230001, avg_loss: 0.458124
003635/008040, loss: 0.326509, avg_loss: 0.457910
003640/008040, loss: 0.336903, avg_loss: 0.457636
003645/008040, loss: 0.484366, avg_loss: 0.457448
003650/008040, loss: 0.292469, avg_loss: 0.457143
003655/008040, loss: 0.196029, avg_loss: 0.456941
003660/008040, loss: 0.388540, avg_loss: 0.456735
003665/008040, loss: 0.392700, avg_loss: 0.456447
003670/008040, loss: 0.214234, avg_loss: 0.456100
003675/008040, loss: 0.166740, avg_loss: 0.455776
003680/008040, loss: 0.476048, avg_loss: 0.455626
003685/008040, loss: 0.133647, avg_loss: 0.455296
003690/008040, loss: 0.138388, avg_loss: 0.454962
003695/008040, loss: 0.419241, avg_loss: 0.454699
003700/008040, loss: 0.273247, avg_loss: 0.454571
003705/008040, loss: 0.335091, avg_loss: 0.454264
003710/008040, loss: 0.352429, avg_loss: 0.454025
003715/008040, loss: 0.424709, avg_loss: 0.453788
003720/008040, loss: 0.169942, avg_loss: 0.453501
003725/008040, loss: 0.356818, avg_loss: 0.453249
003730/008040, loss: 0.165223, avg_loss: 0.452948
003735/008040, loss: 0.186675, avg_loss: 0.452718
003740/008040, loss: 0.381955, avg_loss: 0.452507
003745/008040, loss: 0.304955, avg_loss: 0.452266
003750/008040, loss: 0.405848, avg_loss: 0.452040
***** Running dev evaluation *****
Num examples = 1042
Instantaneous batch size per device = 32
epoch 13, step 3752/8040: {'matthews_correlation': 0.20261239362380884}
003755/008040, loss: 0.216770, avg_loss: 0.451750
003760/008040, loss: 0.223490, avg_loss: 0.451394
003765/008040, loss: 0.375553, avg_loss: 0.451131
003770/008040, loss: 0.196744, avg_loss: 0.450761
003775/008040, loss: 0.363349, avg_loss: 0.450549
003780/008040, loss: 0.293711, avg_loss: 0.450257
003785/008040, loss: 0.413372, avg_loss: 0.450013
003790/008040, loss: 0.122226, avg_loss: 0.449670
003795/008040, loss: 0.356951, avg_loss: 0.449397
003800/008040, loss: 0.118402, avg_loss: 0.449095
003805/008040, loss: 0.146611, avg_loss: 0.448795
003810/008040, loss: 0.157524, avg_loss: 0.448482
003815/008040, loss: 0.343435, avg_loss: 0.448149
003820/008040, loss: 0.368691, avg_loss: 0.447827
003825/008040, loss: 0.201137, avg_loss: 0.447525
003830/008040, loss: 0.132014, avg_loss: 0.447167
003835/008040, loss: 0.090910, avg_loss: 0.446837
003840/008040, loss: 0.232675, avg_loss: 0.446527
003845/008040, loss: 0.275011, avg_loss: 0.446207
003850/008040, loss: 0.152451, avg_loss: 0.446016
003855/008040, loss: 0.314412, avg_loss: 0.445785
003860/008040, loss: 0.148032, avg_loss: 0.445503
003865/008040, loss: 0.165825, avg_loss: 0.445180
003870/008040, loss: 0.094333, avg_loss: 0.444829
003875/008040, loss: 0.375745, avg_loss: 0.444747
003880/008040, loss: 0.403074, avg_loss: 0.444519
003885/008040, loss: 0.165065, avg_loss: 0.444287
003890/008040, loss: 0.234658, avg_loss: 0.444039
003895/008040, loss: 0.319576, avg_loss: 0.443776
003900/008040, loss: 0.217123, avg_loss: 0.443509
003905/008040, loss: 0.378043, avg_loss: 0.443243
003910/008040, loss: 0.122735, avg_loss: 0.443042
003915/008040, loss: 0.264233, avg_loss: 0.442776
003920/008040, loss: 0.076704, avg_loss: 0.442449
003925/008040, loss: 0.125913, avg_loss: 0.442138
003930/008040, loss: 0.272993, avg_loss: 0.441843
003935/008040, loss: 0.068447, avg_loss: 0.441514
003940/008040, loss: 0.244199, avg_loss: 0.441180
003945/008040, loss: 0.228176, avg_loss: 0.440980
003950/008040, loss: 0.148637, avg_loss: 0.440699
003955/008040, loss: 0.299796, avg_loss: 0.440381
003960/008040, loss: 0.276640, avg_loss: 0.440127
003965/008040, loss: 0.344766, avg_loss: 0.439945
003970/008040, loss: 0.085695, avg_loss: 0.439692
003975/008040, loss: 0.467870, avg_loss: 0.439398
003980/008040, loss: 0.275529, avg_loss: 0.439101
003985/008040, loss: 0.342526, avg_loss: 0.438883
003990/008040, loss: 0.117043, avg_loss: 0.438652
003995/008040, loss: 0.216054, avg_loss: 0.438416
004000/008040, loss: 0.194812, avg_loss: 0.438143
004005/008040, loss: 0.173094, avg_loss: 0.437904
004010/008040, loss: 0.072183, avg_loss: 0.437564
004015/008040, loss: 0.403915, avg_loss: 0.437357
004020/008040, loss: 0.087834, avg_loss: 0.436997
***** Running dev evaluation *****
Num examples = 1042
Instantaneous batch size per device = 32
epoch 14, step 4020/8040: {'matthews_correlation': 0.22518881045488998}
004025/008040, loss: 0.165382, avg_loss: 0.436778
004030/008040, loss: 0.179677, avg_loss: 0.436415
004035/008040, loss: 0.185260, avg_loss: 0.436160
004040/008040, loss: 0.095814, avg_loss: 0.435886
004045/008040, loss: 0.345136, avg_loss: 0.435673
004050/008040, loss: 0.150750, avg_loss: 0.435363
004055/008040, loss: 0.185758, avg_loss: 0.435133
004060/008040, loss: 0.212922, avg_loss: 0.434953
004065/008040, loss: 0.149902, avg_loss: 0.434669
004070/008040, loss: 0.089165, avg_loss: 0.434428
004075/008040, loss: 0.168942, avg_loss: 0.434088
004080/008040, loss: 0.170014, avg_loss: 0.433740
004085/008040, loss: 0.148718, avg_loss: 0.433445
004090/008040, loss: 0.307841, avg_loss: 0.433193
004095/008040, loss: 0.170424, avg_loss: 0.432888
004100/008040, loss: 0.253233, avg_loss: 0.432721
004105/008040, loss: 0.243379, avg_loss: 0.432492
004110/008040, loss: 0.197091, avg_loss: 0.432229
004115/008040, loss: 0.149977, avg_loss: 0.431969
004120/008040, loss: 0.255616, avg_loss: 0.431698
004125/008040, loss: 0.144500, avg_loss: 0.431442
004130/008040, loss: 0.214776, avg_loss: 0.431226
004135/008040, loss: 0.155176, avg_loss: 0.430925
004140/008040, loss: 0.195912, avg_loss: 0.430689
004145/008040, loss: 0.148231, avg_loss: 0.430308
004150/008040, loss: 0.309211, avg_loss: 0.430062
004155/008040, loss: 0.400494, avg_loss: 0.429949
004160/008040, loss: 0.190466, avg_loss: 0.429596
004165/008040, loss: 0.115031, avg_loss: 0.429430
004170/008040, loss: 0.154674, avg_loss: 0.429143
004175/008040, loss: 0.087612, avg_loss: 0.428819
004180/008040, loss: 0.391154, avg_loss: 0.428581
004185/008040, loss: 0.233200, avg_loss: 0.428438
004190/008040, loss: 0.113590, avg_loss: 0.428204
004195/008040, loss: 0.139859, avg_loss: 0.427997
004200/008040, loss: 0.365552, avg_loss: 0.427729
004205/008040, loss: 0.285945, avg_loss: 0.427567
004210/008040, loss: 0.298795, avg_loss: 0.427337
004215/008040, loss: 0.184676, avg_loss: 0.427015
004220/008040, loss: 0.347303, avg_loss: 0.426763
004225/008040, loss: 0.249475, avg_loss: 0.426473
004230/008040, loss: 0.345056, avg_loss: 0.426234
004235/008040, loss: 0.132455, avg_loss: 0.425935
004240/008040, loss: 0.083139, avg_loss: 0.425697
004245/008040, loss: 0.186649, avg_loss: 0.425451
004250/008040, loss: 0.159150, avg_loss: 0.425129
004255/008040, loss: 0.119297, avg_loss: 0.424885
004260/008040, loss: 0.233108, avg_loss: 0.424649
004265/008040, loss: 0.144114, avg_loss: 0.424408
004270/008040, loss: 0.210518, avg_loss: 0.424164
004275/008040, loss: 0.242731, avg_loss: 0.423926
004280/008040, loss: 0.157653, avg_loss: 0.423696
004285/008040, loss: 0.191035, avg_loss: 0.423368
***** Running dev evaluation *****
Num examples = 1042
Instantaneous batch size per device = 32
epoch 15, step 4288/8040: {'matthews_correlation': 0.24863648291608131}
004290/008040, loss: 0.266971, avg_loss: 0.423141
004295/008040, loss: 0.322248, avg_loss: 0.422900
004300/008040, loss: 0.231828, avg_loss: 0.422700
004305/008040, loss: 0.072297, avg_loss: 0.422383
004310/008040, loss: 0.193845, avg_loss: 0.422057
004315/008040, loss: 0.217596, avg_loss: 0.421784
004320/008040, loss: 0.393519, avg_loss: 0.421559
004325/008040, loss: 0.400877, avg_loss: 0.421312
004330/008040, loss: 0.087280, avg_loss: 0.420988
004335/008040, loss: 0.124804, avg_loss: 0.420766
004340/008040, loss: 0.130229, avg_loss: 0.420461
004345/008040, loss: 0.304618, avg_loss: 0.420192
004350/008040, loss: 0.189475, avg_loss: 0.419871
004355/008040, loss: 0.203492, avg_loss: 0.419608
004360/008040, loss: 0.144623, avg_loss: 0.419307
004365/008040, loss: 0.127642, avg_loss: 0.418997
004370/008040, loss: 0.218811, avg_loss: 0.418663
004375/008040, loss: 0.059012, avg_loss: 0.418390
004380/008040, loss: 0.237954, avg_loss: 0.418244
004385/008040, loss: 0.248724, avg_loss: 0.418026
004390/008040, loss: 0.160703, avg_loss: 0.417783
004395/008040, loss: 0.276666, avg_loss: 0.417556
004400/008040, loss: 0.101335, avg_loss: 0.417345
004405/008040, loss: 0.468661, avg_loss: 0.417191
004410/008040, loss: 0.154267, avg_loss: 0.416879
004415/008040, loss: 0.196224, avg_loss: 0.416633
004420/008040, loss: 0.092791, avg_loss: 0.416405
004425/008040, loss: 0.447025, avg_loss: 0.416244
004430/008040, loss: 0.225542, avg_loss: 0.415959
004435/008040, loss: 0.091548, avg_loss: 0.415671
004440/008040, loss: 0.116130, avg_loss: 0.415333
004445/008040, loss: 0.225495, avg_loss: 0.415069
004450/008040, loss: 0.213666, avg_loss: 0.414828
004455/008040, loss: 0.130709, avg_loss: 0.414580
004460/008040, loss: 0.225696, avg_loss: 0.414366
004465/008040, loss: 0.322512, avg_loss: 0.414150
004470/008040, loss: 0.409171, avg_loss: 0.413917
004475/008040, loss: 0.459070, avg_loss: 0.413800
004480/008040, loss: 0.205403, avg_loss: 0.413537
004485/008040, loss: 0.097172, avg_loss: 0.413289
004490/008040, loss: 0.104971, avg_loss: 0.413039
004495/008040, loss: 0.269551, avg_loss: 0.412801
004500/008040, loss: 0.151229, avg_loss: 0.412506
004505/008040, loss: 0.137360, avg_loss: 0.412240
004510/008040, loss: 0.349339, avg_loss: 0.412084
004515/008040, loss: 0.074355, avg_loss: 0.411830
004520/008040, loss: 0.165137, avg_loss: 0.411674
004525/008040, loss: 0.120821, avg_loss: 0.411390
004530/008040, loss: 0.156756, avg_loss: 0.411117
004535/008040, loss: 0.131685, avg_loss: 0.410892
004540/008040, loss: 0.215486, avg_loss: 0.410740
004545/008040, loss: 0.276792, avg_loss: 0.410564
004550/008040, loss: 0.163451, avg_loss: 0.410302
004555/008040, loss: 0.153240, avg_loss: 0.410035
***** Running dev evaluation *****
Num examples = 1042
Instantaneous batch size per device = 32
epoch 16, step 4556/8040: {'matthews_correlation': 0.19984853723708582}
004560/008040, loss: 0.091924, avg_loss: 0.409807
004565/008040, loss: 0.285278, avg_loss: 0.409558
004570/008040, loss: 0.153244, avg_loss: 0.409386
004575/008040, loss: 0.117640, avg_loss: 0.409192
004580/008040, loss: 0.196797, avg_loss: 0.408980
004585/008040, loss: 0.170434, avg_loss: 0.408724
004590/008040, loss: 0.291520, avg_loss: 0.408442
004595/008040, loss: 0.095928, avg_loss: 0.408124
004600/008040, loss: 0.133423, avg_loss: 0.407880
004605/008040, loss: 0.224401, avg_loss: 0.407606
004610/008040, loss: 0.244196, avg_loss: 0.407368
004615/008040, loss: 0.086107, avg_loss: 0.407023
004620/008040, loss: 0.088616, avg_loss: 0.406692
004625/008040, loss: 0.182435, avg_loss: 0.406431
004630/008040, loss: 0.108714, avg_loss: 0.406193
004635/008040, loss: 0.052255, avg_loss: 0.405910
004640/008040, loss: 0.201341, avg_loss: 0.405591
004645/008040, loss: 0.125202, avg_loss: 0.405348
004650/008040, loss: 0.262958, avg_loss: 0.405130
004655/008040, loss: 0.136696, avg_loss: 0.404868
004660/008040, loss: 0.232297, avg_loss: 0.404751
004665/008040, loss: 0.169257, avg_loss: 0.404498
004670/008040, loss: 0.086830, avg_loss: 0.404246
004675/008040, loss: 0.199137, avg_loss: 0.404007
004680/008040, loss: 0.169171, avg_loss: 0.403782
004685/008040, loss: 0.112735, avg_loss: 0.403494
004690/008040, loss: 0.240913, avg_loss: 0.403283
004695/008040, loss: 0.228971, avg_loss: 0.403096
004700/008040, loss: 0.203035, avg_loss: 0.402813
004705/008040, loss: 0.252691, avg_loss: 0.402542
004710/008040, loss: 0.183577, avg_loss: 0.402340
004715/008040, loss: 0.152381, avg_loss: 0.402066
004720/008040, loss: 0.081548, avg_loss: 0.401885
004725/008040, loss: 0.243844, avg_loss: 0.401712
004730/008040, loss: 0.191078, avg_loss: 0.401477
004735/008040, loss: 0.309967, avg_loss: 0.401239
004740/008040, loss: 0.177741, avg_loss: 0.400996
004745/008040, loss: 0.443657, avg_loss: 0.400854
004750/008040, loss: 0.174745, avg_loss: 0.400599
004755/008040, loss: 0.117440, avg_loss: 0.400411
004760/008040, loss: 0.216662, avg_loss: 0.400179
004765/008040, loss: 0.331156, avg_loss: 0.399990
004770/008040, loss: 0.239916, avg_loss: 0.399799
004775/008040, loss: 0.392543, avg_loss: 0.399598
004780/008040, loss: 0.310010, avg_loss: 0.399412
004785/008040, loss: 0.176596, avg_loss: 0.399203
004790/008040, loss: 0.193463, avg_loss: 0.399010
004795/008040, loss: 0.065539, avg_loss: 0.398731
004800/008040, loss: 0.078370, avg_loss: 0.398525
004805/008040, loss: 0.268404, avg_loss: 0.398395
004810/008040, loss: 0.130425, avg_loss: 0.398184
004815/008040, loss: 0.124595, avg_loss: 0.397953
004820/008040, loss: 0.351294, avg_loss: 0.397781
***** Running dev evaluation *****
Num examples = 1042
Instantaneous batch size per device = 32
epoch 17, step 4824/8040: {'matthews_correlation': 0.23319244596326755}
004825/008040, loss: 0.117143, avg_loss: 0.397605
004830/008040, loss: 0.085320, avg_loss: 0.397402
004835/008040, loss: 0.329389, avg_loss: 0.397171
004840/008040, loss: 0.167244, avg_loss: 0.396923
004845/008040, loss: 0.084977, avg_loss: 0.396725
004850/008040, loss: 0.170633, avg_loss: 0.396479
004855/008040, loss: 0.162252, avg_loss: 0.396249
004860/008040, loss: 0.242330, avg_loss: 0.396023
004865/008040, loss: 0.158724, avg_loss: 0.395797
004870/008040, loss: 0.145546, avg_loss: 0.395552
004875/008040, loss: 0.154330, avg_loss: 0.395311
004880/008040, loss: 0.156234, avg_loss: 0.395089
004885/008040, loss: 0.104371, avg_loss: 0.394829
004890/008040, loss: 0.116719, avg_loss: 0.394561
004895/008040, loss: 0.167743, avg_loss: 0.394364
004900/008040, loss: 0.064857, avg_loss: 0.394083
004905/008040, loss: 0.067141, avg_loss: 0.393839
004910/008040, loss: 0.099388, avg_loss: 0.393620
004915/008040, loss: 0.121026, avg_loss: 0.393411
004920/008040, loss: 0.237852, avg_loss: 0.393219
004925/008040, loss: 0.116583, avg_loss: 0.392966
004930/008040, loss: 0.066189, avg_loss: 0.392755
004935/008040, loss: 0.100841, avg_loss: 0.392546
004940/008040, loss: 0.184811, avg_loss: 0.392326
004945/008040, loss: 0.261129, avg_loss: 0.392141
004950/008040, loss: 0.227229, avg_loss: 0.391920
004955/008040, loss: 0.073722, avg_loss: 0.391622
004960/008040, loss: 0.217448, avg_loss: 0.391421
004965/008040, loss: 0.166534, avg_loss: 0.391247
004970/008040, loss: 0.169978, avg_loss: 0.391054
004975/008040, loss: 0.162336, avg_loss: 0.390803
004980/008040, loss: 0.100707, avg_loss: 0.390613
004985/008040, loss: 0.066454, avg_loss: 0.390380
004990/008040, loss: 0.268293, avg_loss: 0.390155
004995/008040, loss: 0.299800, avg_loss: 0.389940
005000/008040, loss: 0.159697, avg_loss: 0.389720
005005/008040, loss: 0.089164, avg_loss: 0.389519
005010/008040, loss: 0.083933, avg_loss: 0.389239
005015/008040, loss: 0.109845, avg_loss: 0.389072
005020/008040, loss: 0.323453, avg_loss: 0.388975
005025/008040, loss: 0.218569, avg_loss: 0.388836
005030/008040, loss: 0.306596, avg_loss: 0.388684
005035/008040, loss: 0.214361, avg_loss: 0.388518
005040/008040, loss: 0.203883, avg_loss: 0.388300
005045/008040, loss: 0.119648, avg_loss: 0.388050
005050/008040, loss: 0.241945, avg_loss: 0.387813
005055/008040, loss: 0.295856, avg_loss: 0.387694
005060/008040, loss: 0.299737, avg_loss: 0.387502
005065/008040, loss: 0.173353, avg_loss: 0.387261
005070/008040, loss: 0.148706, avg_loss: 0.387023
005075/008040, loss: 0.235021, avg_loss: 0.386797
005080/008040, loss: 0.309368, avg_loss: 0.386680
005085/008040, loss: 0.187352, avg_loss: 0.386492
005090/008040, loss: 0.124517, avg_loss: 0.386330
***** Running dev evaluation *****
Num examples = 1042
Instantaneous batch size per device = 32
epoch 18, step 5092/8040: {'matthews_correlation': 0.24348660475263997}
005095/008040, loss: 0.244682, avg_loss: 0.386098
005100/008040, loss: 0.038414, avg_loss: 0.385838
005105/008040, loss: 0.095592, avg_loss: 0.385585
005110/008040, loss: 0.112026, avg_loss: 0.385404
005115/008040, loss: 0.193563, avg_loss: 0.385160
005120/008040, loss: 0.169361, avg_loss: 0.384950
005125/008040, loss: 0.115310, avg_loss: 0.384794
005130/008040, loss: 0.111171, avg_loss: 0.384595
005135/008040, loss: 0.347275, avg_loss: 0.384407
005140/008040, loss: 0.066092, avg_loss: 0.384158
005145/008040, loss: 0.038691, avg_loss: 0.383875
005150/008040, loss: 0.187798, avg_loss: 0.383690
005155/008040, loss: 0.080341, avg_loss: 0.383389
005160/008040, loss: 0.250113, avg_loss: 0.383158
005165/008040, loss: 0.230404, avg_loss: 0.382940
005170/008040, loss: 0.199019, avg_loss: 0.382771
005175/008040, loss: 0.042526, avg_loss: 0.382549
005180/008040, loss: 0.107391, avg_loss: 0.382405
005185/008040, loss: 0.123089, avg_loss: 0.382158
005190/008040, loss: 0.211129, avg_loss: 0.381957
005195/008040, loss: 0.191329, avg_loss: 0.381727
005200/008040, loss: 0.247005, avg_loss: 0.381528
005205/008040, loss: 0.151045, avg_loss: 0.381342
005210/008040, loss: 0.221767, avg_loss: 0.381150
005215/008040, loss: 0.098915, avg_loss: 0.380916
005220/008040, loss: 0.120604, avg_loss: 0.380805
005225/008040, loss: 0.198758, avg_loss: 0.380580
005230/008040, loss: 0.170238, avg_loss: 0.380375
005235/008040, loss: 0.287471, avg_loss: 0.380185
005240/008040, loss: 0.099829, avg_loss: 0.379951
005245/008040, loss: 0.192130, avg_loss: 0.379746
005250/008040, loss: 0.174886, avg_loss: 0.379625
005255/008040, loss: 0.154950, avg_loss: 0.379463
005260/008040, loss: 0.315916, avg_loss: 0.379261
005265/008040, loss: 0.142192, avg_loss: 0.379091
005270/008040, loss: 0.157938, avg_loss: 0.378953
005275/008040, loss: 0.112631, avg_loss: 0.378706
005280/008040, loss: 0.112058, avg_loss: 0.378528
005285/008040, loss: 0.109890, avg_loss: 0.378278
005290/008040, loss: 0.096205, avg_loss: 0.378069
005295/008040, loss: 0.101920, avg_loss: 0.377793
005300/008040, loss: 0.223294, avg_loss: 0.377597
005305/008040, loss: 0.334869, avg_loss: 0.377448
005310/008040, loss: 0.133110, avg_loss: 0.377219
005315/008040, loss: 0.285188, avg_loss: 0.377080
005320/008040, loss: 0.093014, avg_loss: 0.376865
005325/008040, loss: 0.064642, avg_loss: 0.376663
005330/008040, loss: 0.399625, avg_loss: 0.376612
005335/008040, loss: 0.099368, avg_loss: 0.376419
005340/008040, loss: 0.127971, avg_loss: 0.376192
005345/008040, loss: 0.276726, avg_loss: 0.376028
005350/008040, loss: 0.203088, avg_loss: 0.375818
005355/008040, loss: 0.162861, avg_loss: 0.375639
005360/008040, loss: 0.111333, avg_loss: 0.375443
***** Running dev evaluation *****
Num examples = 1042
Instantaneous batch size per device = 32
epoch 19, step 5360/8040: {'matthews_correlation': 0.2545245288314363}
005365/008040, loss: 0.182490, avg_loss: 0.375250
005370/008040, loss: 0.306611, avg_loss: 0.375098
005375/008040, loss: 0.086688, avg_loss: 0.374876
005380/008040, loss: 0.073351, avg_loss: 0.374639
005385/008040, loss: 0.076141, avg_loss: 0.374468
005390/008040, loss: 0.263192, avg_loss: 0.374279
005395/008040, loss: 0.057974, avg_loss: 0.374064
005400/008040, loss: 0.168771, avg_loss: 0.373868
005405/008040, loss: 0.101774, avg_loss: 0.373638
005410/008040, loss: 0.233914, avg_loss: 0.373431
005415/008040, loss: 0.199603, avg_loss: 0.373236
005420/008040, loss: 0.069979, avg_loss: 0.373044
005425/008040, loss: 0.157929, avg_loss: 0.372784
005430/008040, loss: 0.102463, avg_loss: 0.372559
005435/008040, loss: 0.311656, avg_loss: 0.372387
005440/008040, loss: 0.025280, avg_loss: 0.372177
005445/008040, loss: 0.199722, avg_loss: 0.372008
005450/008040, loss: 0.037342, avg_loss: 0.371794
005455/008040, loss: 0.464360, avg_loss: 0.371634
005460/008040, loss: 0.151161, avg_loss: 0.371387
005465/008040, loss: 0.131248, avg_loss: 0.371174
005470/008040, loss: 0.091763, avg_loss: 0.370973
005475/008040, loss: 0.124437, avg_loss: 0.370771
005480/008040, loss: 0.056806, avg_loss: 0.370576
005485/008040, loss: 0.053934, avg_loss: 0.370370
005490/008040, loss: 0.134340, avg_loss: 0.370134
005495/008040, loss: 0.403093, avg_loss: 0.369976
005500/008040, loss: 0.295253, avg_loss: 0.369797
005505/008040, loss: 0.123554, avg_loss: 0.369599
005510/008040, loss: 0.146412, avg_loss: 0.369405
005515/008040, loss: 0.086848, avg_loss: 0.369164
005520/008040, loss: 0.190395, avg_loss: 0.368969
005525/008040, loss: 0.214298, avg_loss: 0.368869
005530/008040, loss: 0.157094, avg_loss: 0.368707
005535/008040, loss: 0.236498, avg_loss: 0.368588
005540/008040, loss: 0.150522, avg_loss: 0.368397
005545/008040, loss: 0.056312, avg_loss: 0.368192
005550/008040, loss: 0.132276, avg_loss: 0.367994
005555/008040, loss: 0.152209, avg_loss: 0.367803
005560/008040, loss: 0.123136, avg_loss: 0.367632
005565/008040, loss: 0.295406, avg_loss: 0.367456
005570/008040, loss: 0.163695, avg_loss: 0.367255
005575/008040, loss: 0.032764, avg_loss: 0.367021
005580/008040, loss: 0.077804, avg_loss: 0.366803
005585/008040, loss: 0.426609, avg_loss: 0.366718
005590/008040, loss: 0.170544, avg_loss: 0.366554
005595/008040, loss: 0.121247, avg_loss: 0.366328
005600/008040, loss: 0.118504, avg_loss: 0.366139
005605/008040, loss: 0.127036, avg_loss: 0.365945
005610/008040, loss: 0.253191, avg_loss: 0.365772
005615/008040, loss: 0.132579, avg_loss: 0.365584
005620/008040, loss: 0.206162, avg_loss: 0.365378
005625/008040, loss: 0.138357, avg_loss: 0.365198
***** Running dev evaluation *****
Num examples = 1042
Instantaneous batch size per device = 32
epoch 20, step 5628/8040: {'matthews_correlation': 0.20994533418798944}
005630/008040, loss: 0.220501, avg_loss: 0.364957
005635/008040, loss: 0.176737, avg_loss: 0.364715
005640/008040, loss: 0.150695, avg_loss: 0.364518
005645/008040, loss: 0.186189, avg_loss: 0.364357
005650/008040, loss: 0.238791, avg_loss: 0.364176
005655/008040, loss: 0.128307, avg_loss: 0.363933
005660/008040, loss: 0.098545, avg_loss: 0.363732
005665/008040, loss: 0.059385, avg_loss: 0.363478
005670/008040, loss: 0.106437, avg_loss: 0.363278
005675/008040, loss: 0.051390, avg_loss: 0.363102
005680/008040, loss: 0.264690, avg_loss: 0.362942
005685/008040, loss: 0.051885, avg_loss: 0.362837
005690/008040, loss: 0.186132, avg_loss: 0.362659
005695/008040, loss: 0.110282, avg_loss: 0.362424
005700/008040, loss: 0.035696, avg_loss: 0.362194
005705/008040, loss: 0.210790, avg_loss: 0.362002
005710/008040, loss: 0.176916, avg_loss: 0.361813
005715/008040, loss: 0.068533, avg_loss: 0.361627
005720/008040, loss: 0.059564, avg_loss: 0.361417
005725/008040, loss: 0.087551, avg_loss: 0.361274
005730/008040, loss: 0.166153, avg_loss: 0.361102
005735/008040, loss: 0.123037, avg_loss: 0.360934
005740/008040, loss: 0.175274, avg_loss: 0.360734
005745/008040, loss: 0.053053, avg_loss: 0.360546
005750/008040, loss: 0.193917, avg_loss: 0.360384
005755/008040, loss: 0.192751, avg_loss: 0.360214
005760/008040, loss: 0.101878, avg_loss: 0.360026
005765/008040, loss: 0.085134, avg_loss: 0.359832
005770/008040, loss: 0.293009, avg_loss: 0.359642
005775/008040, loss: 0.104310, avg_loss: 0.359413
005780/008040, loss: 0.011819, avg_loss: 0.359168
005785/008040, loss: 0.210015, avg_loss: 0.359009
005790/008040, loss: 0.200174, avg_loss: 0.358879
005795/008040, loss: 0.141055, avg_loss: 0.358671
005800/008040, loss: 0.078129, avg_loss: 0.358469
005805/008040, loss: 0.083557, avg_loss: 0.358299
005810/008040, loss: 0.075039, avg_loss: 0.358079
005815/008040, loss: 0.080684, avg_loss: 0.357903
005820/008040, loss: 0.345792, avg_loss: 0.357738
005825/008040, loss: 0.078370, avg_loss: 0.357590
005830/008040, loss: 0.199539, avg_loss: 0.357436
005835/008040, loss: 0.196801, avg_loss: 0.357251
005840/008040, loss: 0.173617, avg_loss: 0.357071
005845/008040, loss: 0.056907, avg_loss: 0.356874
005850/008040, loss: 0.165107, avg_loss: 0.356683
005855/008040, loss: 0.100072, avg_loss: 0.356521
005860/008040, loss: 0.178491, avg_loss: 0.356324
005865/008040, loss: 0.212101, avg_loss: 0.356138
005870/008040, loss: 0.215021, avg_loss: 0.355963
005875/008040, loss: 0.273816, avg_loss: 0.355788
005880/008040, loss: 0.364194, avg_loss: 0.355641
005885/008040, loss: 0.270123, avg_loss: 0.355498
005890/008040, loss: 0.047443, avg_loss: 0.355291
005895/008040, loss: 0.142198, avg_loss: 0.355145
***** Running dev evaluation *****
Num examples = 1042
Instantaneous batch size per device = 32
epoch 21, step 5896/8040: {'matthews_correlation': 0.21551745055261307}
005900/008040, loss: 0.191457, avg_loss: 0.354973
005905/008040, loss: 0.295734, avg_loss: 0.354803
005910/008040, loss: 0.075735, avg_loss: 0.354682
005915/008040, loss: 0.142483, avg_loss: 0.354498
005920/008040, loss: 0.117506, avg_loss: 0.354280
005925/008040, loss: 0.108497, avg_loss: 0.354039
005930/008040, loss: 0.023560, avg_loss: 0.353889
005935/008040, loss: 0.051859, avg_loss: 0.353655
005940/008040, loss: 0.096430, avg_loss: 0.353502
005945/008040, loss: 0.168284, avg_loss: 0.353288
005950/008040, loss: 0.137047, avg_loss: 0.353141
005955/008040, loss: 0.182130, avg_loss: 0.353003
005960/008040, loss: 0.052544, avg_loss: 0.352779
005965/008040, loss: 0.148201, avg_loss: 0.352614
005970/008040, loss: 0.203060, avg_loss: 0.352449
005975/008040, loss: 0.152961, avg_loss: 0.352309
005980/008040, loss: 0.149886, avg_loss: 0.352082
005985/008040, loss: 0.108204, avg_loss: 0.351957
005990/008040, loss: 0.102725, avg_loss: 0.351766
005995/008040, loss: 0.023260, avg_loss: 0.351590
006000/008040, loss: 0.115315, avg_loss: 0.351441
006005/008040, loss: 0.074605, avg_loss: 0.351242
006010/008040, loss: 0.142932, avg_loss: 0.351052
006015/008040, loss: 0.083695, avg_loss: 0.350857
006020/008040, loss: 0.043695, avg_loss: 0.350694
006025/008040, loss: 0.099229, avg_loss: 0.350506
006030/008040, loss: 0.024634, avg_loss: 0.350314
006035/008040, loss: 0.213198, avg_loss: 0.350172
006040/008040, loss: 0.090062, avg_loss: 0.349989
006045/008040, loss: 0.165650, avg_loss: 0.349814
006050/008040, loss: 0.375118, avg_loss: 0.349689
006055/008040, loss: 0.092574, avg_loss: 0.349502
006060/008040, loss: 0.076881, avg_loss: 0.349333
006065/008040, loss: 0.215473, avg_loss: 0.349175
006070/008040, loss: 0.047726, avg_loss: 0.349023
006075/008040, loss: 0.275449, avg_loss: 0.348853
006080/008040, loss: 0.091764, avg_loss: 0.348649
006085/008040, loss: 0.158617, avg_loss: 0.348518
006090/008040, loss: 0.398433, avg_loss: 0.348350
006095/008040, loss: 0.249465, avg_loss: 0.348170
006100/008040, loss: 0.230916, avg_loss: 0.348021
006105/008040, loss: 0.138895, avg_loss: 0.347855
006110/008040, loss: 0.023905, avg_loss: 0.347659
006115/008040, loss: 0.183222, avg_loss: 0.347486
006120/008040, loss: 0.149845, avg_loss: 0.347367
006125/008040, loss: 0.120646, avg_loss: 0.347237
006130/008040, loss: 0.232747, avg_loss: 0.347078
006135/008040, loss: 0.086326, avg_loss: 0.346889
006140/008040, loss: 0.044021, avg_loss: 0.346692
006145/008040, loss: 0.173458, avg_loss: 0.346579
006150/008040, loss: 0.110168, avg_loss: 0.346419
006155/008040, loss: 0.041496, avg_loss: 0.346239
006160/008040, loss: 0.048964, avg_loss: 0.346022
***** Running dev evaluation *****
Num examples = 1042
Instantaneous batch size per device = 32
epoch 22, step 6164/8040: {'matthews_correlation': 0.20483291444361929}
006165/008040, loss: 0.256147, avg_loss: 0.345885
006170/008040, loss: 0.100646, avg_loss: 0.345729
006175/008040, loss: 0.135744, avg_loss: 0.345560
006180/008040, loss: 0.070830, avg_loss: 0.345336
006185/008040, loss: 0.183400, avg_loss: 0.345210
006190/008040, loss: 0.171377, avg_loss: 0.345125
006195/008040, loss: 0.104681, avg_loss: 0.344985
006200/008040, loss: 0.047664, avg_loss: 0.344778
006205/008040, loss: 0.132229, avg_loss: 0.344638
006210/008040, loss: 0.212232, avg_loss: 0.344449
006215/008040, loss: 0.037690, avg_loss: 0.344257
006220/008040, loss: 0.265332, avg_loss: 0.344097
006225/008040, loss: 0.114738, avg_loss: 0.343896
006230/008040, loss: 0.075357, avg_loss: 0.343712
006235/008040, loss: 0.082196, avg_loss: 0.343519
006240/008040, loss: 0.141044, avg_loss: 0.343342
006245/008040, loss: 0.061539, avg_loss: 0.343173
006250/008040, loss: 0.156940, avg_loss: 0.342981
006255/008040, loss: 0.074917, avg_loss: 0.342844
006260/008040, loss: 0.182297, avg_loss: 0.342644
006265/008040, loss: 0.188166, avg_loss: 0.342492
006270/008040, loss: 0.098817, avg_loss: 0.342310
006275/008040, loss: 0.026582, avg_loss: 0.342126
006280/008040, loss: 0.093155, avg_loss: 0.341935
006285/008040, loss: 0.121849, avg_loss: 0.341775
006290/008040, loss: 0.117257, avg_loss: 0.341654
006295/008040, loss: 0.470718, avg_loss: 0.341518
006300/008040, loss: 0.071532, avg_loss: 0.341330
006305/008040, loss: 0.077978, avg_loss: 0.341138
006310/008040, loss: 0.280971, avg_loss: 0.341040
006315/008040, loss: 0.083832, avg_loss: 0.340839
006320/008040, loss: 0.123453, avg_loss: 0.340651
006325/008040, loss: 0.293229, avg_loss: 0.340510
006330/008040, loss: 0.010351, avg_loss: 0.340313
006335/008040, loss: 0.206183, avg_loss: 0.340134
006340/008040, loss: 0.157802, avg_loss: 0.340008
006345/008040, loss: 0.045268, avg_loss: 0.339836
006350/008040, loss: 0.096029, avg_loss: 0.339639
006355/008040, loss: 0.094634, avg_loss: 0.339505
006360/008040, loss: 0.020495, avg_loss: 0.339314
006365/008040, loss: 0.099276, avg_loss: 0.339191
006370/008040, loss: 0.040780, avg_loss: 0.339016
006375/008040, loss: 0.107320, avg_loss: 0.338862
006380/008040, loss: 0.212123, avg_loss: 0.338757
006385/008040, loss: 0.228286, avg_loss: 0.338587
006390/008040, loss: 0.094378, avg_loss: 0.338381
006395/008040, loss: 0.094123, avg_loss: 0.338215
006400/008040, loss: 0.153880, avg_loss: 0.338045
006405/008040, loss: 0.025904, avg_loss: 0.337846
006410/008040, loss: 0.082967, avg_loss: 0.337646
006415/008040, loss: 0.040689, avg_loss: 0.337446
006420/008040, loss: 0.075779, avg_loss: 0.337272
006425/008040, loss: 0.150025, avg_loss: 0.337107
006430/008040, loss: 0.141630, avg_loss: 0.336983
***** Running dev evaluation *****
Num examples = 1042
Instantaneous batch size per device = 32
epoch 23, step 6432/8040: {'matthews_correlation': 0.24931944187781385}
006435/008040, loss: 0.153386, avg_loss: 0.336812
006440/008040, loss: 0.222147, avg_loss: 0.336705
006445/008040, loss: 0.156677, avg_loss: 0.336554
006450/008040, loss: 0.010944, avg_loss: 0.336408
006455/008040, loss: 0.038571, avg_loss: 0.336201
006460/008040, loss: 0.114449, avg_loss: 0.336032
006465/008040, loss: 0.089689, avg_loss: 0.335848
006470/008040, loss: 0.329702, avg_loss: 0.335703
006475/008040, loss: 0.234976, avg_loss: 0.335533
006480/008040, loss: 0.090094, avg_loss: 0.335370
006485/008040, loss: 0.090959, avg_loss: 0.335211
006490/008040, loss: 0.184350, avg_loss: 0.335043
006495/008040, loss: 0.013678, avg_loss: 0.334875
006500/008040, loss: 0.066592, avg_loss: 0.334706
006505/008040, loss: 0.044437, avg_loss: 0.334594
006510/008040, loss: 0.044384, avg_loss: 0.334395
006515/008040, loss: 0.141515, avg_loss: 0.334245
006520/008040, loss: 0.147063, avg_loss: 0.334082
006525/008040, loss: 0.132490, avg_loss: 0.333948
006530/008040, loss: 0.115439, avg_loss: 0.333813
006535/008040, loss: 0.072417, avg_loss: 0.333611
006540/008040, loss: 0.122740, avg_loss: 0.333438
006545/008040, loss: 0.153896, avg_loss: 0.333265
006550/008040, loss: 0.216064, avg_loss: 0.333112
006555/008040, loss: 0.074902, avg_loss: 0.332915
006560/008040, loss: 0.229158, avg_loss: 0.332779
006565/008040, loss: 0.142630, avg_loss: 0.332649
006570/008040, loss: 0.106839, avg_loss: 0.332540
006575/008040, loss: 0.016569, avg_loss: 0.332368
006580/008040, loss: 0.110829, avg_loss: 0.332200
006585/008040, loss: 0.072897, avg_loss: 0.332014
006590/008040, loss: 0.148401, avg_loss: 0.331831
006595/008040, loss: 0.472177, avg_loss: 0.331741
006600/008040, loss: 0.026618, avg_loss: 0.331582
006605/008040, loss: 0.192052, avg_loss: 0.331442
006610/008040, loss: 0.176737, avg_loss: 0.331313
006615/008040, loss: 0.256435, avg_loss: 0.331163
006620/008040, loss: 0.112773, avg_loss: 0.330998
006625/008040, loss: 0.168097, avg_loss: 0.330900
006630/008040, loss: 0.149819, avg_loss: 0.330756
006635/008040, loss: 0.035909, avg_loss: 0.330597
006640/008040, loss: 0.116389, avg_loss: 0.330473
006645/008040, loss: 0.067924, avg_loss: 0.330317
006650/008040, loss: 0.029742, avg_loss: 0.330124
006655/008040, loss: 0.111944, avg_loss: 0.329962
006660/008040, loss: 0.138587, avg_loss: 0.329805
006665/008040, loss: 0.214782, avg_loss: 0.329670
006670/008040, loss: 0.134468, avg_loss: 0.329520
006675/008040, loss: 0.131746, avg_loss: 0.329344
006680/008040, loss: 0.085801, avg_loss: 0.329181
006685/008040, loss: 0.123189, avg_loss: 0.329020
006690/008040, loss: 0.059361, avg_loss: 0.328905
006695/008040, loss: 0.131232, avg_loss: 0.328752
006700/008040, loss: 0.036765, avg_loss: 0.328543
***** Running dev evaluation *****
Num examples = 1042
Instantaneous batch size per device = 32
epoch 24, step 6700/8040: {'matthews_correlation': 0.23227684406858393}
006705/008040, loss: 0.030755, avg_loss: 0.328399
006710/008040, loss: 0.099883, avg_loss: 0.328219
006715/008040, loss: 0.177528, avg_loss: 0.328066
006720/008040, loss: 0.272356, avg_loss: 0.327913
006725/008040, loss: 0.024312, avg_loss: 0.327721
006730/008040, loss: 0.085378, avg_loss: 0.327553
006735/008040, loss: 0.042832, avg_loss: 0.327387
006740/008040, loss: 0.151696, avg_loss: 0.327222
006745/008040, loss: 0.125671, avg_loss: 0.327072
006750/008040, loss: 0.065977, avg_loss: 0.326951
006755/008040, loss: 0.120872, avg_loss: 0.326772
006760/008040, loss: 0.024726, avg_loss: 0.326601
006765/008040, loss: 0.058831, avg_loss: 0.326394
006770/008040, loss: 0.360663, avg_loss: 0.326308
006775/008040, loss: 0.055613, avg_loss: 0.326138
006780/008040, loss: 0.068713, avg_loss: 0.325965
006785/008040, loss: 0.048198, avg_loss: 0.325800
006790/008040, loss: 0.176144, avg_loss: 0.325608
006795/008040, loss: 0.084840, avg_loss: 0.325489
006800/008040, loss: 0.252102, avg_loss: 0.325354
006805/008040, loss: 0.021211, avg_loss: 0.325169
006810/008040, loss: 0.206506, avg_loss: 0.325002
006815/008040, loss: 0.049258, avg_loss: 0.324803
006820/008040, loss: 0.062864, avg_loss: 0.324719
006825/008040, loss: 0.211644, avg_loss: 0.324574
006830/008040, loss: 0.244467, avg_loss: 0.324422
006835/008040, loss: 0.125059, avg_loss: 0.324291
006840/008040, loss: 0.126497, avg_loss: 0.324127
006845/008040, loss: 0.195687, avg_loss: 0.323964
006850/008040, loss: 0.020408, avg_loss: 0.323804
006855/008040, loss: 0.039552, avg_loss: 0.323659
006860/008040, loss: 0.053010, avg_loss: 0.323502
006865/008040, loss: 0.240591, avg_loss: 0.323372
006870/008040, loss: 0.067442, avg_loss: 0.323200
006875/008040, loss: 0.070645, avg_loss: 0.323066
006880/008040, loss: 0.043463, avg_loss: 0.322892
006885/008040, loss: 0.065176, avg_loss: 0.322750
006890/008040, loss: 0.022665, avg_loss: 0.322594
006895/008040, loss: 0.024392, avg_loss: 0.322466
006900/008040, loss: 0.065461, avg_loss: 0.322293
006905/008040, loss: 0.137459, avg_loss: 0.322161
006910/008040, loss: 0.183977, avg_loss: 0.322088
006915/008040, loss: 0.082571, avg_loss: 0.321932
006920/008040, loss: 0.047769, avg_loss: 0.321773
006925/008040, loss: 0.185715, avg_loss: 0.321622
006930/008040, loss: 0.120999, avg_loss: 0.321527
006935/008040, loss: 0.011521, avg_loss: 0.321368
006940/008040, loss: 0.056693, avg_loss: 0.321238
006945/008040, loss: 0.312799, avg_loss: 0.321111
006950/008040, loss: 0.093848, avg_loss: 0.320974
006955/008040, loss: 0.147207, avg_loss: 0.320829
006960/008040, loss: 0.244541, avg_loss: 0.320667
006965/008040, loss: 0.171516, avg_loss: 0.320572
***** Running dev evaluation *****
Num examples = 1042
Instantaneous batch size per device = 32
epoch 25, step 6968/8040: {'matthews_correlation': 0.2203939727085643}
006970/008040, loss: 0.090653, avg_loss: 0.320431
006975/008040, loss: 0.113621, avg_loss: 0.320278
006980/008040, loss: 0.050388, avg_loss: 0.320129
006985/008040, loss: 0.195083, avg_loss: 0.319979
006990/008040, loss: 0.071205, avg_loss: 0.319799
006995/008040, loss: 0.020149, avg_loss: 0.319621
007000/008040, loss: 0.059265, avg_loss: 0.319458
007005/008040, loss: 0.179539, avg_loss: 0.319284
007010/008040, loss: 0.012788, avg_loss: 0.319089
007015/008040, loss: 0.034613, avg_loss: 0.318934
007020/008040, loss: 0.168260, avg_loss: 0.318806
007025/008040, loss: 0.188078, avg_loss: 0.318649
007030/008040, loss: 0.011455, avg_loss: 0.318545
007035/008040, loss: 0.040954, avg_loss: 0.318369
007040/008040, loss: 0.096969, avg_loss: 0.318195
007045/008040, loss: 0.071101, avg_loss: 0.318037
007050/008040, loss: 0.081423, avg_loss: 0.317851
007055/008040, loss: 0.195587, avg_loss: 0.317688
007060/008040, loss: 0.032233, avg_loss: 0.317557
007065/008040, loss: 0.271995, avg_loss: 0.317422
007070/008040, loss: 0.037814, avg_loss: 0.317280
007075/008040, loss: 0.068158, avg_loss: 0.317114
007080/008040, loss: 0.254120, avg_loss: 0.317008
007085/008040, loss: 0.078963, avg_loss: 0.316860
007090/008040, loss: 0.024606, avg_loss: 0.316704
007095/008040, loss: 0.112365, avg_loss: 0.316543
007100/008040, loss: 0.068048, avg_loss: 0.316381
007105/008040, loss: 0.029112, avg_loss: 0.316234
007110/008040, loss: 0.055819, avg_loss: 0.316104
007115/008040, loss: 0.025860, avg_loss: 0.315943
007120/008040, loss: 0.094708, avg_loss: 0.315802
007125/008040, loss: 0.087746, avg_loss: 0.315670
007130/008040, loss: 0.134385, avg_loss: 0.315513
007135/008040, loss: 0.135339, avg_loss: 0.315398
007140/008040, loss: 0.315180, avg_loss: 0.315259
007145/008040, loss: 0.054737, avg_loss: 0.315112
007150/008040, loss: 0.405788, avg_loss: 0.315005
007155/008040, loss: 0.188528, avg_loss: 0.314918
007160/008040, loss: 0.061403, avg_loss: 0.314754
007165/008040, loss: 0.077819, avg_loss: 0.314588
007170/008040, loss: 0.136640, avg_loss: 0.314467
007175/008040, loss: 0.055474, avg_loss: 0.314311
007180/008040, loss: 0.023272, avg_loss: 0.314152
007185/008040, loss: 0.098981, avg_loss: 0.314002
007190/008040, loss: 0.019560, avg_loss: 0.313822
007195/008040, loss: 0.348302, avg_loss: 0.313728
007200/008040, loss: 0.105960, avg_loss: 0.313588
007205/008040, loss: 0.246406, avg_loss: 0.313467
007210/008040, loss: 0.074683, avg_loss: 0.313337
007215/008040, loss: 0.291595, avg_loss: 0.313251
007220/008040, loss: 0.034121, avg_loss: 0.313123
007225/008040, loss: 0.074492, avg_loss: 0.313020
007230/008040, loss: 0.108867, avg_loss: 0.312875
007235/008040, loss: 0.158608, avg_loss: 0.312725
***** Running dev evaluation *****
Num examples = 1042
Instantaneous batch size per device = 32
epoch 26, step 7236/8040: {'matthews_correlation': 0.23696373689939254}
007240/008040, loss: 0.150083, avg_loss: 0.312579
007245/008040, loss: 0.122167, avg_loss: 0.312434
007250/008040, loss: 0.083941, avg_loss: 0.312296
007255/008040, loss: 0.191218, avg_loss: 0.312159
007260/008040, loss: 0.092216, avg_loss: 0.312034
007265/008040, loss: 0.081775, avg_loss: 0.311902
007270/008040, loss: 0.059207, avg_loss: 0.311740
007275/008040, loss: 0.273339, avg_loss: 0.311638
007280/008040, loss: 0.271834, avg_loss: 0.311530
007285/008040, loss: 0.179015, avg_loss: 0.311387
007290/008040, loss: 0.046599, avg_loss: 0.311231
007295/008040, loss: 0.051559, avg_loss: 0.311052
007300/008040, loss: 0.105356, avg_loss: 0.310882
007305/008040, loss: 0.043740, avg_loss: 0.310739
007310/008040, loss: 0.080959, avg_loss: 0.310613
007315/008040, loss: 0.058263, avg_loss: 0.310452
007320/008040, loss: 0.167761, avg_loss: 0.310307
007325/008040, loss: 0.128561, avg_loss: 0.310206
007330/008040, loss: 0.103146, avg_loss: 0.310052
007335/008040, loss: 0.219730, avg_loss: 0.309914
007340/008040, loss: 0.055324, avg_loss: 0.309757
007345/008040, loss: 0.057465, avg_loss: 0.309610
007350/008040, loss: 0.402242, avg_loss: 0.309490
007355/008040, loss: 0.352928, avg_loss: 0.309368
007360/008040, loss: 0.270440, avg_loss: 0.309237
007365/008040, loss: 0.094341, avg_loss: 0.309134
007370/008040, loss: 0.325051, avg_loss: 0.309029
007375/008040, loss: 0.040529, avg_loss: 0.308867
007380/008040, loss: 0.090125, avg_loss: 0.308736
007385/008040, loss: 0.053935, avg_loss: 0.308590
007390/008040, loss: 0.430134, avg_loss: 0.308482
007395/008040, loss: 0.147528, avg_loss: 0.308348
007400/008040, loss: 0.121706, avg_loss: 0.308195
007405/008040, loss: 0.178868, avg_loss: 0.308087
007410/008040, loss: 0.178170, avg_loss: 0.307974
007415/008040, loss: 0.023204, avg_loss: 0.307802
007420/008040, loss: 0.182678, avg_loss: 0.307687
007425/008040, loss: 0.090694, avg_loss: 0.307537
007430/008040, loss: 0.028491, avg_loss: 0.307386
007435/008040, loss: 0.027389, avg_loss: 0.307237
007440/008040, loss: 0.283375, avg_loss: 0.307118
007445/008040, loss: 0.036991, avg_loss: 0.306987
007450/008040, loss: 0.103909, avg_loss: 0.306909
007455/008040, loss: 0.036829, avg_loss: 0.306770
007460/008040, loss: 0.052082, avg_loss: 0.306626
007465/008040, loss: 0.338257, avg_loss: 0.306543
007470/008040, loss: 0.037553, avg_loss: 0.306415
007475/008040, loss: 0.031671, avg_loss: 0.306279
007480/008040, loss: 0.039051, avg_loss: 0.306135
007485/008040, loss: 0.096327, avg_loss: 0.305981
007490/008040, loss: 0.198999, avg_loss: 0.305869
007495/008040, loss: 0.182575, avg_loss: 0.305730
007500/008040, loss: 0.257397, avg_loss: 0.305617
***** Running dev evaluation *****
Num examples = 1042
Instantaneous batch size per device = 32
epoch 27, step 7504/8040: {'matthews_correlation': 0.22894062387495076}
007505/008040, loss: 0.123742, avg_loss: 0.305505
007510/008040, loss: 0.061330, avg_loss: 0.305378
007515/008040, loss: 0.016107, avg_loss: 0.305229
007520/008040, loss: 0.035412, avg_loss: 0.305081
007525/008040, loss: 0.043585, avg_loss: 0.304929
007530/008040, loss: 0.019222, avg_loss: 0.304760
007535/008040, loss: 0.071748, avg_loss: 0.304596
007540/008040, loss: 0.045426, avg_loss: 0.304455
007545/008040, loss: 0.020044, avg_loss: 0.304301
007550/008040, loss: 0.062295, avg_loss: 0.304174
007555/008040, loss: 0.017569, avg_loss: 0.304044
007560/008040, loss: 0.180191, avg_loss: 0.303921
007565/008040, loss: 0.049493, avg_loss: 0.303774
007570/008040, loss: 0.173383, avg_loss: 0.303647
007575/008040, loss: 0.193030, avg_loss: 0.303516
007580/008040, loss: 0.131787, avg_loss: 0.303401
007585/008040, loss: 0.007346, avg_loss: 0.303243
007590/008040, loss: 0.155826, avg_loss: 0.303116
007595/008040, loss: 0.163601, avg_loss: 0.302978
007600/008040, loss: 0.035214, avg_loss: 0.302842
007605/008040, loss: 0.029514, avg_loss: 0.302685
007610/008040, loss: 0.060444, avg_loss: 0.302529
007615/008040, loss: 0.318817, avg_loss: 0.302458
007620/008040, loss: 0.080297, avg_loss: 0.302338
007625/008040, loss: 0.186482, avg_loss: 0.302204
007630/008040, loss: 0.279500, avg_loss: 0.302077
007635/008040, loss: 0.200537, avg_loss: 0.301970
007640/008040, loss: 0.194061, avg_loss: 0.301894
007645/008040, loss: 0.024023, avg_loss: 0.301730
007650/008040, loss: 0.033270, avg_loss: 0.301606
007655/008040, loss: 0.031241, avg_loss: 0.301439
007660/008040, loss: 0.113729, avg_loss: 0.301307
007665/008040, loss: 0.041298, avg_loss: 0.301153
007670/008040, loss: 0.044293, avg_loss: 0.301029
007675/008040, loss: 0.075025, avg_loss: 0.300890
007680/008040, loss: 0.125628, avg_loss: 0.300743
007685/008040, loss: 0.041533, avg_loss: 0.300654
007690/008040, loss: 0.026194, avg_loss: 0.300504
007695/008040, loss: 0.045489, avg_loss: 0.300366
007700/008040, loss: 0.075324, avg_loss: 0.300245
007705/008040, loss: 0.141094, avg_loss: 0.300132
007710/008040, loss: 0.036195, avg_loss: 0.299978
007715/008040, loss: 0.017604, avg_loss: 0.299834
007720/008040, loss: 0.081078, avg_loss: 0.299699
007725/008040, loss: 0.021621, avg_loss: 0.299564
007730/008040, loss: 0.144577, avg_loss: 0.299449
007735/008040, loss: 0.079561, avg_loss: 0.299298
007740/008040, loss: 0.040003, avg_loss: 0.299200
007745/008040, loss: 0.211169, avg_loss: 0.299085
007750/008040, loss: 0.029886, avg_loss: 0.298986
007755/008040, loss: 0.138877, avg_loss: 0.298871
007760/008040, loss: 0.043360, avg_loss: 0.298753
007765/008040, loss: 0.152495, avg_loss: 0.298619
007770/008040, loss: 0.060497, avg_loss: 0.298466
***** Running dev evaluation *****
Num examples = 1042
Instantaneous batch size per device = 32
epoch 28, step 7772/8040: {'matthews_correlation': 0.23262243281540648}
007775/008040, loss: 0.118019, avg_loss: 0.298329
007780/008040, loss: 0.028191, avg_loss: 0.298189
007785/008040, loss: 0.135941, avg_loss: 0.298065
007790/008040, loss: 0.356724, avg_loss: 0.297975
007795/008040, loss: 0.026043, avg_loss: 0.297834
007800/008040, loss: 0.034852, avg_loss: 0.297691
007805/008040, loss: 0.154475, avg_loss: 0.297570
007810/008040, loss: 0.037241, avg_loss: 0.297409
007815/008040, loss: 0.073721, avg_loss: 0.297300
007820/008040, loss: 0.050705, avg_loss: 0.297156
007825/008040, loss: 0.105259, avg_loss: 0.297006
007830/008040, loss: 0.088514, avg_loss: 0.296893
007835/008040, loss: 0.095720, avg_loss: 0.296728
007840/008040, loss: 0.055406, avg_loss: 0.296592
007845/008040, loss: 0.045261, avg_loss: 0.296458
007850/008040, loss: 0.020414, avg_loss: 0.296311
007855/008040, loss: 0.060944, avg_loss: 0.296162
007860/008040, loss: 0.270967, avg_loss: 0.296048
007865/008040, loss: 0.287739, avg_loss: 0.295965
007870/008040, loss: 0.151463, avg_loss: 0.295820
007875/008040, loss: 0.029142, avg_loss: 0.295686
007880/008040, loss: 0.064737, avg_loss: 0.295532
007885/008040, loss: 0.077103, avg_loss: 0.295377
007890/008040, loss: 0.095900, avg_loss: 0.295271
007895/008040, loss: 0.030873, avg_loss: 0.295135
007900/008040, loss: 0.041561, avg_loss: 0.295050
007905/008040, loss: 0.052920, avg_loss: 0.294904
007910/008040, loss: 0.090048, avg_loss: 0.294785
007915/008040, loss: 0.392368, avg_loss: 0.294698
007920/008040, loss: 0.053813, avg_loss: 0.294579
007925/008040, loss: 0.122067, avg_loss: 0.294466
007930/008040, loss: 0.108241, avg_loss: 0.294332
007935/008040, loss: 0.047713, avg_loss: 0.294191
007940/008040, loss: 0.146655, avg_loss: 0.294085
007945/008040, loss: 0.041561, avg_loss: 0.293967
007950/008040, loss: 0.104168, avg_loss: 0.293888
007955/008040, loss: 0.036348, avg_loss: 0.293742
007960/008040, loss: 0.185095, avg_loss: 0.293636
007965/008040, loss: 0.250438, avg_loss: 0.293519
007970/008040, loss: 0.060160, avg_loss: 0.293385
007975/008040, loss: 0.030413, avg_loss: 0.293244
007980/008040, loss: 0.119690, avg_loss: 0.293127
007985/008040, loss: 0.157349, avg_loss: 0.293020
007990/008040, loss: 0.076855, avg_loss: 0.292911
007995/008040, loss: 0.060965, avg_loss: 0.292781
008000/008040, loss: 0.152858, avg_loss: 0.292687
008005/008040, loss: 0.257621, avg_loss: 0.292574
008010/008040, loss: 0.050748, avg_loss: 0.292464
008015/008040, loss: 0.078382, avg_loss: 0.292323
008020/008040, loss: 0.218891, avg_loss: 0.292214
008025/008040, loss: 0.052570, avg_loss: 0.292084
008030/008040, loss: 0.058064, avg_loss: 0.291971
008035/008040, loss: 0.017564, avg_loss: 0.291854
008040/008040, loss: 0.016145, avg_loss: 0.291706
***** Running dev evaluation *****
Num examples = 1042
Instantaneous batch size per device = 32
epoch 29, step 8040/8040: {'matthews_correlation': 0.20156166898476155}
***** Running train evaluation *****
Num examples = 8551
Instantaneous batch size per device = 32
Train Dataset Result: {'matthews_correlation': 0.9609897432355321}
***** Running dev evaluation *****
Num examples = 1042
Instantaneous batch size per device = 32
Dev Dataset Result: {'matthews_correlation': 0.20156166898476155}
Training time 0:05:46