impossible-llms-dutch-intersect-word

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 4.7745

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 12
  • eval_batch_size: 8
  • seed: 0
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 384
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • training_steps: 3000
  • mixed_precision_training: Native AMP
  • label_smoothing_factor: 0.1
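The effective batch size and warmup length follow directly from the values above. A minimal sketch of that arithmetic, using only the numbers listed on this card (the `lr_at` helper is an illustrative approximation of linear warmup plus cosine decay, not the exact Transformers scheduler code):

```python
import math

# Values copied from the hyperparameter list above.
train_batch_size = 12          # per-device batch size
num_devices = 4                # multi-GPU
gradient_accumulation_steps = 8
training_steps = 3000
base_lr = 1e-4
warmup_ratio = 0.1

# Effective (total) training batch size: 12 * 4 * 8 = 384,
# matching total_train_batch_size above.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps

# With lr_scheduler_warmup_ratio = 0.1 over 3000 steps, the cosine
# schedule warms up for the first 300 steps.
warmup_steps = int(warmup_ratio * training_steps)

def lr_at(step: int) -> float:
    """Approximate learning rate at `step` under linear warmup + cosine decay.

    This is a sketch of the schedule's shape; the exact Transformers
    implementation may differ in small details.
    """
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / (training_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

print(total_train_batch_size)  # 384
print(warmup_steps)            # 300
```

The peak rate of 1e-4 is reached at step 300 and decays toward zero by step 3000.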

Training results

Training Loss Epoch Step Validation Loss
51.7189 1.0 8 10.0476
45.5929 2.0 16 8.9352
42.1017 3.0 24 8.2878
40.0518 4.0 32 7.8903
37.6876 5.0 40 7.4690
36.3202 6.0 48 7.2505
35.5997 7.0 56 7.0793
34.8348 8.0 64 6.8924
33.7436 9.0 72 6.6834
32.6464 10.0 80 6.4698
31.5645 11.0 88 6.2486
30.3178 12.0 96 6.0195
29.3294 13.0 104 5.7948
28.2103 14.0 112 5.5902
27.1837 15.0 120 5.4050
26.4971 16.0 128 5.2435
25.6749 17.0 136 5.1103
25.2873 18.0 144 5.0085
24.6749 19.0 152 4.9364
24.3907 20.0 160 4.8965
24.1674 21.0 168 4.8433
24.2179 22.0 176 4.8306
23.8266 23.0 184 4.7872
23.6862 24.0 192 4.7720
23.7183 25.0 200 4.7411
23.62 26.0 208 4.7147
23.5007 27.0 216 4.7175
23.3527 28.0 224 4.6974
23.1154 29.0 232 4.6720
23.0423 30.0 240 4.6647
23.0967 31.0 248 4.6336
23.1378 32.0 256 4.6344
22.9028 33.0 264 4.6067
22.8743 34.0 272 4.5859
22.811 35.0 280 4.5717
22.8013 36.0 288 4.5514
22.6364 37.0 296 4.5301
22.5228 38.0 304 4.4989
22.3443 39.0 312 4.4613
22.2754 40.0 320 4.4305
22.1578 41.0 328 4.4227
22.1021 42.0 336 4.3886
22.0509 43.0 344 4.3405
21.8104 44.0 352 4.3321
21.7075 45.0 360 4.3073
21.5286 46.0 368 4.2935
21.4522 47.0 376 4.2867
21.5575 48.0 384 4.2742
21.0793 49.0 392 4.2650
21.0869 50.0 400 4.2595
21.0158 51.0 408 4.2463
21.0016 52.0 416 4.2321
20.9245 53.0 424 4.2210
20.9421 54.0 432 4.2106
20.9968 55.0 440 4.2015
20.6215 56.0 448 4.1862
20.618 57.0 456 4.1790
20.7328 58.0 464 4.1708
20.6616 59.0 472 4.1591
20.415 60.0 480 4.1473
20.4787 61.0 488 4.1395
20.3404 62.0 496 4.1273
20.1939 63.0 504 4.1141
20.1402 64.0 512 4.1028
20.1283 65.0 520 4.0897
20.1692 66.0 528 4.0822
20.1286 67.0 536 4.0682
19.8633 68.0 544 4.0593
19.8631 69.0 552 4.0496
19.8043 70.0 560 4.0391
19.708 71.0 568 4.0255
19.5752 72.0 576 4.0174
19.6992 73.0 584 4.0066
19.5357 74.0 592 4.0011
19.5627 75.0 600 3.9894
19.3649 76.0 608 3.9800
19.3032 77.0 616 3.9717
19.3458 78.0 624 3.9664
19.2749 79.0 632 3.9616
19.0506 80.0 640 3.9573
19.2856 81.0 648 3.9510
19.1438 82.0 656 3.9467
19.2168 83.0 664 3.9400
18.9629 84.0 672 3.9328
19.009 85.0 680 3.9302
18.847 86.0 688 3.9253
18.7197 87.0 696 3.9198
18.719 88.0 704 3.9144
18.5683 89.0 712 3.9140
18.7311 90.0 720 3.9106
18.6126 91.0 728 3.9047
18.6164 92.0 736 3.9005
18.5841 93.0 744 3.9014
18.365 94.0 752 3.8974
18.5044 95.0 760 3.8944
18.2188 96.0 768 3.8911
18.4713 97.0 776 3.8917
18.2075 98.0 784 3.8912
18.0931 99.0 792 3.8903
18.2395 100.0 800 3.8894
18.1068 101.0 808 3.8857
18.011 102.0 816 3.8833
18.1438 103.0 824 3.8844
17.9927 104.0 832 3.8851
18.0422 105.0 840 3.8851
18.0454 106.0 848 3.8839
17.8224 107.0 856 3.8861
17.8553 108.0 864 3.8856
17.7585 109.0 872 3.8884
17.7547 110.0 880 3.8872
17.6096 111.0 888 3.8922
17.6601 112.0 896 3.8930
17.5424 113.0 904 3.8938
17.5615 114.0 912 3.8934
17.4581 115.0 920 3.8957
17.4704 116.0 928 3.8967
17.2762 117.0 936 3.9008
17.2857 118.0 944 3.9021
17.1992 119.0 952 3.9093
17.2158 120.0 960 3.9113
17.1756 121.0 968 3.9109
17.0654 122.0 976 3.9169
17.0659 123.0 984 3.9209
17.0219 124.0 992 3.9241
16.9725 125.0 1000 3.9250
16.876 126.0 1008 3.9309
16.8892 127.0 1016 3.9345
16.7261 128.0 1024 3.9360
16.6938 129.0 1032 3.9409
16.6889 130.0 1040 3.9456
16.6275 131.0 1048 3.9522
16.6681 132.0 1056 3.9518
16.483 133.0 1064 3.9589
16.4609 134.0 1072 3.9684
16.4791 135.0 1080 3.9733
16.3662 136.0 1088 3.9750
16.2939 137.0 1096 3.9767
16.2931 138.0 1104 3.9866
16.1624 139.0 1112 3.9851
16.2026 140.0 1120 3.9919
16.1111 141.0 1128 3.9994
16.0109 142.0 1136 4.0099
16.0783 143.0 1144 4.0114
16.0069 144.0 1152 4.0167
15.9754 145.0 1160 4.0262
15.9227 146.0 1168 4.0306
15.9168 147.0 1176 4.0365
15.8825 148.0 1184 4.0413
15.7991 149.0 1192 4.0455
15.645 150.0 1200 4.0557
15.617 151.0 1208 4.0608
15.5395 152.0 1216 4.0660
15.519 153.0 1224 4.0732
15.401 154.0 1232 4.0782
15.4854 155.0 1240 4.0819
15.3471 156.0 1248 4.0914
15.318 157.0 1256 4.0970
15.3085 158.0 1264 4.1017
15.221 159.0 1272 4.1116
15.1557 160.0 1280 4.1160
15.1176 161.0 1288 4.1260
15.0693 162.0 1296 4.1339
15.0102 163.0 1304 4.1405
15.0169 164.0 1312 4.1415
14.8992 165.0 1320 4.1509
14.8533 166.0 1328 4.1603
14.8163 167.0 1336 4.1664
14.7867 168.0 1344 4.1714
14.8052 169.0 1352 4.1735
14.6813 170.0 1360 4.1812
14.7567 171.0 1368 4.1881
14.6379 172.0 1376 4.1944
14.549 173.0 1384 4.1979
14.5129 174.0 1392 4.2059
14.4243 175.0 1400 4.2126
14.4455 176.0 1408 4.2240
14.4278 177.0 1416 4.2281
14.3713 178.0 1424 4.2380
14.279 179.0 1432 4.2424
14.2533 180.0 1440 4.2459
14.246 181.0 1448 4.2515
14.1578 182.0 1456 4.2645
14.1392 183.0 1464 4.2723
14.0903 184.0 1472 4.2768
14.079 185.0 1480 4.2803
14.0375 186.0 1488 4.2843
13.9614 187.0 1496 4.2905
13.9192 188.0 1504 4.3009
13.9201 189.0 1512 4.3050
13.7963 190.0 1520 4.3104
13.8407 191.0 1528 4.3218
13.7709 192.0 1536 4.3244
13.7462 193.0 1544 4.3298
13.7029 194.0 1552 4.3375
13.6644 195.0 1560 4.3442
13.6014 196.0 1568 4.3483
13.5281 197.0 1576 4.3570
13.6119 198.0 1584 4.3631
13.5517 199.0 1592 4.3653
13.4762 200.0 1600 4.3726
13.4018 201.0 1608 4.3763
13.424 202.0 1616 4.3857
13.3859 203.0 1624 4.3876
13.3672 204.0 1632 4.3951
13.3213 205.0 1640 4.3975
13.2756 206.0 1648 4.4056
13.224 207.0 1656 4.4143
13.1885 208.0 1664 4.4194
13.1471 209.0 1672 4.4208
13.0916 210.0 1680 4.4276
13.1181 211.0 1688 4.4361
13.0552 212.0 1696 4.4399
13.0333 213.0 1704 4.4463
12.9872 214.0 1712 4.4495
12.9799 215.0 1720 4.4548
12.9542 216.0 1728 4.4606
12.9401 217.0 1736 4.4661
12.865 218.0 1744 4.4688
12.806 219.0 1752 4.4782
12.848 220.0 1760 4.4819
12.7813 221.0 1768 4.4890
12.7222 222.0 1776 4.4942
12.7728 223.0 1784 4.4961
12.7766 224.0 1792 4.5025
12.6914 225.0 1800 4.5057
12.6419 226.0 1808 4.5081
12.6464 227.0 1816 4.5133
12.6121 228.0 1824 4.5199
12.5764 229.0 1832 4.5259
12.5477 230.0 1840 4.5286
12.5402 231.0 1848 4.5358
12.4839 232.0 1856 4.5389
12.4889 233.0 1864 4.5400
12.4562 234.0 1872 4.5456
12.3885 235.0 1880 4.5538
12.3829 236.0 1888 4.5530
12.3939 237.0 1896 4.5587
12.3728 238.0 1904 4.5662
12.3166 239.0 1912 4.5676
12.2856 240.0 1920 4.5710
12.2932 241.0 1928 4.5741
12.273 242.0 1936 4.5771
12.2614 243.0 1944 4.5874
12.211 244.0 1952 4.5865
12.1802 245.0 1960 4.5901
12.1831 246.0 1968 4.5909
12.1476 247.0 1976 4.6018
12.1183 248.0 1984 4.6018
12.0687 249.0 1992 4.6060
12.1377 250.0 2000 4.6076
12.0892 251.0 2008 4.6110
12.0651 252.0 2016 4.6190
12.0587 253.0 2024 4.6192
12.0171 254.0 2032 4.6251
11.9907 255.0 2040 4.6265
12.0035 256.0 2048 4.6306
11.9462 257.0 2056 4.6352
11.9049 258.0 2064 4.6348
11.9207 259.0 2072 4.6385
11.9332 260.0 2080 4.6410
11.883 261.0 2088 4.6436
11.8819 262.0 2096 4.6473
11.882 263.0 2104 4.6480
11.8396 264.0 2112 4.6562
11.8125 265.0 2120 4.6566
11.8221 266.0 2128 4.6606
11.7901 267.0 2136 4.6594
11.7667 268.0 2144 4.6648
11.7682 269.0 2152 4.6683
11.74 270.0 2160 4.6680
11.7436 271.0 2168 4.6715
11.6919 272.0 2176 4.6753
11.7075 273.0 2184 4.6803
11.6839 274.0 2192 4.6806
11.6675 275.0 2200 4.6831
11.6685 276.0 2208 4.6869
11.6204 277.0 2216 4.6861
11.6231 278.0 2224 4.6899
11.6026 279.0 2232 4.6924
11.6077 280.0 2240 4.6920
11.6246 281.0 2248 4.6970
11.5393 282.0 2256 4.6976
11.5656 283.0 2264 4.6990
11.5618 284.0 2272 4.7038
11.5743 285.0 2280 4.7042
11.5361 286.0 2288 4.7063
11.5512 287.0 2296 4.7071
11.5432 288.0 2304 4.7129
11.4826 289.0 2312 4.7139
11.4949 290.0 2320 4.7157
11.4578 291.0 2328 4.7175
11.493 292.0 2336 4.7178
11.4711 293.0 2344 4.7172
11.4457 294.0 2352 4.7228
11.4618 295.0 2360 4.7223
11.4468 296.0 2368 4.7266
11.4437 297.0 2376 4.7263
11.4165 298.0 2384 4.7279
11.4069 299.0 2392 4.7291
11.3908 300.0 2400 4.7316
11.3938 301.0 2408 4.7334
11.3858 302.0 2416 4.7318
11.372 303.0 2424 4.7358
11.3776 304.0 2432 4.7340
11.3737 305.0 2440 4.7384
11.3785 306.0 2448 4.7394
11.3567 307.0 2456 4.7419
11.3342 308.0 2464 4.7419
11.3297 309.0 2472 4.7443
11.3224 310.0 2480 4.7444
11.3315 311.0 2488 4.7444
11.3402 312.0 2496 4.7448
11.3268 313.0 2504 4.7494
11.2771 314.0 2512 4.7497
11.308 315.0 2520 4.7496
11.2986 316.0 2528 4.7517
11.2748 317.0 2536 4.7522
11.283 318.0 2544 4.7543
11.276 319.0 2552 4.7550
11.287 320.0 2560 4.7563
11.2662 321.0 2568 4.7576
11.2744 322.0 2576 4.7562
11.2289 323.0 2584 4.7590
11.2758 324.0 2592 4.7591
11.2254 325.0 2600 4.7598
11.2405 326.0 2608 4.7602
11.2199 327.0 2616 4.7617
11.249 328.0 2624 4.7612
11.23 329.0 2632 4.7631
11.2432 330.0 2640 4.7643
11.2454 331.0 2648 4.7641
11.2006 332.0 2656 4.7656
11.2125 333.0 2664 4.7665
11.2289 334.0 2672 4.7672
11.2341 335.0 2680 4.7666
11.2303 336.0 2688 4.7682
11.191 337.0 2696 4.7678
11.2292 338.0 2704 4.7677
11.2272 339.0 2712 4.7693
11.1941 340.0 2720 4.7697
11.2065 341.0 2728 4.7692
11.2112 342.0 2736 4.7692
11.2018 343.0 2744 4.7711
11.1885 344.0 2752 4.7705
11.185 345.0 2760 4.7716
11.178 346.0 2768 4.7715
11.2254 347.0 2776 4.7717
11.1841 348.0 2784 4.7714
11.1784 349.0 2792 4.7722
11.1803 350.0 2800 4.7728
11.2022 351.0 2808 4.7728
11.179 352.0 2816 4.7728
11.1768 353.0 2824 4.7736
11.1823 354.0 2832 4.7728
11.1842 355.0 2840 4.7737
11.1867 356.0 2848 4.7738
11.1758 357.0 2856 4.7737
11.1922 358.0 2864 4.7736
11.181 359.0 2872 4.7739
11.2054 360.0 2880 4.7741
11.1617 361.0 2888 4.7742
11.1658 362.0 2896 4.7744
11.1933 363.0 2904 4.7748
11.162 364.0 2912 4.7746
11.1493 365.0 2920 4.7746
11.1738 366.0 2928 4.7746
11.174 367.0 2936 4.7745
11.1995 368.0 2944 4.7744
11.1741 369.0 2952 4.7744
11.2033 370.0 2960 4.7745
11.188 371.0 2968 4.7745
11.1504 372.0 2976 4.7745
11.1429 373.0 2984 4.7745
11.1563 374.0 2992 4.7745
11.1585 375.0 3000 4.7745
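For a causal language model, the evaluation loss is commonly converted to perplexity via exp(loss). A small sketch of that conversion for the final loss above; note that training used label_smoothing_factor = 0.1, so the reported loss includes a smoothing penalty and exp(loss) overstates the true cross-entropy perplexity:

```python
import math

# Final evaluation loss from the table above (epoch 375, step 3000).
eval_loss = 4.7745

# If eval_loss were plain cross-entropy in nats, perplexity = exp(loss).
# With label smoothing, this is only an upper bound on the
# cross-entropy perplexity.
perplexity = math.exp(eval_loss)
print(perplexity)
```

Note also that the validation loss in the table reaches its minimum (about 3.88) around epoch 102 and rises thereafter, so the final checkpoint is not the best one by validation loss.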

Framework versions

  • Transformers 4.49.0
  • Pytorch 2.4.0+cu121
  • Datasets 3.4.0
  • Tokenizers 0.21.0

Model size

126M parameters (F32, stored as Safetensors)