impossible-llms-dutch-random-fourgram

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 7.3783
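
For intuition, a loss of 7.3783 corresponds to a perplexity of exp(7.3783) ≈ 1601, assuming the value is the usual mean per-token cross-entropy. Note that the label smoothing used in training (0.1, see below) inflates the raw loss, so this should be read as a rough upper bound:

```python
import math

eval_loss = 7.3783
# Perplexity = exp(mean per-token cross-entropy). Label smoothing (0.1)
# inflates the reported loss, so this is only an approximate upper bound.
perplexity = math.exp(eval_loss)
print(f"perplexity ~ {perplexity:.0f}")  # ~1601
```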

Model description

More information needed

Intended uses & limitations

More information needed
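
In the meantime, the snippet below shows one plausible way to load and score text with the checkpoint. This is a minimal sketch: the Hub repository id is a placeholder, and the causal-LM architecture is an assumption based on the card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical Hub path; substitute the actual repository id.
repo_id = "ORG/impossible-llms-dutch-random-fourgram"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)  # assumes a causal LM head

# Score a Dutch sentence: the returned loss is the mean per-token cross-entropy.
inputs = tokenizer("Dit is een voorbeeldzin.", return_tensors="pt")
outputs = model(**inputs, labels=inputs["input_ids"])
print(outputs.loss.item())
```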

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a code sketch reconstructing them follows the list):

  • learning_rate: 0.0001
  • train_batch_size: 12
  • eval_batch_size: 8
  • seed: 0
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 384
  • total_eval_batch_size: 32
  • optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • training_steps: 3000
  • mixed_precision_training: Native AMP
  • label_smoothing_factor: 0.1
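
These settings map directly onto Hugging Face `TrainingArguments`. A minimal sketch reconstructing them follows; the output directory is a placeholder, and launching across 4 GPUs via `torchrun` or `accelerate` is assumed:

```python
from transformers import TrainingArguments

# With 4 devices: 12 (per device) x 4 (GPUs) x 8 (accumulation) = 384 effective train batch.
args = TrainingArguments(
    output_dir="impossible-llms-dutch-random-fourgram",  # placeholder
    learning_rate=1e-4,
    per_device_train_batch_size=12,
    per_device_eval_batch_size=8,
    seed=0,
    gradient_accumulation_steps=8,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    max_steps=3000,
    fp16=True,  # "Native AMP" mixed precision
    label_smoothing_factor=0.1,
)
```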

Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 82.5965 | 1.0 | 8 | 10.0545 |
| 75.2966 | 2.0 | 16 | 9.2676 |
| 72.2807 | 3.0 | 24 | 8.8779 |
| 70.3242 | 4.0 | 32 | 8.6516 |
| 68.9057 | 5.0 | 40 | 8.4423 |
| 67.17 | 6.0 | 48 | 8.2568 |
| 65.9971 | 7.0 | 56 | 8.0646 |
| 63.777 | 8.0 | 64 | 7.8564 |
| 61.8724 | 9.0 | 72 | 7.6447 |
| 60.6351 | 10.0 | 80 | 7.4277 |
| 59.0027 | 11.0 | 88 | 7.2080 |
| 56.9968 | 12.0 | 96 | 6.9791 |
| 54.8855 | 13.0 | 104 | 6.7560 |
| 53.897 | 14.0 | 112 | 6.5470 |
| 51.5884 | 15.0 | 120 | 6.3640 |
| 50.4456 | 16.0 | 128 | 6.2130 |
| 49.583 | 17.0 | 136 | 6.0969 |
| 48.33 | 18.0 | 144 | 6.0189 |
| 48.1382 | 19.0 | 152 | 5.9651 |
| 47.4981 | 20.0 | 160 | 5.9115 |
| 47.3202 | 21.0 | 168 | 5.8837 |
| 47.6736 | 22.0 | 176 | 5.8615 |
| 47.0963 | 23.0 | 184 | 5.8298 |
| 47.0102 | 24.0 | 192 | 5.8166 |
| 46.9391 | 25.0 | 200 | 5.7864 |
| 46.4493 | 26.0 | 208 | 5.7672 |
| 46.1409 | 27.0 | 216 | 5.7417 |
| 46.1787 | 28.0 | 224 | 5.7367 |
| 46.0658 | 29.0 | 232 | 5.7133 |
| 45.7216 | 30.0 | 240 | 5.6960 |
| 45.3878 | 31.0 | 248 | 5.6818 |
| 45.5412 | 32.0 | 256 | 5.6643 |
| 45.3379 | 33.0 | 264 | 5.6591 |
| 45.5325 | 34.0 | 272 | 5.6429 |
| 45.0757 | 35.0 | 280 | 5.6244 |
| 44.89 | 36.0 | 288 | 5.6128 |
| 44.6906 | 37.0 | 296 | 5.6003 |
| 43.991 | 38.0 | 304 | 5.5926 |
| 44.3073 | 39.0 | 312 | 5.5720 |
| 43.679 | 40.0 | 320 | 5.5656 |
| 44.278 | 41.0 | 328 | 5.5489 |
| 43.7414 | 42.0 | 336 | 5.5400 |
| 44.11 | 43.0 | 344 | 5.5252 |
| 43.857 | 44.0 | 352 | 5.5118 |
| 43.3477 | 45.0 | 360 | 5.5025 |
| 42.9332 | 46.0 | 368 | 5.4867 |
| 43.5376 | 47.0 | 376 | 5.4795 |
| 43.0291 | 48.0 | 384 | 5.4620 |
| 43.0861 | 49.0 | 392 | 5.4565 |
| 42.469 | 50.0 | 400 | 5.4472 |
| 42.7038 | 51.0 | 408 | 5.4320 |
| 41.8678 | 52.0 | 416 | 5.4336 |
| 41.7871 | 53.0 | 424 | 5.4160 |
| 42.0693 | 54.0 | 432 | 5.4133 |
| 41.3849 | 55.0 | 440 | 5.3967 |
| 41.5663 | 56.0 | 448 | 5.3865 |
| 41.3677 | 57.0 | 456 | 5.3861 |
| 41.3664 | 58.0 | 464 | 5.3795 |
| 40.8909 | 59.0 | 472 | 5.3671 |
| 40.9037 | 60.0 | 480 | 5.3707 |
| 40.7368 | 61.0 | 488 | 5.3646 |
| 40.6582 | 62.0 | 496 | 5.3624 |
| 40.1752 | 63.0 | 504 | 5.3561 |
| 40.2725 | 64.0 | 512 | 5.3479 |
| 40.2386 | 65.0 | 520 | 5.3502 |
| 40.0417 | 66.0 | 528 | 5.3431 |
| 39.593 | 67.0 | 536 | 5.3422 |
| 39.67 | 68.0 | 544 | 5.3463 |
| 39.3237 | 69.0 | 552 | 5.3396 |
| 39.2197 | 70.0 | 560 | 5.3476 |
| 39.3601 | 71.0 | 568 | 5.3417 |
| 38.8632 | 72.0 | 576 | 5.3538 |
| 39.2826 | 73.0 | 584 | 5.3449 |
| 38.4384 | 74.0 | 592 | 5.3493 |
| 38.5283 | 75.0 | 600 | 5.3572 |
| 38.5241 | 76.0 | 608 | 5.3617 |
| 38.3616 | 77.0 | 616 | 5.3585 |
| 37.5471 | 78.0 | 624 | 5.3612 |
| 37.8423 | 79.0 | 632 | 5.3753 |
| 37.8579 | 80.0 | 640 | 5.3706 |
| 37.8891 | 81.0 | 648 | 5.3762 |
| 37.388 | 82.0 | 656 | 5.3865 |
| 37.2148 | 83.0 | 664 | 5.3949 |
| 36.9594 | 84.0 | 672 | 5.4041 |
| 36.6153 | 85.0 | 680 | 5.4150 |
| 37.1041 | 86.0 | 688 | 5.4241 |
| 36.4976 | 87.0 | 696 | 5.4311 |
| 36.5939 | 88.0 | 704 | 5.4333 |
| 35.9307 | 89.0 | 712 | 5.4425 |
| 36.2628 | 90.0 | 720 | 5.4511 |
| 35.9169 | 91.0 | 728 | 5.4590 |
| 35.7617 | 92.0 | 736 | 5.4706 |
| 35.4492 | 93.0 | 744 | 5.4728 |
| 35.6603 | 94.0 | 752 | 5.4854 |
| 35.2164 | 95.0 | 760 | 5.4956 |
| 34.7999 | 96.0 | 768 | 5.5013 |
| 34.9279 | 97.0 | 776 | 5.5209 |
| 34.7867 | 98.0 | 784 | 5.5353 |
| 34.9172 | 99.0 | 792 | 5.5482 |
| 34.1183 | 100.0 | 800 | 5.5567 |
| 34.1481 | 101.0 | 808 | 5.5641 |
| 34.202 | 102.0 | 816 | 5.5832 |
| 33.5556 | 103.0 | 824 | 5.6003 |
| 33.3879 | 104.0 | 832 | 5.6066 |
| 33.5313 | 105.0 | 840 | 5.6249 |
| 33.3095 | 106.0 | 848 | 5.6336 |
| 33.1677 | 107.0 | 856 | 5.6439 |
| 33.0919 | 108.0 | 864 | 5.6742 |
| 32.9743 | 109.0 | 872 | 5.6757 |
| 32.7652 | 110.0 | 880 | 5.6833 |
| 32.5602 | 111.0 | 888 | 5.7056 |
| 32.4285 | 112.0 | 896 | 5.7198 |
| 32.3074 | 113.0 | 904 | 5.7226 |
| 32.2366 | 114.0 | 912 | 5.7409 |
| 31.7914 | 115.0 | 920 | 5.7563 |
| 31.7031 | 116.0 | 928 | 5.7693 |
| 31.702 | 117.0 | 936 | 5.7790 |
| 31.1436 | 118.0 | 944 | 5.7953 |
| 31.1165 | 119.0 | 952 | 5.8105 |
| 31.219 | 120.0 | 960 | 5.8222 |
| 30.9197 | 121.0 | 968 | 5.8295 |
| 30.6646 | 122.0 | 976 | 5.8696 |
| 30.4446 | 123.0 | 984 | 5.8722 |
| 30.3057 | 124.0 | 992 | 5.8919 |
| 30.2185 | 125.0 | 1000 | 5.9051 |
| 30.0063 | 126.0 | 1008 | 5.9038 |
| 30.0369 | 127.0 | 1016 | 5.9358 |
| 29.7605 | 128.0 | 1024 | 5.9466 |
| 29.5972 | 129.0 | 1032 | 5.9520 |
| 29.227 | 130.0 | 1040 | 5.9850 |
| 29.2277 | 131.0 | 1048 | 5.9851 |
| 29.0853 | 132.0 | 1056 | 5.9847 |
| 28.8583 | 133.0 | 1064 | 6.0125 |
| 28.9504 | 134.0 | 1072 | 6.0253 |
| 28.783 | 135.0 | 1080 | 6.0448 |
| 28.4782 | 136.0 | 1088 | 6.0542 |
| 28.4885 | 137.0 | 1096 | 6.0663 |
| 28.4336 | 138.0 | 1104 | 6.0747 |
| 28.2545 | 139.0 | 1112 | 6.0912 |
| 27.8923 | 140.0 | 1120 | 6.1180 |
| 27.9312 | 141.0 | 1128 | 6.1177 |
| 27.6833 | 142.0 | 1136 | 6.1362 |
| 27.5338 | 143.0 | 1144 | 6.1512 |
| 27.5523 | 144.0 | 1152 | 6.1643 |
| 27.1259 | 145.0 | 1160 | 6.1756 |
| 27.0944 | 146.0 | 1168 | 6.1855 |
| 26.9084 | 147.0 | 1176 | 6.2024 |
| 26.8394 | 148.0 | 1184 | 6.2285 |
| 26.7645 | 149.0 | 1192 | 6.2392 |
| 26.7224 | 150.0 | 1200 | 6.2461 |
| 26.2819 | 151.0 | 1208 | 6.2537 |
| 26.5246 | 152.0 | 1216 | 6.2828 |
| 26.2742 | 153.0 | 1224 | 6.2859 |
| 26.2017 | 154.0 | 1232 | 6.2904 |
| 26.005 | 155.0 | 1240 | 6.2979 |
| 25.8268 | 156.0 | 1248 | 6.3187 |
| 25.6712 | 157.0 | 1256 | 6.3294 |
| 25.5722 | 158.0 | 1264 | 6.3498 |
| 25.4754 | 159.0 | 1272 | 6.3672 |
| 25.2883 | 160.0 | 1280 | 6.3739 |
| 25.2695 | 161.0 | 1288 | 6.3804 |
| 24.924 | 162.0 | 1296 | 6.3947 |
| 24.8009 | 163.0 | 1304 | 6.4014 |
| 24.7311 | 164.0 | 1312 | 6.4176 |
| 24.8255 | 165.0 | 1320 | 6.4304 |
| 24.69 | 166.0 | 1328 | 6.4436 |
| 24.4188 | 167.0 | 1336 | 6.4604 |
| 24.482 | 168.0 | 1344 | 6.4759 |
| 24.2478 | 169.0 | 1352 | 6.4743 |
| 24.2043 | 170.0 | 1360 | 6.5005 |
| 24.0169 | 171.0 | 1368 | 6.5087 |
| 24.078 | 172.0 | 1376 | 6.5191 |
| 23.8472 | 173.0 | 1384 | 6.5339 |
| 23.7919 | 174.0 | 1392 | 6.5440 |
| 23.7476 | 175.0 | 1400 | 6.5423 |
| 23.4641 | 176.0 | 1408 | 6.5589 |
| 23.5194 | 177.0 | 1416 | 6.5683 |
| 23.379 | 178.0 | 1424 | 6.5824 |
| 23.2253 | 179.0 | 1432 | 6.6006 |
| 23.1911 | 180.0 | 1440 | 6.6035 |
| 23.0313 | 181.0 | 1448 | 6.6145 |
| 22.9151 | 182.0 | 1456 | 6.6201 |
| 22.9167 | 183.0 | 1464 | 6.6373 |
| 22.8662 | 184.0 | 1472 | 6.6421 |
| 22.6941 | 185.0 | 1480 | 6.6600 |
| 22.5809 | 186.0 | 1488 | 6.6750 |
| 22.5431 | 187.0 | 1496 | 6.6682 |
| 22.4356 | 188.0 | 1504 | 6.6891 |
| 22.2224 | 189.0 | 1512 | 6.6971 |
| 22.2223 | 190.0 | 1520 | 6.7158 |
| 22.0909 | 191.0 | 1528 | 6.7108 |
| 22.0957 | 192.0 | 1536 | 6.7321 |
| 21.9423 | 193.0 | 1544 | 6.7333 |
| 21.8808 | 194.0 | 1552 | 6.7427 |
| 21.7397 | 195.0 | 1560 | 6.7640 |
| 21.752 | 196.0 | 1568 | 6.7554 |
| 21.6396 | 197.0 | 1576 | 6.7796 |
| 21.5138 | 198.0 | 1584 | 6.7731 |
| 21.559 | 199.0 | 1592 | 6.7882 |
| 21.4779 | 200.0 | 1600 | 6.8066 |
| 21.4427 | 201.0 | 1608 | 6.8110 |
| 21.2811 | 202.0 | 1616 | 6.8195 |
| 21.2536 | 203.0 | 1624 | 6.8302 |
| 21.0858 | 204.0 | 1632 | 6.8312 |
| 21.0491 | 205.0 | 1640 | 6.8492 |
| 21.0098 | 206.0 | 1648 | 6.8543 |
| 21.0147 | 207.0 | 1656 | 6.8569 |
| 20.7222 | 208.0 | 1664 | 6.8655 |
| 20.8482 | 209.0 | 1672 | 6.8782 |
| 20.6974 | 210.0 | 1680 | 6.8900 |
| 20.5555 | 211.0 | 1688 | 6.8974 |
| 20.4944 | 212.0 | 1696 | 6.9080 |
| 20.3623 | 213.0 | 1704 | 6.9134 |
| 20.4712 | 214.0 | 1712 | 6.9156 |
| 20.3719 | 215.0 | 1720 | 6.9290 |
| 20.373 | 216.0 | 1728 | 6.9368 |
| 20.1031 | 217.0 | 1736 | 6.9430 |
| 20.2204 | 218.0 | 1744 | 6.9420 |
| 20.0146 | 219.0 | 1752 | 6.9540 |
| 20.0066 | 220.0 | 1760 | 6.9513 |
| 20.0279 | 221.0 | 1768 | 6.9708 |
| 19.8976 | 222.0 | 1776 | 6.9693 |
| 19.8156 | 223.0 | 1784 | 6.9868 |
| 19.8986 | 224.0 | 1792 | 6.9879 |
| 19.7202 | 225.0 | 1800 | 6.9982 |
| 19.6539 | 226.0 | 1808 | 7.0048 |
| 19.6006 | 227.0 | 1816 | 7.0089 |
| 19.5267 | 228.0 | 1824 | 7.0106 |
| 19.6688 | 229.0 | 1832 | 7.0235 |
| 19.5103 | 230.0 | 1840 | 7.0332 |
| 19.3178 | 231.0 | 1848 | 7.0335 |
| 19.3587 | 232.0 | 1856 | 7.0463 |
| 19.2722 | 233.0 | 1864 | 7.0498 |
| 19.3014 | 234.0 | 1872 | 7.0578 |
| 19.2367 | 235.0 | 1880 | 7.0631 |
| 19.1529 | 236.0 | 1888 | 7.0682 |
| 19.1893 | 237.0 | 1896 | 7.0617 |
| 19.1128 | 238.0 | 1904 | 7.0778 |
| 19.0608 | 239.0 | 1912 | 7.0854 |
| 18.94 | 240.0 | 1920 | 7.0832 |
| 18.9628 | 241.0 | 1928 | 7.0904 |
| 18.8616 | 242.0 | 1936 | 7.0964 |
| 18.7878 | 243.0 | 1944 | 7.0989 |
| 18.8296 | 244.0 | 1952 | 7.1074 |
| 18.861 | 245.0 | 1960 | 7.1199 |
| 18.6138 | 246.0 | 1968 | 7.1152 |
| 18.6407 | 247.0 | 1976 | 7.1193 |
| 18.5996 | 248.0 | 1984 | 7.1372 |
| 18.6026 | 249.0 | 1992 | 7.1302 |
| 18.5062 | 250.0 | 2000 | 7.1359 |
| 18.443 | 251.0 | 2008 | 7.1461 |
| 18.4016 | 252.0 | 2016 | 7.1540 |
| 18.3853 | 253.0 | 2024 | 7.1487 |
| 18.3155 | 254.0 | 2032 | 7.1639 |
| 18.3172 | 255.0 | 2040 | 7.1625 |
| 18.3111 | 256.0 | 2048 | 7.1722 |
| 18.2977 | 257.0 | 2056 | 7.1764 |
| 18.2011 | 258.0 | 2064 | 7.1750 |
| 18.2361 | 259.0 | 2072 | 7.1882 |
| 18.2203 | 260.0 | 2080 | 7.1861 |
| 18.1275 | 261.0 | 2088 | 7.1917 |
| 18.0645 | 262.0 | 2096 | 7.1912 |
| 18.1205 | 263.0 | 2104 | 7.1958 |
| 18.0794 | 264.0 | 2112 | 7.2078 |
| 18.034 | 265.0 | 2120 | 7.2128 |
| 18.0087 | 266.0 | 2128 | 7.2125 |
| 17.8898 | 267.0 | 2136 | 7.2172 |
| 17.8886 | 268.0 | 2144 | 7.2165 |
| 17.8628 | 269.0 | 2152 | 7.2202 |
| 17.9153 | 270.0 | 2160 | 7.2235 |
| 17.8406 | 271.0 | 2168 | 7.2379 |
| 17.7896 | 272.0 | 2176 | 7.2358 |
| 17.7391 | 273.0 | 2184 | 7.2416 |
| 17.6869 | 274.0 | 2192 | 7.2415 |
| 17.7173 | 275.0 | 2200 | 7.2433 |
| 17.6156 | 276.0 | 2208 | 7.2456 |
| 17.6811 | 277.0 | 2216 | 7.2530 |
| 17.6686 | 278.0 | 2224 | 7.2499 |
| 17.6247 | 279.0 | 2232 | 7.2604 |
| 17.6214 | 280.0 | 2240 | 7.2662 |
| 17.5698 | 281.0 | 2248 | 7.2691 |
| 17.528 | 282.0 | 2256 | 7.2668 |
| 17.459 | 283.0 | 2264 | 7.2711 |
| 17.4908 | 284.0 | 2272 | 7.2713 |
| 17.4053 | 285.0 | 2280 | 7.2768 |
| 17.4375 | 286.0 | 2288 | 7.2791 |
| 17.3833 | 287.0 | 2296 | 7.2839 |
| 17.3051 | 288.0 | 2304 | 7.2916 |
| 17.3633 | 289.0 | 2312 | 7.2833 |
| 17.3199 | 290.0 | 2320 | 7.2952 |
| 17.3424 | 291.0 | 2328 | 7.2893 |
| 17.2745 | 292.0 | 2336 | 7.2950 |
| 17.303 | 293.0 | 2344 | 7.3068 |
| 17.2565 | 294.0 | 2352 | 7.3046 |
| 17.1804 | 295.0 | 2360 | 7.3050 |
| 17.1835 | 296.0 | 2368 | 7.3040 |
| 17.1743 | 297.0 | 2376 | 7.3089 |
| 17.109 | 298.0 | 2384 | 7.3128 |
| 17.1617 | 299.0 | 2392 | 7.3110 |
| 17.1218 | 300.0 | 2400 | 7.3165 |
| 17.1377 | 301.0 | 2408 | 7.3189 |
| 17.1457 | 302.0 | 2416 | 7.3176 |
| 17.0408 | 303.0 | 2424 | 7.3248 |
| 17.1361 | 304.0 | 2432 | 7.3240 |
| 17.0249 | 305.0 | 2440 | 7.3257 |
| 17.0421 | 306.0 | 2448 | 7.3244 |
| 17.0469 | 307.0 | 2456 | 7.3280 |
| 17.0054 | 308.0 | 2464 | 7.3295 |
| 16.9642 | 309.0 | 2472 | 7.3313 |
| 17.0032 | 310.0 | 2480 | 7.3337 |
| 16.9416 | 311.0 | 2488 | 7.3362 |
| 16.9703 | 312.0 | 2496 | 7.3387 |
| 16.935 | 313.0 | 2504 | 7.3367 |
| 16.9032 | 314.0 | 2512 | 7.3430 |
| 16.9189 | 315.0 | 2520 | 7.3450 |
| 16.8499 | 316.0 | 2528 | 7.3454 |
| 16.9005 | 317.0 | 2536 | 7.3467 |
| 16.8523 | 318.0 | 2544 | 7.3504 |
| 16.8969 | 319.0 | 2552 | 7.3498 |
| 16.786 | 320.0 | 2560 | 7.3512 |
| 16.8078 | 321.0 | 2568 | 7.3527 |
| 16.8122 | 322.0 | 2576 | 7.3563 |
| 16.7598 | 323.0 | 2584 | 7.3542 |
| 16.8536 | 324.0 | 2592 | 7.3557 |
| 16.7856 | 325.0 | 2600 | 7.3602 |
| 16.7759 | 326.0 | 2608 | 7.3582 |
| 16.8149 | 327.0 | 2616 | 7.3590 |
| 16.8103 | 328.0 | 2624 | 7.3592 |
| 16.8042 | 329.0 | 2632 | 7.3627 |
| 16.6624 | 330.0 | 2640 | 7.3609 |
| 16.7906 | 331.0 | 2648 | 7.3604 |
| 16.7814 | 332.0 | 2656 | 7.3666 |
| 16.6661 | 333.0 | 2664 | 7.3636 |
| 16.7661 | 334.0 | 2672 | 7.3671 |
| 16.7292 | 335.0 | 2680 | 7.3668 |
| 16.7116 | 336.0 | 2688 | 7.3687 |
| 16.7059 | 337.0 | 2696 | 7.3685 |
| 16.7043 | 338.0 | 2704 | 7.3686 |
| 16.6792 | 339.0 | 2712 | 7.3721 |
| 16.7029 | 340.0 | 2720 | 7.3717 |
| 16.6673 | 341.0 | 2728 | 7.3701 |
| 16.6933 | 342.0 | 2736 | 7.3718 |
| 16.598 | 343.0 | 2744 | 7.3696 |
| 16.6767 | 344.0 | 2752 | 7.3715 |
| 16.7212 | 345.0 | 2760 | 7.3714 |
| 16.6584 | 346.0 | 2768 | 7.3721 |
| 16.6154 | 347.0 | 2776 | 7.3733 |
| 16.6709 | 348.0 | 2784 | 7.3740 |
| 16.6304 | 349.0 | 2792 | 7.3736 |
| 16.7107 | 350.0 | 2800 | 7.3753 |
| 16.614 | 351.0 | 2808 | 7.3759 |
| 16.6596 | 352.0 | 2816 | 7.3740 |
| 16.6237 | 353.0 | 2824 | 7.3759 |
| 16.6717 | 354.0 | 2832 | 7.3756 |
| 16.6417 | 355.0 | 2840 | 7.3759 |
| 16.6524 | 356.0 | 2848 | 7.3770 |
| 16.7028 | 357.0 | 2856 | 7.3775 |
| 16.6476 | 358.0 | 2864 | 7.3764 |
| 16.6927 | 359.0 | 2872 | 7.3767 |
| 16.6386 | 360.0 | 2880 | 7.3773 |
| 16.6626 | 361.0 | 2888 | 7.3774 |
| 16.6548 | 362.0 | 2896 | 7.3781 |
| 16.645 | 363.0 | 2904 | 7.3774 |
| 16.635 | 364.0 | 2912 | 7.3769 |
| 16.6555 | 365.0 | 2920 | 7.3776 |
| 16.603 | 366.0 | 2928 | 7.3782 |
| 16.5786 | 367.0 | 2936 | 7.3783 |
| 16.6135 | 368.0 | 2944 | 7.3783 |
| 16.6465 | 369.0 | 2952 | 7.3784 |
| 16.5781 | 370.0 | 2960 | 7.3784 |
| 16.6058 | 371.0 | 2968 | 7.3783 |
| 16.6094 | 372.0 | 2976 | 7.3783 |
| 16.6183 | 373.0 | 2984 | 7.3783 |
| 16.5372 | 374.0 | 2992 | 7.3783 |
| 16.6187 | 375.0 | 3000 | 7.3783 |
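
Note that validation loss bottoms out at 5.3396 around epoch 69 (step 552) and climbs steadily thereafter while training loss keeps falling, so the final value of 7.3783 reflects a model trained well past its best validation checkpoint.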

Framework versions

  • Transformers 4.49.0
  • Pytorch 2.4.0+cu121
  • Datasets 3.4.0
  • Tokenizers 0.21.0

Model size

  • 126M params (F32, Safetensors)