impossible-llms-dutch-random-fourgram
This model is a fine-tuned version of an unspecified base model, trained on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 7.3783
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 12
- eval_batch_size: 8
- seed: 0
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 8
- total_train_batch_size: 384
- total_eval_batch_size: 32
- optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- training_steps: 3000
- mixed_precision_training: Native AMP
- label_smoothing_factor: 0.1
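The derived values in the list above (total batch sizes, and the warmup length implied by the warmup ratio) follow from simple arithmetic on the other entries. The sketch below reproduces that arithmetic and an approximate learning-rate curve. It assumes that warmup is linear, that warmup length is the ratio times the total step count, and that the cosine decay is a single half-cycle down to zero; the function name `lr_at` is hypothetical, and this is not the Trainer's own scheduler code.

```python
import math

# Values copied from the hyperparameter list above.
per_device_train_batch_size = 12
per_device_eval_batch_size = 8
num_devices = 4
gradient_accumulation_steps = 8
training_steps = 3000
warmup_ratio = 0.1
peak_lr = 1e-4

# Effective batch sizes: gradient accumulation applies only to training.
total_train_batch_size = (
    per_device_train_batch_size * num_devices * gradient_accumulation_steps
)
total_eval_batch_size = per_device_eval_batch_size * num_devices

# Assumption: warmup length is the warmup ratio times the total step count.
warmup_steps = round(warmup_ratio * training_steps)


def lr_at(step: int) -> float:
    """Linear warmup to peak_lr, then one half-cosine decay to zero (assumed shape)."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / (training_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print(total_train_batch_size, total_eval_batch_size, warmup_steps)
```

Under these assumptions the learning rate peaks at step 300 and decays for the remaining 2700 steps.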
Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
82.5965 | 1.0 | 8 | 10.0545 |
75.2966 | 2.0 | 16 | 9.2676 |
72.2807 | 3.0 | 24 | 8.8779 |
70.3242 | 4.0 | 32 | 8.6516 |
68.9057 | 5.0 | 40 | 8.4423 |
67.17 | 6.0 | 48 | 8.2568 |
65.9971 | 7.0 | 56 | 8.0646 |
63.777 | 8.0 | 64 | 7.8564 |
61.8724 | 9.0 | 72 | 7.6447 |
60.6351 | 10.0 | 80 | 7.4277 |
59.0027 | 11.0 | 88 | 7.2080 |
56.9968 | 12.0 | 96 | 6.9791 |
54.8855 | 13.0 | 104 | 6.7560 |
53.897 | 14.0 | 112 | 6.5470 |
51.5884 | 15.0 | 120 | 6.3640 |
50.4456 | 16.0 | 128 | 6.2130 |
49.583 | 17.0 | 136 | 6.0969 |
48.33 | 18.0 | 144 | 6.0189 |
48.1382 | 19.0 | 152 | 5.9651 |
47.4981 | 20.0 | 160 | 5.9115 |
47.3202 | 21.0 | 168 | 5.8837 |
47.6736 | 22.0 | 176 | 5.8615 |
47.0963 | 23.0 | 184 | 5.8298 |
47.0102 | 24.0 | 192 | 5.8166 |
46.9391 | 25.0 | 200 | 5.7864 |
46.4493 | 26.0 | 208 | 5.7672 |
46.1409 | 27.0 | 216 | 5.7417 |
46.1787 | 28.0 | 224 | 5.7367 |
46.0658 | 29.0 | 232 | 5.7133 |
45.7216 | 30.0 | 240 | 5.6960 |
45.3878 | 31.0 | 248 | 5.6818 |
45.5412 | 32.0 | 256 | 5.6643 |
45.3379 | 33.0 | 264 | 5.6591 |
45.5325 | 34.0 | 272 | 5.6429 |
45.0757 | 35.0 | 280 | 5.6244 |
44.89 | 36.0 | 288 | 5.6128 |
44.6906 | 37.0 | 296 | 5.6003 |
43.991 | 38.0 | 304 | 5.5926 |
44.3073 | 39.0 | 312 | 5.5720 |
43.679 | 40.0 | 320 | 5.5656 |
44.278 | 41.0 | 328 | 5.5489 |
43.7414 | 42.0 | 336 | 5.5400 |
44.11 | 43.0 | 344 | 5.5252 |
43.857 | 44.0 | 352 | 5.5118 |
43.3477 | 45.0 | 360 | 5.5025 |
42.9332 | 46.0 | 368 | 5.4867 |
43.5376 | 47.0 | 376 | 5.4795 |
43.0291 | 48.0 | 384 | 5.4620 |
43.0861 | 49.0 | 392 | 5.4565 |
42.469 | 50.0 | 400 | 5.4472 |
42.7038 | 51.0 | 408 | 5.4320 |
41.8678 | 52.0 | 416 | 5.4336 |
41.7871 | 53.0 | 424 | 5.4160 |
42.0693 | 54.0 | 432 | 5.4133 |
41.3849 | 55.0 | 440 | 5.3967 |
41.5663 | 56.0 | 448 | 5.3865 |
41.3677 | 57.0 | 456 | 5.3861 |
41.3664 | 58.0 | 464 | 5.3795 |
40.8909 | 59.0 | 472 | 5.3671 |
40.9037 | 60.0 | 480 | 5.3707 |
40.7368 | 61.0 | 488 | 5.3646 |
40.6582 | 62.0 | 496 | 5.3624 |
40.1752 | 63.0 | 504 | 5.3561 |
40.2725 | 64.0 | 512 | 5.3479 |
40.2386 | 65.0 | 520 | 5.3502 |
40.0417 | 66.0 | 528 | 5.3431 |
39.593 | 67.0 | 536 | 5.3422 |
39.67 | 68.0 | 544 | 5.3463 |
39.3237 | 69.0 | 552 | 5.3396 |
39.2197 | 70.0 | 560 | 5.3476 |
39.3601 | 71.0 | 568 | 5.3417 |
38.8632 | 72.0 | 576 | 5.3538 |
39.2826 | 73.0 | 584 | 5.3449 |
38.4384 | 74.0 | 592 | 5.3493 |
38.5283 | 75.0 | 600 | 5.3572 |
38.5241 | 76.0 | 608 | 5.3617 |
38.3616 | 77.0 | 616 | 5.3585 |
37.5471 | 78.0 | 624 | 5.3612 |
37.8423 | 79.0 | 632 | 5.3753 |
37.8579 | 80.0 | 640 | 5.3706 |
37.8891 | 81.0 | 648 | 5.3762 |
37.388 | 82.0 | 656 | 5.3865 |
37.2148 | 83.0 | 664 | 5.3949 |
36.9594 | 84.0 | 672 | 5.4041 |
36.6153 | 85.0 | 680 | 5.4150 |
37.1041 | 86.0 | 688 | 5.4241 |
36.4976 | 87.0 | 696 | 5.4311 |
36.5939 | 88.0 | 704 | 5.4333 |
35.9307 | 89.0 | 712 | 5.4425 |
36.2628 | 90.0 | 720 | 5.4511 |
35.9169 | 91.0 | 728 | 5.4590 |
35.7617 | 92.0 | 736 | 5.4706 |
35.4492 | 93.0 | 744 | 5.4728 |
35.6603 | 94.0 | 752 | 5.4854 |
35.2164 | 95.0 | 760 | 5.4956 |
34.7999 | 96.0 | 768 | 5.5013 |
34.9279 | 97.0 | 776 | 5.5209 |
34.7867 | 98.0 | 784 | 5.5353 |
34.9172 | 99.0 | 792 | 5.5482 |
34.1183 | 100.0 | 800 | 5.5567 |
34.1481 | 101.0 | 808 | 5.5641 |
34.202 | 102.0 | 816 | 5.5832 |
33.5556 | 103.0 | 824 | 5.6003 |
33.3879 | 104.0 | 832 | 5.6066 |
33.5313 | 105.0 | 840 | 5.6249 |
33.3095 | 106.0 | 848 | 5.6336 |
33.1677 | 107.0 | 856 | 5.6439 |
33.0919 | 108.0 | 864 | 5.6742 |
32.9743 | 109.0 | 872 | 5.6757 |
32.7652 | 110.0 | 880 | 5.6833 |
32.5602 | 111.0 | 888 | 5.7056 |
32.4285 | 112.0 | 896 | 5.7198 |
32.3074 | 113.0 | 904 | 5.7226 |
32.2366 | 114.0 | 912 | 5.7409 |
31.7914 | 115.0 | 920 | 5.7563 |
31.7031 | 116.0 | 928 | 5.7693 |
31.702 | 117.0 | 936 | 5.7790 |
31.1436 | 118.0 | 944 | 5.7953 |
31.1165 | 119.0 | 952 | 5.8105 |
31.219 | 120.0 | 960 | 5.8222 |
30.9197 | 121.0 | 968 | 5.8295 |
30.6646 | 122.0 | 976 | 5.8696 |
30.4446 | 123.0 | 984 | 5.8722 |
30.3057 | 124.0 | 992 | 5.8919 |
30.2185 | 125.0 | 1000 | 5.9051 |
30.0063 | 126.0 | 1008 | 5.9038 |
30.0369 | 127.0 | 1016 | 5.9358 |
29.7605 | 128.0 | 1024 | 5.9466 |
29.5972 | 129.0 | 1032 | 5.9520 |
29.227 | 130.0 | 1040 | 5.9850 |
29.2277 | 131.0 | 1048 | 5.9851 |
29.0853 | 132.0 | 1056 | 5.9847 |
28.8583 | 133.0 | 1064 | 6.0125 |
28.9504 | 134.0 | 1072 | 6.0253 |
28.783 | 135.0 | 1080 | 6.0448 |
28.4782 | 136.0 | 1088 | 6.0542 |
28.4885 | 137.0 | 1096 | 6.0663 |
28.4336 | 138.0 | 1104 | 6.0747 |
28.2545 | 139.0 | 1112 | 6.0912 |
27.8923 | 140.0 | 1120 | 6.1180 |
27.9312 | 141.0 | 1128 | 6.1177 |
27.6833 | 142.0 | 1136 | 6.1362 |
27.5338 | 143.0 | 1144 | 6.1512 |
27.5523 | 144.0 | 1152 | 6.1643 |
27.1259 | 145.0 | 1160 | 6.1756 |
27.0944 | 146.0 | 1168 | 6.1855 |
26.9084 | 147.0 | 1176 | 6.2024 |
26.8394 | 148.0 | 1184 | 6.2285 |
26.7645 | 149.0 | 1192 | 6.2392 |
26.7224 | 150.0 | 1200 | 6.2461 |
26.2819 | 151.0 | 1208 | 6.2537 |
26.5246 | 152.0 | 1216 | 6.2828 |
26.2742 | 153.0 | 1224 | 6.2859 |
26.2017 | 154.0 | 1232 | 6.2904 |
26.005 | 155.0 | 1240 | 6.2979 |
25.8268 | 156.0 | 1248 | 6.3187 |
25.6712 | 157.0 | 1256 | 6.3294 |
25.5722 | 158.0 | 1264 | 6.3498 |
25.4754 | 159.0 | 1272 | 6.3672 |
25.2883 | 160.0 | 1280 | 6.3739 |
25.2695 | 161.0 | 1288 | 6.3804 |
24.924 | 162.0 | 1296 | 6.3947 |
24.8009 | 163.0 | 1304 | 6.4014 |
24.7311 | 164.0 | 1312 | 6.4176 |
24.8255 | 165.0 | 1320 | 6.4304 |
24.69 | 166.0 | 1328 | 6.4436 |
24.4188 | 167.0 | 1336 | 6.4604 |
24.482 | 168.0 | 1344 | 6.4759 |
24.2478 | 169.0 | 1352 | 6.4743 |
24.2043 | 170.0 | 1360 | 6.5005 |
24.0169 | 171.0 | 1368 | 6.5087 |
24.078 | 172.0 | 1376 | 6.5191 |
23.8472 | 173.0 | 1384 | 6.5339 |
23.7919 | 174.0 | 1392 | 6.5440 |
23.7476 | 175.0 | 1400 | 6.5423 |
23.4641 | 176.0 | 1408 | 6.5589 |
23.5194 | 177.0 | 1416 | 6.5683 |
23.379 | 178.0 | 1424 | 6.5824 |
23.2253 | 179.0 | 1432 | 6.6006 |
23.1911 | 180.0 | 1440 | 6.6035 |
23.0313 | 181.0 | 1448 | 6.6145 |
22.9151 | 182.0 | 1456 | 6.6201 |
22.9167 | 183.0 | 1464 | 6.6373 |
22.8662 | 184.0 | 1472 | 6.6421 |
22.6941 | 185.0 | 1480 | 6.6600 |
22.5809 | 186.0 | 1488 | 6.6750 |
22.5431 | 187.0 | 1496 | 6.6682 |
22.4356 | 188.0 | 1504 | 6.6891 |
22.2224 | 189.0 | 1512 | 6.6971 |
22.2223 | 190.0 | 1520 | 6.7158 |
22.0909 | 191.0 | 1528 | 6.7108 |
22.0957 | 192.0 | 1536 | 6.7321 |
21.9423 | 193.0 | 1544 | 6.7333 |
21.8808 | 194.0 | 1552 | 6.7427 |
21.7397 | 195.0 | 1560 | 6.7640 |
21.752 | 196.0 | 1568 | 6.7554 |
21.6396 | 197.0 | 1576 | 6.7796 |
21.5138 | 198.0 | 1584 | 6.7731 |
21.559 | 199.0 | 1592 | 6.7882 |
21.4779 | 200.0 | 1600 | 6.8066 |
21.4427 | 201.0 | 1608 | 6.8110 |
21.2811 | 202.0 | 1616 | 6.8195 |
21.2536 | 203.0 | 1624 | 6.8302 |
21.0858 | 204.0 | 1632 | 6.8312 |
21.0491 | 205.0 | 1640 | 6.8492 |
21.0098 | 206.0 | 1648 | 6.8543 |
21.0147 | 207.0 | 1656 | 6.8569 |
20.7222 | 208.0 | 1664 | 6.8655 |
20.8482 | 209.0 | 1672 | 6.8782 |
20.6974 | 210.0 | 1680 | 6.8900 |
20.5555 | 211.0 | 1688 | 6.8974 |
20.4944 | 212.0 | 1696 | 6.9080 |
20.3623 | 213.0 | 1704 | 6.9134 |
20.4712 | 214.0 | 1712 | 6.9156 |
20.3719 | 215.0 | 1720 | 6.9290 |
20.373 | 216.0 | 1728 | 6.9368 |
20.1031 | 217.0 | 1736 | 6.9430 |
20.2204 | 218.0 | 1744 | 6.9420 |
20.0146 | 219.0 | 1752 | 6.9540 |
20.0066 | 220.0 | 1760 | 6.9513 |
20.0279 | 221.0 | 1768 | 6.9708 |
19.8976 | 222.0 | 1776 | 6.9693 |
19.8156 | 223.0 | 1784 | 6.9868 |
19.8986 | 224.0 | 1792 | 6.9879 |
19.7202 | 225.0 | 1800 | 6.9982 |
19.6539 | 226.0 | 1808 | 7.0048 |
19.6006 | 227.0 | 1816 | 7.0089 |
19.5267 | 228.0 | 1824 | 7.0106 |
19.6688 | 229.0 | 1832 | 7.0235 |
19.5103 | 230.0 | 1840 | 7.0332 |
19.3178 | 231.0 | 1848 | 7.0335 |
19.3587 | 232.0 | 1856 | 7.0463 |
19.2722 | 233.0 | 1864 | 7.0498 |
19.3014 | 234.0 | 1872 | 7.0578 |
19.2367 | 235.0 | 1880 | 7.0631 |
19.1529 | 236.0 | 1888 | 7.0682 |
19.1893 | 237.0 | 1896 | 7.0617 |
19.1128 | 238.0 | 1904 | 7.0778 |
19.0608 | 239.0 | 1912 | 7.0854 |
18.94 | 240.0 | 1920 | 7.0832 |
18.9628 | 241.0 | 1928 | 7.0904 |
18.8616 | 242.0 | 1936 | 7.0964 |
18.7878 | 243.0 | 1944 | 7.0989 |
18.8296 | 244.0 | 1952 | 7.1074 |
18.861 | 245.0 | 1960 | 7.1199 |
18.6138 | 246.0 | 1968 | 7.1152 |
18.6407 | 247.0 | 1976 | 7.1193 |
18.5996 | 248.0 | 1984 | 7.1372 |
18.6026 | 249.0 | 1992 | 7.1302 |
18.5062 | 250.0 | 2000 | 7.1359 |
18.443 | 251.0 | 2008 | 7.1461 |
18.4016 | 252.0 | 2016 | 7.1540 |
18.3853 | 253.0 | 2024 | 7.1487 |
18.3155 | 254.0 | 2032 | 7.1639 |
18.3172 | 255.0 | 2040 | 7.1625 |
18.3111 | 256.0 | 2048 | 7.1722 |
18.2977 | 257.0 | 2056 | 7.1764 |
18.2011 | 258.0 | 2064 | 7.1750 |
18.2361 | 259.0 | 2072 | 7.1882 |
18.2203 | 260.0 | 2080 | 7.1861 |
18.1275 | 261.0 | 2088 | 7.1917 |
18.0645 | 262.0 | 2096 | 7.1912 |
18.1205 | 263.0 | 2104 | 7.1958 |
18.0794 | 264.0 | 2112 | 7.2078 |
18.034 | 265.0 | 2120 | 7.2128 |
18.0087 | 266.0 | 2128 | 7.2125 |
17.8898 | 267.0 | 2136 | 7.2172 |
17.8886 | 268.0 | 2144 | 7.2165 |
17.8628 | 269.0 | 2152 | 7.2202 |
17.9153 | 270.0 | 2160 | 7.2235 |
17.8406 | 271.0 | 2168 | 7.2379 |
17.7896 | 272.0 | 2176 | 7.2358 |
17.7391 | 273.0 | 2184 | 7.2416 |
17.6869 | 274.0 | 2192 | 7.2415 |
17.7173 | 275.0 | 2200 | 7.2433 |
17.6156 | 276.0 | 2208 | 7.2456 |
17.6811 | 277.0 | 2216 | 7.2530 |
17.6686 | 278.0 | 2224 | 7.2499 |
17.6247 | 279.0 | 2232 | 7.2604 |
17.6214 | 280.0 | 2240 | 7.2662 |
17.5698 | 281.0 | 2248 | 7.2691 |
17.528 | 282.0 | 2256 | 7.2668 |
17.459 | 283.0 | 2264 | 7.2711 |
17.4908 | 284.0 | 2272 | 7.2713 |
17.4053 | 285.0 | 2280 | 7.2768 |
17.4375 | 286.0 | 2288 | 7.2791 |
17.3833 | 287.0 | 2296 | 7.2839 |
17.3051 | 288.0 | 2304 | 7.2916 |
17.3633 | 289.0 | 2312 | 7.2833 |
17.3199 | 290.0 | 2320 | 7.2952 |
17.3424 | 291.0 | 2328 | 7.2893 |
17.2745 | 292.0 | 2336 | 7.2950 |
17.303 | 293.0 | 2344 | 7.3068 |
17.2565 | 294.0 | 2352 | 7.3046 |
17.1804 | 295.0 | 2360 | 7.3050 |
17.1835 | 296.0 | 2368 | 7.3040 |
17.1743 | 297.0 | 2376 | 7.3089 |
17.109 | 298.0 | 2384 | 7.3128 |
17.1617 | 299.0 | 2392 | 7.3110 |
17.1218 | 300.0 | 2400 | 7.3165 |
17.1377 | 301.0 | 2408 | 7.3189 |
17.1457 | 302.0 | 2416 | 7.3176 |
17.0408 | 303.0 | 2424 | 7.3248 |
17.1361 | 304.0 | 2432 | 7.3240 |
17.0249 | 305.0 | 2440 | 7.3257 |
17.0421 | 306.0 | 2448 | 7.3244 |
17.0469 | 307.0 | 2456 | 7.3280 |
17.0054 | 308.0 | 2464 | 7.3295 |
16.9642 | 309.0 | 2472 | 7.3313 |
17.0032 | 310.0 | 2480 | 7.3337 |
16.9416 | 311.0 | 2488 | 7.3362 |
16.9703 | 312.0 | 2496 | 7.3387 |
16.935 | 313.0 | 2504 | 7.3367 |
16.9032 | 314.0 | 2512 | 7.3430 |
16.9189 | 315.0 | 2520 | 7.3450 |
16.8499 | 316.0 | 2528 | 7.3454 |
16.9005 | 317.0 | 2536 | 7.3467 |
16.8523 | 318.0 | 2544 | 7.3504 |
16.8969 | 319.0 | 2552 | 7.3498 |
16.786 | 320.0 | 2560 | 7.3512 |
16.8078 | 321.0 | 2568 | 7.3527 |
16.8122 | 322.0 | 2576 | 7.3563 |
16.7598 | 323.0 | 2584 | 7.3542 |
16.8536 | 324.0 | 2592 | 7.3557 |
16.7856 | 325.0 | 2600 | 7.3602 |
16.7759 | 326.0 | 2608 | 7.3582 |
16.8149 | 327.0 | 2616 | 7.3590 |
16.8103 | 328.0 | 2624 | 7.3592 |
16.8042 | 329.0 | 2632 | 7.3627 |
16.6624 | 330.0 | 2640 | 7.3609 |
16.7906 | 331.0 | 2648 | 7.3604 |
16.7814 | 332.0 | 2656 | 7.3666 |
16.6661 | 333.0 | 2664 | 7.3636 |
16.7661 | 334.0 | 2672 | 7.3671 |
16.7292 | 335.0 | 2680 | 7.3668 |
16.7116 | 336.0 | 2688 | 7.3687 |
16.7059 | 337.0 | 2696 | 7.3685 |
16.7043 | 338.0 | 2704 | 7.3686 |
16.6792 | 339.0 | 2712 | 7.3721 |
16.7029 | 340.0 | 2720 | 7.3717 |
16.6673 | 341.0 | 2728 | 7.3701 |
16.6933 | 342.0 | 2736 | 7.3718 |
16.598 | 343.0 | 2744 | 7.3696 |
16.6767 | 344.0 | 2752 | 7.3715 |
16.7212 | 345.0 | 2760 | 7.3714 |
16.6584 | 346.0 | 2768 | 7.3721 |
16.6154 | 347.0 | 2776 | 7.3733 |
16.6709 | 348.0 | 2784 | 7.3740 |
16.6304 | 349.0 | 2792 | 7.3736 |
16.7107 | 350.0 | 2800 | 7.3753 |
16.614 | 351.0 | 2808 | 7.3759 |
16.6596 | 352.0 | 2816 | 7.3740 |
16.6237 | 353.0 | 2824 | 7.3759 |
16.6717 | 354.0 | 2832 | 7.3756 |
16.6417 | 355.0 | 2840 | 7.3759 |
16.6524 | 356.0 | 2848 | 7.3770 |
16.7028 | 357.0 | 2856 | 7.3775 |
16.6476 | 358.0 | 2864 | 7.3764 |
16.6927 | 359.0 | 2872 | 7.3767 |
16.6386 | 360.0 | 2880 | 7.3773 |
16.6626 | 361.0 | 2888 | 7.3774 |
16.6548 | 362.0 | 2896 | 7.3781 |
16.645 | 363.0 | 2904 | 7.3774 |
16.635 | 364.0 | 2912 | 7.3769 |
16.6555 | 365.0 | 2920 | 7.3776 |
16.603 | 366.0 | 2928 | 7.3782 |
16.5786 | 367.0 | 2936 | 7.3783 |
16.6135 | 368.0 | 2944 | 7.3783 |
16.6465 | 369.0 | 2952 | 7.3784 |
16.5781 | 370.0 | 2960 | 7.3784 |
16.6058 | 371.0 | 2968 | 7.3783 |
16.6094 | 372.0 | 2976 | 7.3783 |
16.6183 | 373.0 | 2984 | 7.3783 |
16.5372 | 374.0 | 2992 | 7.3783 |
16.6187 | 375.0 | 3000 | 7.3783 |
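In the table above, validation loss reaches its minimum of 5.3396 at epoch 69 (step 552) and rises from there while training loss keeps falling, so the reported evaluation loss of 7.3783 is the value at the final step, not the best one. The sketch below shows one way to locate that point; `rows` is a hand-picked subset of the table, and the variable names are illustrative only.

```python
# (epoch, step, validation_loss) rows copied from the table above (subset only).
rows = [
    (1, 8, 10.0545),
    (50, 400, 5.4472),
    (67, 536, 5.3422),
    (69, 552, 5.3396),
    (71, 568, 5.3417),
    (100, 800, 5.5567),
    (200, 1600, 6.8066),
    (375, 3000, 7.3783),
]

# Row with the lowest validation loss.
best_epoch, best_step, best_loss = min(rows, key=lambda r: r[2])

# Last logged row, which matches the loss reported at the top of the card.
final_epoch, final_step, final_loss = rows[-1]

# Optimizer steps per epoch implied by the log.
steps_per_epoch = final_step // final_epoch

# How far the final validation loss sits above the best one.
gap = final_loss - best_loss

print(f"best: epoch {best_epoch}, step {best_step}, loss {best_loss}")
print(f"final: epoch {final_epoch}, step {final_step}, loss {final_loss}")
print(f"steps per epoch: {steps_per_epoch}, gap: {gap:.4f}")
```

If intermediate checkpoints were saved, the one nearest step 552 would be the natural candidate for evaluation; the card does not say whether any were kept.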
Framework versions
- Transformers 4.49.0
- Pytorch 2.4.0+cu121
- Datasets 3.4.0
- Tokenizers 0.21.0
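To compare a local environment against the versions listed above, a small check along the following lines may help. This is a sketch: the mapping from the listed framework names to PyPI distribution names (for example, Pytorch to `torch`) is an assumption, and `find_mismatches` is a hypothetical helper.

```python
from importlib import metadata

# Versions listed above, keyed by assumed PyPI distribution name.
EXPECTED = {
    "transformers": "4.49.0",
    "torch": "2.4.0+cu121",
    "datasets": "3.4.0",
    "tokenizers": "0.21.0",
}


def installed_version(name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(expected, lookup=installed_version):
    """Return {name: (expected, found)} for every package whose version differs."""
    mismatches = {}
    for name, want in expected.items():
        found = lookup(name)
        if found != want:
            mismatches[name] = (want, found)
    return mismatches


if __name__ == "__main__":
    for name, (want, found) in find_mismatches(EXPECTED).items():
        print(f"{name}: expected {want}, found {found}")
```

The `lookup` parameter exists so the comparison can be exercised without the packages installed.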