Create README.md
README.md
ADDED
@@ -0,0 +1,31 @@
1 |
+
---
|
2 |
+
language:
|
3 |
+
- en
|
4 |
+
---
|
5 |
+
|
6 |
+
**lexdec-medium-char** is a small autoregressive Llama model with character-level tokenization, trained on the 2024/2025 [BabyLM dataset](https://osf.io/ryjfm/). The *checkpoints* branch contains 19 checkpoints: 10 across the first 10% of pretraining and 9 more across the remaining 90%.
|
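
As a rough illustration of how 19 checkpoints can split into 10 early and 9 late ones, the sketch below assumes evenly spaced checkpoints: one per 1% of training up to 10%, then one per 10% up to 100%. The even spacing and the function name `checkpoint_percentages` are assumptions made for this example; the model card only states the counts, so check the *checkpoints* branch for the actual names and positions.

```python
def checkpoint_percentages():
    """Return hypothetical checkpoint positions as percentages of pretraining.

    Assumption (not stated in the model card): checkpoints are evenly spaced,
    every 1% during the first 10% and every 10% afterwards.
    """
    early = list(range(1, 11))        # 1%, 2%, ..., 10%  -> 10 checkpoints
    late = list(range(20, 101, 10))   # 20%, 30%, ..., 100% -> 9 checkpoints
    return early + late


if __name__ == "__main__":
    pcts = checkpoint_percentages()
    print(len(pcts), pcts)
```

The denser spacing at the start matches the card's emphasis on early pretraining, where a tenth of the run gets more than half of the checkpoints.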
7 |
+
|
8 |
+
We used this model to trace how linguistic knowledge (word-level and syntactic) develops across pretraining, and to compare that development with character-level models of other sizes and with comparable subword (BPE) models:
|
9 |
+
|
10 |
+
| | [small-char](https://huggingface.co/bbunzeck/lexdec-small-char) | [medium-char](https://huggingface.co/bbunzeck/lexdec-medium-char) | [large-char](https://huggingface.co/bbunzeck/lexdec-large-char) | [small-bpe](https://huggingface.co/bbunzeck/lexdec-small-bpe) | [medium-bpe](https://huggingface.co/bbunzeck/lexdec-medium-bpe) | [large-bpe](https://huggingface.co/bbunzeck/lexdec-large-bpe) |
|
11 |
+
|---|---:|---:|---:|---:|---:|---:|
|
12 |
+
| Embedding size | 128 | 256 | 512 | 128 | 256 | 512 |
|
13 |
+
| Hidden size | 128 | 256 | 512 | 128 | 256 | 512 |
|
14 |
+
| Layers | 4 | 8 | 12 | 4 | 8 | 12 |
|
15 |
+
| Attention heads | 4 | 8 | 12 | 4 | 8 | 12 |
|
16 |
+
| Context size | 128 | 128 | 128 | 128 | 128 | 128 |
|
17 |
+
| Vocab. size | 102 | 102 | 102 | 8,002 | 8,002 | 8,002 |
|
18 |
+
| Parameters | 486,016 | 3,726,592 | 21,940,736 | 2,508,416 | 7,771,392 | 30,030,336 |
|
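
The parameter counts in the table can be reproduced with simple arithmetic over the other rows. The sketch below is not taken from the authors' code; it assumes a standard Llama layout with no bias terms, untied input and output embeddings, an MLP intermediate size equal to the hidden size, and a per-head dimension of `hidden // heads` (rounded down). None of these settings are stated in the model card, but under them the formula matches all six reported totals.

```python
def llama_param_count(hidden, layers, heads, vocab):
    """Estimate the parameter count of a Llama-style model.

    Assumptions (not confirmed by the model card): no biases, untied
    embeddings, intermediate size == hidden size, head_dim == hidden // heads.
    """
    head_dim = hidden // heads           # rounds down, e.g. 512 // 12 == 42
    attn_dim = heads * head_dim          # total width of the attention projections
    attention = 4 * hidden * attn_dim    # q, k, v and output projections
    mlp = 3 * hidden * hidden            # gate, up and down projections
    norms = 2 * hidden                   # two RMSNorm weight vectors per layer
    per_layer = attention + mlp + norms
    embeddings = 2 * vocab * hidden      # input embedding + separate output head
    final_norm = hidden
    return embeddings + layers * per_layer + final_norm


# (hidden size, layers, attention heads, vocab size) as listed in the table
CONFIGS = {
    "small-char": (128, 4, 4, 102),
    "medium-char": (256, 8, 8, 102),
    "large-char": (512, 12, 12, 102),
    "small-bpe": (128, 4, 4, 8002),
    "medium-bpe": (256, 8, 8, 8002),
    "large-bpe": (512, 12, 12, 8002),
}

if __name__ == "__main__":
    for name, cfg in CONFIGS.items():
        print(f"{name}: {llama_param_count(*cfg):,}")
```

The breakdown also shows why the subword models are larger at the same width: with a vocabulary of 8,002 instead of 102, the two embedding matrices account for most of the extra parameters.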
19 |
+
|
20 |
+
If you use this model, please cite the following preprint (the final version will be added as soon as it is published):
|
21 |
+
|
22 |
+
```
|
23 |
+
@misc{bunzeck2025subwordmodelsstruggleword,
|
24 |
+
title={Subword models struggle with word learning, but surprisal hides it},
|
25 |
+
author={Bastian Bunzeck and Sina Zarrieß},
|
26 |
+
year={2025},
|
27 |
+
eprint={2502.12835},
|
28 |
+
archivePrefix={arXiv},
|
29 |
+
primaryClass={cs.CL},
|
30 |
+
url={https://arxiv.org/abs/2502.12835},}
|
31 |
+
```
|