Update README.md
README.md
@@ -1,3 +1,51 @@
---
library_name: PyLaia
license: mit
tags:
- PyLaia
- PyTorch
- atr
- htr
- ocr
- historical
- handwritten
metrics:
- CER
- WER
language:
- 'fr'
datasets:
- CATMuS/medieval
pipeline_tag: image-to-text
---

# PyLaia - CATMuS/medieval

This model performs Handwritten Text Recognition (HTR) on historical documents written in Latin and Romance languages.

## Model description

The model was trained using the PyLaia library on the [CATMuS/medieval](https://huggingface.co/datasets/CATMuS/medieval) dataset.

Training images were resized to a fixed height of {dimension} pixels while keeping the original aspect ratio. Vertical lines were discarded.
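
As a rough illustration of the preprocessing described above, the sketch below computes the size of a line image after resizing to a fixed height with the aspect ratio preserved. The function names, the `target_height` parameter, and the "taller than wide" test for vertical lines are assumptions made for this example; the actual height and filtering rule used for training are not specified here.

```python
def target_size(width: int, height: int, target_height: int) -> tuple[int, int]:
    """Return (new_width, target_height), scaling the width by the same factor as the height."""
    new_width = max(1, round(width * target_height / height))
    return new_width, target_height


def is_vertical(width: int, height: int) -> bool:
    """Assumed rule for this sketch: a line is 'vertical' when it is taller than it is wide."""
    return height > width


def resize_line(image, target_height: int):
    """Resize a Pillow-style image (anything with .size and .resize) to a fixed height."""
    width, height = image.size
    return image.resize(target_size(width, height, target_height))
```

With a Pillow image, `resize_line(Image.open(path), target_height)` would return the resized line, and `is_vertical(*image.size)` could be used to skip vertical lines beforehand.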

| set   |   lines |
| :---- | ------: |
| train | 152,816 |
| val   |  19,402 |
| test  |  22,590 |

An external 6-gram character language model can be used to improve recognition. The language model is trained on the text from the CATMuS/medieval training set.
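
To make the notion of a 6-gram character model concrete, here is a minimal sketch that extracts and counts character 6-grams from transcription lines. It only shows the raw counting step; smoothing, special tokens, and the toolkit actually used to build the language model are outside its scope, and the function names are hypothetical.

```python
from collections import Counter


def char_ngrams(text: str, n: int = 6) -> list[str]:
    """Return every run of n consecutive characters in the text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def count_ngrams(lines: list[str], n: int = 6) -> Counter:
    """Count character n-grams over a collection of transcription lines."""
    counts = Counter()
    for line in lines:
        counts.update(char_ngrams(line, n))
    return counts
```

A language model estimates the probability of the next character from counts like these, which lets the decoder prefer character sequences that are frequent in the training transcriptions.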

## Evaluation

The model achieves the following results:

| set  | Language model | CER (%) | WER (%) | lines |
|:-----|:---------------|--------:|--------:|------:|
| test | no             |   10.54 |   28.12 | 3,819 |
| test | yes            |    9.52 |   23.73 | 3,819 |
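
For readers unfamiliar with the metrics, the sketch below shows one common way to compute CER and WER: the Levenshtein edit distance between reference and prediction, summed over all lines and divided by the total reference length. This is an illustration under that assumption, not necessarily the exact evaluation procedure behind the table; all function names are hypothetical.

```python
def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (insertions, deletions, substitutions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def error_rate(refs, hyps, tokenize) -> float:
    """Total edits divided by total reference tokens, as a percentage."""
    errors = 0
    total = 0
    for ref, hyp in zip(refs, hyps):
        ref_tokens, hyp_tokens = tokenize(ref), tokenize(hyp)
        errors += edit_distance(ref_tokens, hyp_tokens)
        total += len(ref_tokens)
    return 100.0 * errors / total


def cer(refs, hyps) -> float:
    """Character Error Rate: tokens are individual characters."""
    return error_rate(refs, hyps, list)


def wer(refs, hyps) -> float:
    """Word Error Rate: tokens are whitespace-separated words."""
    return error_rate(refs, hyps, str.split)
```

Because a single wrong character makes the whole word count as an error, WER is typically much higher than CER on the same predictions.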

## How to use?

Please refer to the [PyLaia documentation](https://atr.pages.teklia.com/pylaia/usage/prediction/) to use this model.