Commit a923b82 · 1 Parent(s): 5e2a201
review readme
README.md CHANGED
@@ -13,7 +13,7 @@ tags:
 - v1.0.6
 ---
 
-# Model Card for impresso-project/ocr-quality-assessor-unigram-light
+# Model Card for `impresso-project/ocr-quality-assessor-unigram-light`
 
 ## Overview
 
@@ -22,15 +22,16 @@ This model is a **lightweight OCR quality assessor** for historical French and G
 It uses **Bloom filters** containing known word unigrams to evaluate text quality by measuring the proportion of known vs. unknown words in OCR outputs. It is part of the [Impresso Project](https://impresso-project.ch), which develops tools for media archive processing and exploration.
 
 ## Model Details
+### Model Description
 
+- **Developed by:** University of Zurich (UZH) from the [Impresso team](https://impresso-project.ch). The project is an interdisciplinary project focused on historical media analysis across languages, time, and modalities. Funded by the Swiss National Science Foundation ([CRSII5_173719](http://p3.snf.ch/project-173719), [CRSII5_213585](https://data.snf.ch/grants/grant/213585)) and the Luxembourg National Research Fund (grant No. 17498891).
 - **Model type:** Bloom filter–based scoring via a Transformers-compatible pipeline
 - **Languages:** French (fr), German (de)
 - **License:** GPL-3.0
 - **Base resource:** [`impresso-project/OCR-quality-assessment-unigram`](https://huggingface.co/impresso-project/OCR-quality-assessment-unigram)
-- **Interface:**
+- **Interface:** Hugging Face `transformers` pipeline
 - **Input format:** Raw text string
-- **Output format:** Float score
-- **Developed by:** UZH, Switzerland
+- **Output format:** Float score representing OCR quality
 
 ## How to Use
 