---
license: apache-2.0
language:
- en
base_model:
- microsoft/deberta-v3-small
- HuggingFaceTB/SmolLM2-135M-Instruct
pipeline_tag: token-classification
tags:
- NER
- encoder
- decoder
- GLiNER
- information-extraction
library_name: gliner
---

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6405f62ba577649430be5124/V5nB1X_qdyTtyTUZHYYHk.png)

**GLiNER** is a Named Entity Recognition (NER) model capable of identifying *any* entity type in a **zero-shot** manner. This architecture combines:

* An **encoder** for representing entity spans
* A **decoder** for generating label names

This hybrid approach enables new use cases such as **entity linking** and expands GLiNER’s capabilities. By integrating large modern decoders—trained on vast datasets—GLiNER can leverage their **richer knowledge capacity** while maintaining competitive inference speed.

---

## Key Features

* **Open ontology**: Works when the label set is unknown
* **Multi-label entity recognition**: Assign multiple labels to a single entity
* **Entity linking**: Handle large label sets via constrained generation
* **Knowledge expansion**: Gain from large decoder models
* **Efficient**: Minimal speed reduction on GPU compared to single-encoder GLiNER

---

## Installation

Update to the latest version of GLiNER:

```bash
# until the new pip release, install from main to use the new architecture
pip install git+https://github.com/urchade/GLiNER.git
```

---

## Usage

If you need open-ontology entity extraction, use the tag `label` in the list of labels, as in the example below:

```python
from gliner import GLiNER

model = GLiNER.from_pretrained("knowledgator/gliner-decoder-small-v1.0")

text = "Hugging Face is a company that advances and democratizes artificial intelligence through open source and science."
labels = ["label"]

model.predict_entities(text, labels, threshold=0.3, num_gen_sequences=1)
```

If you need to run the model on many texts and/or constrain the set of generated labels, see the example below:

```python
from gliner import GLiNER

model = GLiNER.from_pretrained("knowledgator/gliner-decoder-small-v1.0")

text = (
    "Apple was founded as Apple Computer Company on April 1, 1976, "
    "by Steve Wozniak, Steve Jobs (1955–2011) and Ronald Wayne to "
    "develop and sell Wozniak's Apple I personal computer."
)
labels = ["person", "company", "date"]

model.run([text], labels, threshold=0.3, num_gen_sequences=1)
```

---

### Example Output

```json
[
  [
    {
      "start": 21,
      "end": 26,
      "text": "Apple",
      "label": "company",
      "score": 0.6795641779899597,
      "generated labels": ["Organization"]
    },
    {
      "start": 47,
      "end": 60,
      "text": "April 1, 1976",
      "label": "date",
      "score": 0.44296327233314514,
      "generated labels": ["Date"]
    },
    {
      "start": 65,
      "end": 78,
      "text": "Steve Wozniak",
      "label": "person",
      "score": 0.9934439659118652,
      "generated labels": ["Person"]
    },
    {
      "start": 80,
      "end": 90,
      "text": "Steve Jobs",
      "label": "person",
      "score": 0.9725918769836426,
      "generated labels": ["Person"]
    },
    {
      "start": 107,
      "end": 119,
      "text": "Ronald Wayne",
      "label": "person",
      "score": 0.9964536428451538,
      "generated labels": ["Person"]
    }
  ]
]
```

---

### Restricting the Decoder

You can restrict the decoder to generating labels only from a predefined set:

```python
model.run(
    [text],
    labels,
    threshold=0.3,
    num_gen_sequences=1,
    gen_constraints=[
        "organization", "organization type", "city",
        "technology", "date", "person"
    ]
)
```

---

## Performance Tips

Two label trie implementations are available. For a **faster, memory-efficient C++ version**, install **Cython**:

```bash
pip install cython
```

This can significantly improve performance and reduce memory usage, especially when working with millions of labels.
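To illustrate the idea behind the label trie used for constrained generation, here is a minimal, self-contained sketch. This is **not** GLiNER's internal implementation (the `LabelTrie` class and the toy word-level tokenization are assumptions for the example); it only shows how a trie over label token sequences lets a decoder emit, at each step, only tokens that extend a path toward a valid label:

```python
# Illustrative sketch, not GLiNER's actual trie: a prefix tree over label
# token-id sequences, used to mask the decoder's vocabulary at each step.

class LabelTrie:
    def __init__(self, label_token_ids):
        self.root = {}
        for ids in label_token_ids:
            node = self.root
            for tok in ids:
                node = node.setdefault(tok, {})
            node[None] = True  # marks the end of a complete label

    def allowed_next(self, prefix):
        """Token ids that may follow `prefix` while staying on a valid label."""
        node = self.root
        for tok in prefix:
            if tok not in node:
                return []
            node = node[tok]
        return [t for t in node if t is not None]

# Toy "tokenization": each word is one token id.
vocab = {w: i for i, w in enumerate(["organization", "type", "city", "person"])}
labels = [["organization"], ["organization", "type"], ["city"], ["person"]]
trie = LabelTrie([[vocab[w] for w in label] for label in labels])

# After emitting "organization", the only valid continuation is "type"
# (or stopping, since "organization" is itself a complete label).
print(trie.allowed_next([vocab["organization"]]))  # → [1], the id of "type"
```

Because membership checks walk only one trie node per generated token, the cost per decoding step stays small even with very large label sets, which is why a compact C++/Cython trie pays off at the scale of millions of labels.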