jinaai/jina-reranker-m0

@@ -27,24 +27,24 @@ library_name: transformers
 [Blog](https://jina.ai/) | [API](https://jina.ai/reranker) | [AWS](#) | [Azure](#) | [Arxiv](coming soon)
-# jina-reranker-v3
+# jina-reranker-m0
 ## Intended Usage & Model Info
-The **Jina Reranker v3** (`jina-reranker-v3`) is multi-lingual, and multi-modal model that has been fine-tuned for text and visual document reranking task, which is a crucial component in many information retrieval systems. It takes a query and a document pair as input and outputs a score indicating the relevance of the document to the query. The model is trained on a large dataset of query-document pairs and is capable of reranking documents in multiple languages with high accuracy.
+The **Jina Reranker M0** (`jina-reranker-m0`) is multi-lingual, and multi-modal model that has been fine-tuned for text and visual document reranking task, which is a crucial component in many information retrieval systems. It takes a query and a document pair as input and outputs a score indicating the relevance of the document to the query. The model is trained on a large dataset of query-document pairs and is capable of reranking documents in multiple languages with high accuracy.
 # Usage
 _This model repository is licenced for research and evaluation purposes under CC-BY-NC-4.0. For commercial usage, please refer to Jina AI's APIs, AWS Sagemaker or Azure Marketplace offerings. Please [contact us](https://jina.ai/contact-sales) for any further clarifications._
-1. The easiest way to use `jina-reranker-v3` is to call Jina AI's [Reranker API](https://jina.ai/reranker/).
+1. The easiest way to use `jina-reranker-m0` is to call Jina AI's [Reranker API](https://jina.ai/reranker/).
 ```bash
 curl https://api.jina.ai/v1/rerank \
   -H "Content-Type: application/json" \
   -H "Authorization: Bearer YOUR_API_KEY" \
   -d '{
-  "model": "jina-reranker-v3",
+  "model": "jina-reranker-m0",
   "query": "Organic skincare products for sensitive skin",
   "documents": [
     {"text": "Organic skincare for sensitive skin with aloe vera and chamomile."},
@@ -76,12 +76,12 @@ And then:
 from transformers import AutoModel
 model = AutoModel.from_pretrained(
-    'jinaai/jina-reranker-v3',
+    'jinaai/jina-reranker-m0',
     torch_dtype="auto",
     trust_remote_code=True,
 )
-model.to('cuda') # or 'cpu' if no GPU is available
+model.to('cuda')  # or 'cpu' if no GPU is available
 model.eval()
 # Example query and documents
@@ -102,7 +102,7 @@ documents = [
 # construct sentence pairs
 sentence_pairs = [[query, doc] for doc in documents]
-scores = model.compute_score(sentence_pairs, max_length=1024)
+scores = model.compute_score(sentence_pairs, max_length=10240)
 ```
 The scores will be a list of floats, where each float represents the relevance score of the corresponding document to the query. Higher scores indicate higher relevance.
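
Since higher scores indicate higher relevance, reranking amounts to sorting the documents by their scores in descending order. A minimal sketch of that step follows; the helper name `rank_documents` and the sample documents and scores are hypothetical placeholders, not part of the model's API or real model output.

```python
from typing import List, Sequence, Tuple


def rank_documents(documents: Sequence[str], scores: Sequence[float]) -> List[Tuple[str, float]]:
    """Pair each document with its score and sort by descending relevance.

    Hypothetical helper for illustration; it is not provided by the model repository.
    """
    if len(documents) != len(scores):
        raise ValueError("documents and scores must have the same length")
    return sorted(zip(documents, scores), key=lambda pair: pair[1], reverse=True)


# Placeholder values for illustration only. In practice, `scores` would be the
# list of floats returned by model.compute_score(sentence_pairs, ...).
documents = ["doc A", "doc B", "doc C"]
scores = [0.12, 0.87, 0.45]

ranked = rank_documents(documents, scores)
for doc, score in ranked:
    print(f"{score:.2f}  {doc}")
```

Keeping the score next to each document makes it straightforward to apply a relevance threshold or a top-k cutoff afterwards.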

modeling.py CHANGED

@@ -1,6 +1,6 @@
 import torch
 from torch import nn
-from typing import Optional, Tuple, List, Union, Any
+from typing import Optional, Tuple, List, Union
 from transformers import Qwen2VLForConditionalGeneration
 import logging
 import warnings
@@ -70,6 +70,7 @@ def formatting_prompts_func(
     return prompt
 class JinaVLForRanking(Qwen2VLForConditionalGeneration):
     def __init__(self, config):
         super().__init__(config)
@@ -129,6 +130,7 @@ class JinaVLForRanking(Qwen2VLForConditionalGeneration):
         if not hasattr(self, "_processor"):
             from transformers import AutoProcessor
             self._processor = AutoProcessor.from_pretrained(self.name_or_path, trust_remote_code=True)
         assert isinstance(pairs, list)
@@ -173,11 +175,7 @@ class JinaVLForRanking(Qwen2VLForConditionalGeneration):
                     if len(tokens['input_ids']) >= max_doc_length:
                         d = self._processor.tokenizer.decode(tokens['input_ids'])
-                batch_inputs.append(
-                    formatting_prompts_func(
-                        q, d, query_type=query_type, doc_type=doc_type
-                    )
-                )
+                batch_inputs.append(formatting_prompts_func(q, d, query_type=query_type, doc_type=doc_type))
             batch_images = None
             if doc_type == 'image':