---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:21769
- loss:MultipleNegativesRankingLoss
base_model: Lajavaness/bilingual-embedding-large
widget:
- source_sentence: 'Go bobO..... Take this one! Slap in the face! MAP Leda Beck 5h-
A Spanish biologist, researcher: - You pay 1 million euros a month to a football
player and 1,800 euros for a Biology researcher. Now you want one treatment. Will
you ask Cristiano Ronaldo or to Messi and they will find a cure for you.'
sentences:
- Treat stroke using a needle Doctors warn against ‘dangerously misleading’ posts
claiming you can treat a stroke with a needle
- This Spanish biologist said that Cristiano Ronaldo and Messi must find a cure
for the new coronavirus as they earn much more than scientists The woman in these
posts is a Spanish politician who has made no statements about Messi, CR7, or
the cure for COVID-19.
- The Simpsons predicted 2022 Canada trucker protests Footage from The Simpsons
was edited to look like the show predicted Canada’s Covid truckers protest
- source_sentence: 'This is what the rivers of Colombia are becoming... Disappeared
people floating in the rivers, killed by Duque''s assassins This is what the rivers
of Cali are becoming In transportation of our dead at hands of duqueh''s henchmen: 242'
sentences:
- A photograph shows bodies floated in a river in Cali The photo of black bags in
a river is a tribute to those killed in the protests in Colombia
- Sri Lankan doctor created COVID-19 rapid test kits The doctor interviewed in this
report did not say he was involved in the development of COVID-19 test kits
- Masks are meant to protect the vaccinated Face mask requirements aim to protect
unvaccinated people
- source_sentence: 'How can you say it proudly that you are leaders of SA... When
it looks like a dumping site nd you living high lavishly life in your porsh houses
built out of hard earned Tax payers money? CRY SA OUR BELOVED COUNTRY '
sentences:
- Donald Trump next to a stack of declassified files Trump did not pose in this
photo with declassified files, but with federal regulations
- Images show trash-strewn streets in South Africa These photos of messy Johannesburg
streets are old and taken out of context
- SBT will again air the program "A Semana do Presidente" with Bolsonaro There is
no forecast for the return of "A Semana do Presidente" on SBT, despite a project
in 2020
- source_sentence: 'First photos of Earth sent by India''s Chadrayan2 space mission.
Breathtaking. '
sentences:
- This nest of bats is the source of the coronavirus in Wuhan China The video of
a roof infested with bats was recorded in 2011 in the United States
- Australia recalled 50 million doses of Covid vaccine No, Australia has not recalled
50 million doses of a Covid vaccine
- First photos of Earth sent by India's Chadrayan2 space mission These alleged photos
of Earth have no connection to the Chandrayaan-2 lunar mission.
- source_sentence: Even if you remove JIMENEZ ....... IF THERE IS STILL A SMARTMATIC
THAT IS GOOD AT MAGIC THERE IS ALSO NO ..... SMARTMATIC SHOULD BE REMOVED FROM
THE COMELEC CONTRACT ..... BECAUSE THE DEMON COMELEC HAS LONG HONORED THE VOTE
OF MANY PEOPLE ..... AS LONG AS THERE ARE COMMISSIONERS IN THE COMELEC WHO LOOK
LIKE MONEY, WE WILL NOT HAVE A CLEAN ELECTION ....... JUST IMAGINE HOW LONG THE
ISSUE SPREADS THAT IF A CANDIDATE WANTS TO WIN, IT WILL PAY THE COMELEC 25 MILLION
???????????????????????????? ? SO ARE THE ELECTION RESULTS HOKOS POKOS ??????????????????????
DEMONS ...... SO ALL THE PUNISHMENT OF HEAVEN HAS BEEN GIVEN IN THE PHILIPPINES
BECAUSE TANING LIVES WITH US ...... THE THOUGHT IS PURE MONEY ..... SO EVEN ELECTIONS
ARE MONEY ..... ..... 7:08 AM 4G 51% FINALLY, COMELEC OFFICIAL JIMENEZ, REMOVED
IN PLACE. BY PRRD AND OTHERS AGAIN THIS. FOR CLEAN NOW ELECTION TO COMING 2022
ELECTION
sentences:
- The WHO declared covid-19 an endemic disease Although it considers it probable,
the WHO has not yet declared covid-19 an endemic disease
- Israel, the only country with four vaccines, broke the record for covid-19 cases
Israel has not immunized its entire population with 4 doses in January 2022 and
the vaccines are effective
- Philippine President Rodrigo Duterte fired Comelec spokesman James Jimenez in
May 2021 Posts misleadingly claim Philippine president fired poll body spokesman
pipeline_tag: sentence-similarity
library_name: sentence-transformers
---
# SentenceTransformer based on Lajavaness/bilingual-embedding-large
This is a [sentence-transformers](https://www.SBERT.net) model fine-tuned from [Lajavaness/bilingual-embedding-large](https://huggingface.co/Lajavaness/bilingual-embedding-large). It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
## Model Details
### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [Lajavaness/bilingual-embedding-large](https://huggingface.co/Lajavaness/bilingual-embedding-large)
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 1024 dimensions
- **Similarity Function:** Cosine Similarity
### Model Sources
- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
### Full Model Architecture
```
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BilingualModel
(1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
```
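The last two modules above (mean-token pooling followed by `Normalize()`) can be illustrated with a minimal pure-Python sketch. This is not the library's implementation: the helper names `mean_pool` and `l2_normalize` are hypothetical, and the toy input uses 4 dimensions instead of 1024.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, skipping positions where the mask is 0 (padding)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                total[i] += value
    count = max(count, 1)  # guard against an all-padding input
    return [value / count for value in total]

def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit length, as a Normalize() step would."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / max(norm, eps) for v in vec]

# Toy input: 3 token positions (the last one is padding), 4 dimensions.
tokens = [[1.0, 2.0, 0.0, 0.0], [3.0, 0.0, 0.0, 4.0], [9.0, 9.0, 9.0, 9.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)
sentence_embedding = l2_normalize(pooled)
print(pooled)
print(sentence_embedding)
```

Because the output has unit length, the cosine similarity of two such embeddings equals their dot product.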
## Usage
### Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub (replace the placeholder with this model's Hub ID)
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
'Even if you remove JIMENEZ ....... IF THERE IS STILL A SMARTMATIC THAT IS GOOD AT MAGIC THERE IS ALSO NO ..... SMARTMATIC SHOULD BE REMOVED FROM THE COMELEC CONTRACT ..... BECAUSE THE DEMON COMELEC HAS LONG HONORED THE VOTE OF MANY PEOPLE ..... AS LONG AS THERE ARE COMMISSIONERS IN THE COMELEC WHO LOOK LIKE MONEY, WE WILL NOT HAVE A CLEAN ELECTION ....... JUST IMAGINE HOW LONG THE ISSUE SPREADS THAT IF A CANDIDATE WANTS TO WIN, IT WILL PAY THE COMELEC 25 MILLION ???????????????????????????? ? SO ARE THE ELECTION RESULTS HOKOS POKOS ?????????????????????? DEMONS ...... SO ALL THE PUNISHMENT OF HEAVEN HAS BEEN GIVEN IN THE PHILIPPINES BECAUSE TANING LIVES WITH US ...... THE THOUGHT IS PURE MONEY ..... SO EVEN ELECTIONS ARE MONEY ..... ..... 7:08 AM 4G 51% FINALLY, COMELEC OFFICIAL JIMENEZ, REMOVED IN PLACE. BY PRRD AND OTHERS AGAIN THIS. FOR CLEAN NOW ELECTION TO COMING 2022 ELECTION',
'Philippine President Rodrigo Duterte fired Comelec spokesman James Jimenez in May 2021 Posts misleadingly claim Philippine president fired poll body spokesman',
'The WHO declared covid-19 an endemic disease Although it considers it probable, the WHO has not yet declared covid-19 an endemic disease',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 1024)
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
```
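The training pairs match social media posts to fact-check summaries, so one plausible use is ranking candidate fact-checks for a given post. The sketch below shows only the ranking step, on toy 3-dimensional unit vectors; in practice the vectors would come from `model.encode` as in the snippet above. The helper `rank_candidates` is hypothetical and not part of the library.

```python
def dot(a, b):
    """Dot product; equals cosine similarity when both vectors have unit length."""
    return sum(x * y for x, y in zip(a, b))

def rank_candidates(query_embedding, candidate_embeddings, top_k=2):
    """Return (candidate_index, score) pairs, highest score first."""
    scored = [
        (dot(query_embedding, emb), idx)
        for idx, emb in enumerate(candidate_embeddings)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(idx, score) for score, idx in scored[:top_k]]

# Toy unit-length vectors standing in for real 1024-dimensional embeddings.
query = [0.6, 0.8, 0.0]
candidates = [
    [0.0, 0.0, 1.0],  # unrelated
    [0.8, 0.6, 0.0],  # close
    [0.6, 0.8, 0.0],  # identical direction
]

ranking = rank_candidates(query, candidates)
for idx, score in ranking:
    print(idx, round(score, 2))
```

For large candidate sets, a vectorized or approximate nearest-neighbour search would be a better fit than this loop.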
## Training Details
### Training Dataset
#### Unnamed Dataset
* Size: 21,769 training samples
* Columns: `sentence_0` and `sentence_1`
* Approximate statistics based on the first 1000 samples:
  | | sentence_0 | sentence_1 |
  |:--------|:-----------|:-----------|
  | type | string | string |
  | details | | |
* Samples:
  | sentence_0 | sentence_1 |
  |:-----------|:-----------|
  | "On January 1, 1979 New York billionaire Brandon Torrent allowed himself to be photographed while urinating on a homeless man sleeping on the street. This image explains, better than many words, the division of the world into social classes that we must eliminate . Meanwhile, in 21st century Brazil, many 'good citizens', just above the homeless condition, applaud politicians and politicians* who support the predatory elite represented by this abject and unworthy human being, who urinates on people who, in the final analysis, are the builders of the fortune he enjoys. Until we realize which side of this stream of urine we are on, we will not be able to build a truly just society. Class consciousness is the true and most urgent education." | This photo shows a billionaire named Brandon Torrent urinating on a homeless man The real story behind the image of a man who appears to urinate on a homeless person |
  | French secret service officer jean claude returns from his mission as imam with deash (isis) like others from several countries in Syria.. there are questions | This man is a French intelligence officer No, this man is not a French intelligence officer |
  | Oh yes! Rohit Sharma Mumbai Indians Burj Khalifa DIEL 82 SAMSUNG MUMBAI INDIANS | Dubai’s Burj Khalifa skyscraper displays photo of Indian cricketer Rohit Sharma This image of the Burj Khalifa has been doctored – the original does not show a projection of Indian cricketer Rohit Sharma |
* Loss: [MultipleNegativesRankingLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
```json
{
"scale": 20.0,
"similarity_fct": "cos_sim"
}
```
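To show what the two parameters do, here is a minimal pure-Python sketch of a multiple-negatives ranking objective: for each anchor, the cosine similarities to every positive in the batch are multiplied by `scale` and treated as logits, and cross-entropy is taken against the anchor's own positive. This is an illustration under those assumptions, not the library's code; `cos_sim` and `mnr_loss` are hypothetical helpers and the 2-dimensional vectors are toy data.

```python
import math

def cos_sim(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mnr_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the correct match for anchors[i]
    and every other positive in the batch acts as a negative."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, p) for p in positives]
        top = max(logits)  # subtract the max for numerical stability
        log_sum_exp = top + math.log(sum(math.exp(l - top) for l in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)

# Toy batch of two (anchor, positive) pairs.
anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[1.0, 0.0], [0.0, 1.0]]

aligned_loss = mnr_loss(anchors, positives)        # each anchor matches its own positive
swapped_loss = mnr_loss(anchors, positives[::-1])  # each anchor matches the wrong positive
print(aligned_loss, swapped_loss)
```

The loss is near zero when each anchor is closest to its own positive and grows to roughly `scale` when the pairing is reversed, which is why a larger `scale` sharpens the penalty for ranking a negative above the true match.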
### Training Hyperparameters
#### Non-Default Hyperparameters
- `per_device_train_batch_size`: 2
- `per_device_eval_batch_size`: 2
- `num_train_epochs`: 1
- `multi_dataset_batch_sampler`: round_robin
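
As a rough worked example, these settings imply the following step count and number of in-batch negatives. The calculation assumes a single device, no gradient accumulation, and that the final partial batch is kept; none of these are stated in this card.

```python
import math

num_samples = 21_769             # training set size reported above
per_device_train_batch_size = 2
num_train_epochs = 1
num_devices = 1                  # assumption: not stated in this card

samples_per_step = per_device_train_batch_size * num_devices
steps_per_epoch = math.ceil(num_samples / samples_per_step)
total_steps = steps_per_epoch * num_train_epochs

# With an in-batch-negatives loss, each anchor is contrasted with the
# positives of the other pairs in its batch.
in_batch_negatives = per_device_train_batch_size - 1

print(total_steps)         # 10885
print(in_batch_negatives)  # 1
```

With a batch size of 2, each anchor sees only one in-batch negative per step, so the ranking signal per step is weak compared with larger batches.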
#### All Hyperparameters