metadata
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:114699
- loss:CachedGISTEmbedLoss
base_model: BAAI/bge-large-en-v1.5
widget:
  - source_sentence: >-
      For roles such as 'physiotherapist', 'neuromusculoskeletal
      physiotherapist', 'osteopath', and 'chiropractor', the skills needed
      include a deep understanding of human anatomy and physiology, strong
      diagnostic skills, and the ability to apply manual therapy techniques to
      treat musculoskeletal issues. Additionally, effective communication skills
      are crucial for explaining treatments and exercises to patients, while
      adaptability and problem-solving skills are essential for tailoring
      treatments to individual patient needs.
    sentences:
      - >-
        Job roles such as insulation installers, HVAC technicians, and
        construction engineers require knowledge of various types and
        characteristics of insulation materials to effectively reduce heat
        transfer and improve energy efficiency in buildings and systems.
        Understanding the typology of insulation materials, including their
        thermal properties, durability, and environmental impact, is crucial for
        these professionals to select the most appropriate materials for
        specific applications.
      - >-
        Job roles such as Contract Managers, Legal Analysts, and Compliance
        Officers require the skill of reviewing or auditing completed contracts
        to ensure legal accuracy, compliance with regulations, and alignment
        with organizational goals.
      - >-
        Job roles that require skills in dealing with emergency care situations
        include emergency medical technicians (EMTs), paramedics, and emergency
        room nurses or doctors, all of whom must quickly and effectively manage
        critical health situations to save lives.
  - source_sentence: >-
      Bus drivers, including those operating in various sectors like public
      transit, intercity, private, or school services, need strong driving
      skills, knowledge of traffic laws, and the ability to operate safely in
      diverse conditions. Additionally, effective communication skills and the
      ability to handle passenger inquiries and emergencies are crucial.
    sentences:
      - >-
        Job roles that require the skill to calibrate electronic instruments
        include calibration technicians, quality control engineers, and
        instrumentation specialists. These professionals ensure the accuracy and
        reliability of various electronic devices and systems across different
        industries such as manufacturing, aerospace, and automotive.
      - >-
        Job roles such as Building Engineer, Architect, and Construction
        Specialist require skills in designing, engineering, or developing
        air-tight building structures to ensure energy efficiency and
        environmental control within the building.
      - >-
        Job roles such as customer service representatives, flight attendants,
        and hotel concierges require a strong focus on passengers or customers,
        ensuring their needs and comfort are prioritized to provide excellent
        service and support.
  - source_sentence: >-
      A mine surveyor, also known as a mining surveyor or mine planning
      surveyor, requires expertise in geomatics and mining engineering to
      accurately map and plan mine operations, ensuring safety and efficiency.
      They must also possess strong analytical skills and the ability to use
      specialized software for creating detailed mine plans and maintaining
      accurate records.
    sentences:
      - >-
        Job roles such as data analysts, business analysts, and financial
        analysts require the skill to present reports or prepare statistical
        reports, as they often need to communicate complex data insights clearly
        and effectively to stakeholders.
      - >-
        Job roles that require monitoring flour unloading equipment include
        Quality Control Technicians, Process Operators, and Mill Supervisors,
        who ensure the efficient and safe operation of flour processing systems
        and the proper unloading of flour from transport vehicles.
      - >-
        Job roles that require skills in the manufacturing of made-up textile
        articles include textile production managers, machinery operators, and
        quality control inspectors, all of whom utilize specific technology and
        machinery to produce finished textile products such as clothing, home
        textiles, and industrial fabrics.
  - source_sentence: >-
      An insulation supervisor, regardless of the specific type of insulation
      material or installation area, requires strong project management skills,
      knowledge of building codes and safety regulations, and expertise in
      insulation techniques to oversee the installation process effectively and
      ensure quality standards are met.
    sentences:
      - >-
        Job roles that require skills in energy efficiency, such as promoting
        energy efficiency or efficient energy use, include Energy Managers,
        Sustainability Specialists, and Building Engineers, who focus on
        reducing energy consumption and improving energy use in various
        settings. Additionally, roles like Battery Technicians or Engineers
        involve battery benchmarking to enhance energy storage and efficiency in
        technological devices and systems.
      - >-
        The skill of applying or installing waterproofing and damp-proofing
        membranes is primarily required by construction workers such as
        waterproofing specialists, roofers, and building envelope technicians,
        who use these membranes to prevent water damage in buildings and
        structures.
      - >-
        Job roles such as laboratory technicians, chemists, and materials
        scientists require skills in laboratory techniques, including electronic
        and thermic methods, gas chromatography, and gravimetric analysis, to
        conduct precise experiments and analyze materials. These professionals
        must apply natural science techniques and use various lab techniques to
        ensure accurate and reliable results in their research or quality
        control processes.
  - source_sentence: >-
      For roles such as import/export manager, graduate export manager, senior
      export manager, and other related positions in meat and meat products, the
      key skills include a strong understanding of international trade
      regulations, meat product knowledge, customs compliance, and excellent
      negotiation and communication skills to manage global supply chains
      effectively. Additionally, proficiency in relevant trade software and
      languages can be highly beneficial.
    sentences:
      - >-
        Job roles that require skills such as managing staff, coordinating
        employees, and performing HR activities include Human Resources
        Managers, Team Leaders, Supervisors, and Department Heads, all of whom
        are responsible for overseeing personnel, implementing HR policies, and
        ensuring efficient team operations.
      - >-
        Job roles such as Control Systems Engineer, Automation Engineer, and
        Systems Designer require skills in designing, planning, and developing
        control systems to manage and optimize the performance of various
        technological processes and machinery. These professionals are tasked
        with creating efficient and reliable systems that can operate
        autonomously or with minimal human intervention.
      - >-
        Job roles such as Performance Analyst, Quality Assurance Engineer, and
        Test Manager require skills in conducting performance measurement and
        organizing or managing conversion testing to ensure software and systems
        meet performance standards and function correctly in real-world
        scenarios.
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy@1
- cosine_accuracy@20
- cosine_accuracy@50
- cosine_accuracy@100
- cosine_accuracy@150
- cosine_accuracy@200
- cosine_precision@1
- cosine_precision@20
- cosine_precision@50
- cosine_precision@100
- cosine_precision@150
- cosine_precision@200
- cosine_recall@1
- cosine_recall@20
- cosine_recall@50
- cosine_recall@100
- cosine_recall@150
- cosine_recall@200
- cosine_ndcg@1
- cosine_ndcg@20
- cosine_ndcg@50
- cosine_ndcg@100
- cosine_ndcg@150
- cosine_ndcg@200
- cosine_mrr@1
- cosine_mrr@20
- cosine_mrr@50
- cosine_mrr@100
- cosine_mrr@150
- cosine_mrr@200
- cosine_map@1
- cosine_map@20
- cosine_map@50
- cosine_map@100
- cosine_map@150
- cosine_map@200
- cosine_map@500
model-index:
  - name: SentenceTransformer based on BAAI/bge-large-en-v1.5
    results:
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: full en
          type: full_en
        metrics:
          - type: cosine_accuracy@1
            value: 0.7302631578947368
            name: Cosine Accuracy@1
          - type: cosine_accuracy@20
            value: 0.993421052631579
            name: Cosine Accuracy@20
          - type: cosine_accuracy@50
            value: 0.9967105263157895
            name: Cosine Accuracy@50
          - type: cosine_accuracy@100
            value: 1
            name: Cosine Accuracy@100
          - type: cosine_accuracy@150
            value: 1
            name: Cosine Accuracy@150
          - type: cosine_accuracy@200
            value: 1
            name: Cosine Accuracy@200
          - type: cosine_precision@1
            value: 0.7302631578947368
            name: Cosine Precision@1
          - type: cosine_precision@20
            value: 0.4998355263157894
            name: Cosine Precision@20
          - type: cosine_precision@50
            value: 0.39184210526315794
            name: Cosine Precision@50
          - type: cosine_precision@100
            value: 0.3111842105263158
            name: Cosine Precision@100
          - type: cosine_precision@150
            value: 0.2652412280701754
            name: Cosine Precision@150
          - type: cosine_precision@200
            value: 0.232171052631579
            name: Cosine Precision@200
          - type: cosine_recall@1
            value: 0.010227350724729817
            name: Cosine Recall@1
          - type: cosine_recall@20
            value: 0.13368254620254577
            name: Cosine Recall@20
          - type: cosine_recall@50
            value: 0.2541249933594102
            name: Cosine Recall@50
          - type: cosine_recall@100
            value: 0.3948435268881245
            name: Cosine Recall@100
          - type: cosine_recall@150
            value: 0.49626849018850344
            name: Cosine Recall@150
          - type: cosine_recall@200
            value: 0.5720837677245543
            name: Cosine Recall@200
          - type: cosine_ndcg@1
            value: 0.7302631578947368
            name: Cosine Ndcg@1
          - type: cosine_ndcg@20
            value: 0.5384654647855256
            name: Cosine Ndcg@20
          - type: cosine_ndcg@50
            value: 0.44986527953229877
            name: Cosine Ndcg@50
          - type: cosine_ndcg@100
            value: 0.44277699637488865
            name: Cosine Ndcg@100
          - type: cosine_ndcg@150
            value: 0.4895063673734854
            name: Cosine Ndcg@150
          - type: cosine_ndcg@200
            value: 0.5346148440105628
            name: Cosine Ndcg@200
          - type: cosine_mrr@1
            value: 0.7302631578947368
            name: Cosine Mrr@1
          - type: cosine_mrr@20
            value: 0.8341772399749373
            name: Cosine Mrr@20
          - type: cosine_mrr@50
            value: 0.8343338815789473
            name: Cosine Mrr@50
          - type: cosine_mrr@100
            value: 0.8343905966424682
            name: Cosine Mrr@100
          - type: cosine_mrr@150
            value: 0.8343905966424682
            name: Cosine Mrr@150
          - type: cosine_mrr@200
            value: 0.8343905966424682
            name: Cosine Mrr@200
          - type: cosine_map@1
            value: 0.7302631578947368
            name: Cosine Map@1
          - type: cosine_map@20
            value: 0.3434603918412553
            name: Cosine Map@20
          - type: cosine_map@50
            value: 0.23779270403918282
            name: Cosine Map@50
          - type: cosine_map@100
            value: 0.21161540263537876
            name: Cosine Map@100
          - type: cosine_map@150
            value: 0.22899252179487295
            name: Cosine Map@150
          - type: cosine_map@200
            value: 0.24784282323083537
            name: Cosine Map@200
          - type: cosine_map@500
            value: 0.298154972004029
            name: Cosine Map@500
Job-Skill Matching: Fine-tuned BAAI/bge-large-en-v1.5
Top-performing model on TalentCLEF 2025 Task B. Use it to match job titles with skill sets (job title ↔ skill set).
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: BAAI/bge-large-en-v1.5
- Maximum Sequence Length: 256 tokens
- Output Dimensionality: 1024 dimensions
- Similarity Function: Cosine Similarity
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': True}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
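The architecture above uses the hidden state of the first ([CLS]) token as the sentence embedding (`pooling_mode_cls_token: True`) and then L2-normalizes it with `Normalize()`. Because of that second step, cosine similarity and the plain dot product give the same score for this model. The pure-Python sketch below illustrates those two steps. The "hidden states" are made-up toy vectors, not real BERT outputs.

```python
import math

def cls_pool(token_embeddings):
    """CLS pooling: keep only the first token's vector (position 0)."""
    return list(token_embeddings[0])

def l2_normalize(vec):
    """Scale a vector to unit Euclidean length, as the Normalize() module does."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Toy hidden states: 3 tokens x 4 dimensions (the real model produces 1024 dimensions).
sentence_a = [[3.0, 4.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0], [0.5, 0.2, 0.1, 0.0]]
sentence_b = [[4.0, 3.0, 0.0, 0.0], [2.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.3, 0.9]]

emb_a = l2_normalize(cls_pool(sentence_a))
emb_b = l2_normalize(cls_pool(sentence_b))

# After normalization, the dot product equals the cosine similarity of the raw CLS vectors.
print(round(dot(emb_a, emb_b), 4))                      # 0.96
print(round(cosine(sentence_a[0], sentence_b[0]), 4))   # 0.96
```

Only the first token influences the result here. Changing any later token vector leaves both embeddings unchanged.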
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("pj-mathematician/JobSkillBGE-large-en-v1.5")
# Run inference
sentences = [
    'For roles such as import/export manager, graduate export manager, senior export manager, and other related positions in meat and meat products, the key skills include a strong understanding of international trade regulations, meat product knowledge, customs compliance, and excellent negotiation and communication skills to manage global supply chains effectively. Additionally, proficiency in relevant trade software and languages can be highly beneficial.',
    'Job roles such as Performance Analyst, Quality Assurance Engineer, and Test Manager require skills in conducting performance measurement and organizing or managing conversion testing to ensure software and systems meet performance standards and function correctly in real-world scenarios.',
    'Job roles that require skills such as managing staff, coordinating employees, and performing HR activities include Human Resources Managers, Team Leaders, Supervisors, and Department Heads, all of whom are responsible for overseeing personnel, implementing HR policies, and ensuring efficient team operations.',
]
embeddings = model.encode(sentences)  # NumPy array by default
print(embeddings.shape)
# (3, 1024)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)  # torch.Tensor
print(similarities.shape)
# torch.Size([3, 3])
```
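For job title ↔ skill matching, the job description acts as the query, and candidate skill descriptions are ranked by cosine similarity. The sketch below shows only that ranking step. It uses tiny hand-written unit vectors as stand-ins for `model.encode` output. With the real model, each query's scores would come from a row of `model.similarity(query_embeddings, skill_embeddings)`. The helper name `top_k_skills` and the skill labels are hypothetical.

```python
import heapq

def top_k_skills(query_vec, skill_vecs, skill_names, k=2):
    """Return the k (score, name) pairs with the highest dot-product score.

    Assumes every vector is already L2-normalized, so the dot product equals
    cosine similarity (true for this model's normalized embeddings).
    """
    scores = [sum(q * s for q, s in zip(query_vec, vec)) for vec in skill_vecs]
    return heapq.nlargest(k, zip(scores, skill_names))

# Hypothetical unit-length 3-d embeddings standing in for real 1024-d vectors.
job_query = [1.0, 0.0, 0.0]
skills = {
    "manual therapy techniques": [0.8, 0.6, 0.0],
    "emergency care": [0.6, 0.8, 0.0],
    "contract auditing": [0.0, 0.0, 1.0],
}

ranking = top_k_skills(job_query, list(skills.values()), list(skills.keys()), k=2)
for score, name in ranking:
    print(f"{score:.2f}  {name}")
# 0.80  manual therapy techniques
# 0.60  emergency care
```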
Evaluation
Metrics
Information Retrieval
- Dataset: full_en
- Evaluated with InformationRetrievalEvaluator
Metric | Value |
---|---|
cosine_accuracy@1 | 0.7303 |
cosine_accuracy@20 | 0.9934 |
cosine_accuracy@50 | 0.9967 |
cosine_accuracy@100 | 1.0 |
cosine_accuracy@150 | 1.0 |
cosine_accuracy@200 | 1.0 |
cosine_precision@1 | 0.7303 |
cosine_precision@20 | 0.4998 |
cosine_precision@50 | 0.3918 |
cosine_precision@100 | 0.3112 |
cosine_precision@150 | 0.2652 |
cosine_precision@200 | 0.2322 |
cosine_recall@1 | 0.0102 |
cosine_recall@20 | 0.1337 |
cosine_recall@50 | 0.2541 |
cosine_recall@100 | 0.3948 |
cosine_recall@150 | 0.4963 |
cosine_recall@200 | 0.5721 |
cosine_ndcg@1 | 0.7303 |
cosine_ndcg@20 | 0.5385 |
cosine_ndcg@50 | 0.4499 |
cosine_ndcg@100 | 0.4428 |
cosine_ndcg@150 | 0.4895 |
cosine_ndcg@200 | 0.5346 |
cosine_mrr@1 | 0.7303 |
cosine_mrr@20 | 0.8342 |
cosine_mrr@50 | 0.8343 |
cosine_mrr@100 | 0.8344 |
cosine_mrr@150 | 0.8344 |
cosine_mrr@200 | 0.8344 |
cosine_map@1 | 0.7303 |
cosine_map@20 | 0.3435 |
cosine_map@50 | 0.2378 |
cosine_map@100 | 0.2116 |
cosine_map@150 | 0.229 |
cosine_map@200 | 0.2478 |
cosine_map@500 | 0.2982 |
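Each retrieval metric above is computed per query, from the ranked list of retrieved skills and the set of relevant ones, and then averaged over all queries. The sketch below computes the metrics for a single toy query. The MAP and nDCG formulas follow what we understand to be the InformationRetrievalEvaluator conventions: average precision is normalized by min(k, number of relevant items), and nDCG uses binary relevance. Treat it as illustrative rather than as the evaluator's source.

The example also explains why recall@1 (0.0102) sits so far below precision@1 (0.7303). Recall@k can be at most k divided by the number of relevant skills, so it stays small when a job title has many relevant skills.

```python
import math

def ir_metrics(ranked_ids, relevant_ids, k):
    """Accuracy@k, precision@k, recall@k, MRR@k, nDCG@k and MAP@k for one query."""
    top = ranked_ids[:k]
    hits = [1 if doc in relevant_ids else 0 for doc in top]

    accuracy = 1.0 if any(hits) else 0.0      # at least one relevant item in the top k
    precision = sum(hits) / k                  # fraction of the top k that is relevant
    recall = sum(hits) / len(relevant_ids)     # fraction of all relevant items retrieved

    # MRR@k: reciprocal rank of the first relevant item (0 if none in the top k).
    mrr = next((1.0 / (rank + 1) for rank, h in enumerate(hits) if h), 0.0)

    # nDCG@k with binary relevance: DCG of this ranking over DCG of an ideal ranking.
    dcg = sum(h / math.log2(rank + 2) for rank, h in enumerate(hits))
    ideal = sum(1.0 / math.log2(rank + 2) for rank in range(min(k, len(relevant_ids))))
    ndcg = dcg / ideal

    # AP@k: sum of precision at each relevant hit, normalized by min(k, |relevant|).
    num_correct, sum_precisions = 0, 0.0
    for rank, h in enumerate(hits):
        if h:
            num_correct += 1
            sum_precisions += num_correct / (rank + 1)
    ap = sum_precisions / min(k, len(relevant_ids))

    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "mrr": mrr, "ndcg": ndcg, "map": ap}

# Toy query: 5 skills retrieved in ranked order; 4 skills are relevant in total.
ranked = ["s1", "s7", "s3", "s9", "s2"]
relevant = {"s1", "s2", "s3", "s4"}

at_1 = ir_metrics(ranked, relevant, k=1)
at_5 = ir_metrics(ranked, relevant, k=5)
print(at_1["precision"], at_1["recall"])   # 1.0 0.25
print(at_5["precision"], at_5["recall"])   # 0.6 0.75
```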
Training Details
Training Dataset
Unnamed Dataset
- Size: 114,699 training samples
- Columns: anchor and positive
- Approximate statistics based on the first 1000 samples:

Statistic | anchor | positive |
---|---|---|
type | string | string |
details | min: 43 tokens, mean: 65.45 tokens, max: 116 tokens | min: 34 tokens, mean: 55.34 tokens, max: 162 tokens |

- Samples:

anchor | positive |
---|---|
A technical director or any of its synonyms requires a strong blend of technical expertise and leadership skills, including the ability to oversee technical operations, manage teams, and ensure the successful execution of technical projects while maintaining operational efficiency and innovation. | Job roles that require promoting health and safety include occupational health and safety specialists, safety managers, and public health educators, all of whom work to ensure safe and healthy environments in workplaces and communities. |
A technical director or any of its synonyms requires a strong blend of technical expertise and leadership skills, including the ability to oversee technical operations, manage teams, and ensure the successful execution of technical projects while maintaining operational efficiency and innovation. | Job roles that require organizing rehearsals include directors, choreographers, and conductors in theater, dance, and music ensembles, who must efficiently plan and schedule practice sessions to prepare performers for a successful final performance. |
A technical director or any of its synonyms requires a strong blend of technical expertise and leadership skills, including the ability to oversee technical operations, manage teams, and ensure the successful execution of technical projects while maintaining operational efficiency and innovation. | Job roles such as Health and Safety Managers, Environmental Health Officers, and Risk Management Specialists often require the skill of negotiating health and safety issues with third parties to ensure compliance and protection standards are met across different organizations and sites. |
- Loss: CachedGISTEmbedLoss with these parameters:

```
{'guide': SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
), 'temperature': 0.01, 'mini_batch_size': 32, 'margin_strategy': 'absolute', 'margin': 0.0}
```
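CachedGISTEmbedLoss is a contrastive loss with in-batch negatives: each anchor should score its own positive above the other positives in the batch. The GIST part uses the frozen guide model to drop in-batch candidates that the guide scores higher than the true positive pair, since these are likely false negatives. With `margin_strategy: 'absolute'` and `margin: 0.0`, a candidate is dropped when its guide similarity exceeds the guide similarity of the positive pair. The Cached part only splits the computation into mini-batches (`mini_batch_size: 32`) to save memory.

The pure-Python sketch below is a simplified version of that objective. It uses only anchor-positive scores and hand-picked similarity matrices. The library loss also adds anchor-anchor and positive-positive candidates, so this is not a drop-in reimplementation. The function name `gist_loss` is hypothetical.

```python
import math

def gist_loss(model_sim, guide_sim, temperature=0.01, margin=0.0):
    """Simplified GIST-style InfoNCE loss over an anchor x positive similarity matrix.

    model_sim[i][j]: similarity of anchor i and positive j under the model being trained.
    guide_sim[i][j]: the same similarity under the frozen guide model.
    Row i's correct class is column i (its own positive).
    """
    n = len(model_sim)
    total = 0.0
    for i in range(n):
        threshold = guide_sim[i][i] - margin       # 'absolute' margin strategy
        logits = []
        for j in range(n):
            if j != i and guide_sim[i][j] > threshold:
                continue                           # masked: likely false negative
            logits.append((j, model_sim[i][j] / temperature))
        # Cross-entropy = -log softmax probability of the true positive (log-sum-exp trick).
        max_logit = max(v for _, v in logits)
        log_norm = max_logit + math.log(sum(math.exp(v - max_logit) for _, v in logits))
        positive_logit = next(v for j, v in logits if j == i)
        total += log_norm - positive_logit
    return total / n

model_sim = [[0.9, 0.8, 0.1],
             [0.2, 0.7, 0.3],
             [0.1, 0.2, 0.8]]
# The guide rates anchor 0 closer to positive 1 (0.7) than to its own positive (0.6),
# so that candidate is removed from anchor 0's softmax.
guide_sim = [[0.6, 0.7, 0.1],
             [0.2, 0.8, 0.3],
             [0.1, 0.2, 0.9]]
no_mask_guide = [[1.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0],
                 [0.0, 0.0, 1.0]]

# Masking the likely false negative lowers the loss (it no longer penalizes anchor 0).
print(gist_loss(model_sim, guide_sim) < gist_loss(model_sim, no_mask_guide))  # True
```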
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: steps
- per_device_train_batch_size: 64
- per_device_eval_batch_size: 128
- gradient_accumulation_steps: 2
- num_train_epochs: 5
- warmup_ratio: 0.05
- log_on_each_node: False
- fp16: True
- dataloader_num_workers: 4
- ddp_find_unused_parameters: True
- batch_sampler: no_duplicates
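The arithmetic below works through what these settings imply. It assumes training ran on a single device; the epoch values in the training log are consistent with this.

- A per-device batch of 64 with 2 gradient-accumulation steps gives 128 samples per optimizer step.
- With `dataloader_drop_last: True`, the 114,699 samples yield 1,792 full batches, or 896 optimizer steps per epoch.
- Over 5 epochs that is 4,480 steps, of which 224 are warmup at `warmup_ratio` 0.05.

The `lr_at` helper is a hypothetical sketch of a linear-warmup, linear-decay schedule (`lr_scheduler_type: linear`, `learning_rate: 5e-05`), not the Transformers implementation itself.

```python
import math

samples = 114_699
per_device_batch = 64
grad_accum = 2
num_devices = 1          # assumption: single device, consistent with the training log
epochs = 5
warmup_ratio = 0.05
peak_lr = 5e-05

batches_per_epoch = samples // (per_device_batch * num_devices)   # drop_last=True
steps_per_epoch = batches_per_epoch // grad_accum                 # optimizer updates per epoch
total_steps = steps_per_epoch * epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)

def lr_at(step):
    """Linear warmup from 0 to peak_lr, then linear decay back to 0 at total_steps."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))

print(steps_per_epoch, total_steps, warmup_steps)   # 896 4480 224
print(round(100 / steps_per_epoch, 4))              # 0.1116, the epoch logged at step 100
print(lr_at(warmup_steps), lr_at(total_steps))      # 5e-05 0.0
```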
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 64
- per_device_eval_batch_size: 128
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 2
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 5
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.05
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: False
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: True
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: True
- dataloader_num_workers: 4
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- tp_size: 0
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: True
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional
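The `batch_sampler: no_duplicates` setting matters because the training data repeats anchors: the samples listed under Training Dataset reuse the same technical-director anchor with several positives. With in-batch negatives, two rows that share a text in one batch would turn a genuine match into a negative.

The pure-Python sketch below illustrates greedy duplicate-free batching. The function name `no_duplicate_batches` is hypothetical, and this is not the Sentence Transformers sampler's source code.

```python
def no_duplicate_batches(pairs, batch_size):
    """Greedily build batches in which no text (anchor or positive) appears twice.

    Pairs that would introduce a duplicate are deferred to a later batch;
    an incomplete final batch is dropped, mirroring drop_last=True.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    remaining = list(pairs)
    batches = []
    while remaining:
        batch, seen, deferred = [], set(), []
        for anchor, positive in remaining:
            if len(batch) < batch_size and anchor not in seen and positive not in seen:
                batch.append((anchor, positive))
                seen.update((anchor, positive))
            else:
                deferred.append((anchor, positive))
        if len(batch) < batch_size:
            break                      # not enough distinct pairs left for a full batch
        batches.append(batch)
        remaining = deferred
    return batches

pairs = [
    ("technical director", "promoting health and safety"),
    ("technical director", "organizing rehearsals"),
    ("physiotherapist", "manual therapy"),
    ("bus driver", "passenger focus"),
]
for batch in no_duplicate_batches(pairs, batch_size=2):
    print(batch)
# [('technical director', 'promoting health and safety'), ('physiotherapist', 'manual therapy')]
# [('technical director', 'organizing rehearsals'), ('bus driver', 'passenger focus')]
```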
Training Logs
Epoch | Step | Training Loss | full_en_cosine_ndcg@200 |
---|---|---|---|
-1 | -1 | - | 0.4784 |
0.0011 | 1 | 9.119 | - |
0.1116 | 100 | 4.1469 | - |
0.2232 | 200 | 2.5294 | 0.5362 |
0.3348 | 300 | 2.3611 | - |
0.4464 | 400 | 2.192 | 0.5318 |
0.5580 | 500 | 2.0338 | - |
0.6696 | 600 | 1.9009 | 0.5383 |
0.7812 | 700 | 1.8404 | - |
0.8929 | 800 | 1.7692 | 0.5352 |
1.0045 | 900 | 1.6921 | - |
1.1161 | 1000 | 1.3861 | 0.5368 |
1.2277 | 1100 | 1.3863 | - |
1.3393 | 1200 | 1.3546 | 0.5259 |
1.4509 | 1300 | 1.373 | - |
1.5625 | 1400 | 1.3364 | 0.5303 |
1.6741 | 1500 | 1.2876 | - |
1.7857 | 1600 | 1.3094 | 0.5323 |
1.8973 | 1700 | 1.2784 | - |
2.0089 | 1800 | 1.2204 | 0.5330 |
2.1205 | 1900 | 0.9617 | - |
2.2321 | 2000 | 1.0004 | 0.5277 |
2.3438 | 2100 | 0.9694 | - |
2.4554 | 2200 | 0.9843 | 0.5356 |
2.5670 | 2300 | 0.9743 | - |
2.6786 | 2400 | 0.9252 | 0.5320 |
2.7902 | 2500 | 0.9272 | - |
2.9018 | 2600 | 0.9279 | 0.5333 |
3.0134 | 2700 | 0.857 | - |
3.125 | 2800 | 0.7313 | 0.5300 |
3.2366 | 2900 | 0.7103 | - |
3.3482 | 3000 | 0.7187 | 0.5319 |
3.4598 | 3100 | 0.7067 | - |
3.5714 | 3200 | 0.7157 | 0.5369 |
3.6830 | 3300 | 0.7113 | - |
3.7946 | 3400 | 0.7013 | 0.5341 |
3.9062 | 3500 | 0.6903 | - |
4.0179 | 3600 | 0.6462 | 0.5335 |
4.1295 | 3700 | 0.5162 | - |
4.2411 | 3800 | 0.524 | 0.5352 |
4.3527 | 3900 | 0.5303 | - |
4.4643 | 4000 | 0.5269 | 0.5341 |
4.5759 | 4100 | 0.4824 | - |
4.6875 | 4200 | 0.5222 | 0.5342 |
4.7991 | 4300 | 0.5104 | - |
4.9107 | 4400 | 0.5002 | 0.5346 |
Framework Versions
- Python: 3.11.11
- Sentence Transformers: 4.1.0
- Transformers: 4.51.2
- PyTorch: 2.6.0+cu124
- Accelerate: 1.6.0
- Datasets: 3.5.0
- Tokenizers: 0.21.1
Citation
BibTeX
Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```