tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:114699
  - loss:CachedGISTEmbedLoss
base_model: BAAI/bge-large-en-v1.5
widget:
  - source_sentence: >-
      For roles such as 'physiotherapist', 'neuromusculoskeletal
      physiotherapist', 'osteopath', and 'chiropractor', the skills needed
      include a deep understanding of human anatomy and physiology, strong
      diagnostic skills, and the ability to apply manual therapy techniques to
      treat musculoskeletal issues. Additionally, effective communication skills
      are crucial for explaining treatments and exercises to patients, while
      adaptability and problem-solving skills are essential for tailoring
      treatments to individual patient needs.
    sentences:
      - >-
        Job roles such as insulation installers, HVAC technicians, and
        construction engineers require knowledge of various types and
        characteristics of insulation materials to effectively reduce heat
        transfer and improve energy efficiency in buildings and systems.
        Understanding the typology of insulation materials, including their
        thermal properties, durability, and environmental impact, is crucial for
        these professionals to select the most appropriate materials for
        specific applications.
      - >-
        Job roles such as Contract Managers, Legal Analysts, and Compliance
        Officers require the skill of reviewing or auditing completed contracts
        to ensure legal accuracy, compliance with regulations, and alignment
        with organizational goals.
      - >-
        Job roles that require skills in dealing with emergency care situations
        include emergency medical technicians (EMTs), paramedics, and emergency
        room nurses or doctors, all of whom must quickly and effectively manage
        critical health situations to save lives.
  - source_sentence: >-
      Bus drivers, including those operating in various sectors like public
      transit, intercity, private, or school services, need strong driving
      skills, knowledge of traffic laws, and the ability to operate safely in
      diverse conditions. Additionally, effective communication skills and the
      ability to handle passenger inquiries and emergencies are crucial.
    sentences:
      - >-
        Job roles that require the skill to calibrate electronic instruments
        include calibration technicians, quality control engineers, and
        instrumentation specialists. These professionals ensure the accuracy and
        reliability of various electronic devices and systems across different
        industries such as manufacturing, aerospace, and automotive.
      - >-
        Job roles such as Building Engineer, Architect, and Construction
        Specialist require skills in designing, engineering, or developing
        air-tight building structures to ensure energy efficiency and
        environmental control within the building.
      - >-
        Job roles such as customer service representatives, flight attendants,
        and hotel concierges require a strong focus on passengers or customers,
        ensuring their needs and comfort are prioritized to provide excellent
        service and support.
  - source_sentence: >-
      A mine surveyor, also known as a mining surveyor or mine planning
      surveyor, requires expertise in geomatics and mining engineering to
      accurately map and plan mine operations, ensuring safety and efficiency.
      They must also possess strong analytical skills and the ability to use
      specialized software for creating detailed mine plans and maintaining
      accurate records.
    sentences:
      - >-
        Job roles such as data analysts, business analysts, and financial
        analysts require the skill to present reports or prepare statistical
        reports, as they often need to communicate complex data insights clearly
        and effectively to stakeholders.
      - >-
        Job roles that require monitoring flour unloading equipment include
        Quality Control Technicians, Process Operators, and Mill Supervisors,
        who ensure the efficient and safe operation of flour processing systems
        and the proper unloading of flour from transport vehicles.
      - >-
        Job roles that require skills in the manufacturing of made-up textile
        articles include textile production managers, machinery operators, and
        quality control inspectors, all of whom utilize specific technology and
        machinery to produce finished textile products such as clothing, home
        textiles, and industrial fabrics.
  - source_sentence: >-
      An insulation supervisor, regardless of the specific type of insulation
      material or installation area, requires strong project management skills,
      knowledge of building codes and safety regulations, and expertise in
      insulation techniques to oversee the installation process effectively and
      ensure quality standards are met.
    sentences:
      - >-
        Job roles that require skills in energy efficiency, such as promoting
        energy efficiency or efficient energy use, include Energy Managers,
        Sustainability Specialists, and Building Engineers, who focus on
        reducing energy consumption and improving energy use in various
        settings. Additionally, roles like Battery Technicians or Engineers
        involve battery benchmarking to enhance energy storage and efficiency in
        technological devices and systems.
      - >-
        The skill of applying or installing waterproofing and damp-proofing
        membranes is primarily required by construction workers such as
        waterproofing specialists, roofers, and building envelope technicians,
        who use these membranes to prevent water damage in buildings and
        structures.
      - >-
        Job roles such as laboratory technicians, chemists, and materials
        scientists require skills in laboratory techniques, including electronic
        and thermic methods, gas chromatography, and gravimetric analysis, to
        conduct precise experiments and analyze materials. These professionals
        must apply natural science techniques and use various lab techniques to
        ensure accurate and reliable results in their research or quality
        control processes.
  - source_sentence: >-
      For roles such as import/export manager, graduate export manager, senior
      export manager, and other related positions in meat and meat products, the
      key skills include a strong understanding of international trade
      regulations, meat product knowledge, customs compliance, and excellent
      negotiation and communication skills to manage global supply chains
      effectively. Additionally, proficiency in relevant trade software and
      languages can be highly beneficial.
    sentences:
      - >-
        Job roles that require skills such as managing staff, coordinating
        employees, and performing HR activities include Human Resources
        Managers, Team Leaders, Supervisors, and Department Heads, all of whom
        are responsible for overseeing personnel, implementing HR policies, and
        ensuring efficient team operations.
      - >-
        Job roles such as Control Systems Engineer, Automation Engineer, and
        Systems Designer require skills in designing, planning, and developing
        control systems to manage and optimize the performance of various
        technological processes and machinery. These professionals are tasked
        with creating efficient and reliable systems that can operate
        autonomously or with minimal human intervention.
      - >-
        Job roles such as Performance Analyst, Quality Assurance Engineer, and
        Test Manager require skills in conducting performance measurement and
        organizing or managing conversion testing to ensure software and systems
        meet performance standards and function correctly in real-world
        scenarios.
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
  - cosine_accuracy@1
  - cosine_accuracy@20
  - cosine_accuracy@50
  - cosine_accuracy@100
  - cosine_accuracy@150
  - cosine_accuracy@200
  - cosine_precision@1
  - cosine_precision@20
  - cosine_precision@50
  - cosine_precision@100
  - cosine_precision@150
  - cosine_precision@200
  - cosine_recall@1
  - cosine_recall@20
  - cosine_recall@50
  - cosine_recall@100
  - cosine_recall@150
  - cosine_recall@200
  - cosine_ndcg@1
  - cosine_ndcg@20
  - cosine_ndcg@50
  - cosine_ndcg@100
  - cosine_ndcg@150
  - cosine_ndcg@200
  - cosine_mrr@1
  - cosine_mrr@20
  - cosine_mrr@50
  - cosine_mrr@100
  - cosine_mrr@150
  - cosine_mrr@200
  - cosine_map@1
  - cosine_map@20
  - cosine_map@50
  - cosine_map@100
  - cosine_map@150
  - cosine_map@200
  - cosine_map@500
model-index:
  - name: SentenceTransformer based on BAAI/bge-large-en-v1.5
    results:
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: full en
          type: full_en
        metrics:
          - type: cosine_accuracy@1
            value: 0.7302631578947368
            name: Cosine Accuracy@1
          - type: cosine_accuracy@20
            value: 0.993421052631579
            name: Cosine Accuracy@20
          - type: cosine_accuracy@50
            value: 0.9967105263157895
            name: Cosine Accuracy@50
          - type: cosine_accuracy@100
            value: 1
            name: Cosine Accuracy@100
          - type: cosine_accuracy@150
            value: 1
            name: Cosine Accuracy@150
          - type: cosine_accuracy@200
            value: 1
            name: Cosine Accuracy@200
          - type: cosine_precision@1
            value: 0.7302631578947368
            name: Cosine Precision@1
          - type: cosine_precision@20
            value: 0.4998355263157894
            name: Cosine Precision@20
          - type: cosine_precision@50
            value: 0.39184210526315794
            name: Cosine Precision@50
          - type: cosine_precision@100
            value: 0.3111842105263158
            name: Cosine Precision@100
          - type: cosine_precision@150
            value: 0.2652412280701754
            name: Cosine Precision@150
          - type: cosine_precision@200
            value: 0.232171052631579
            name: Cosine Precision@200
          - type: cosine_recall@1
            value: 0.010227350724729817
            name: Cosine Recall@1
          - type: cosine_recall@20
            value: 0.13368254620254577
            name: Cosine Recall@20
          - type: cosine_recall@50
            value: 0.2541249933594102
            name: Cosine Recall@50
          - type: cosine_recall@100
            value: 0.3948435268881245
            name: Cosine Recall@100
          - type: cosine_recall@150
            value: 0.49626849018850344
            name: Cosine Recall@150
          - type: cosine_recall@200
            value: 0.5720837677245543
            name: Cosine Recall@200
          - type: cosine_ndcg@1
            value: 0.7302631578947368
            name: Cosine Ndcg@1
          - type: cosine_ndcg@20
            value: 0.5384654647855256
            name: Cosine Ndcg@20
          - type: cosine_ndcg@50
            value: 0.44986527953229877
            name: Cosine Ndcg@50
          - type: cosine_ndcg@100
            value: 0.44277699637488865
            name: Cosine Ndcg@100
          - type: cosine_ndcg@150
            value: 0.4895063673734854
            name: Cosine Ndcg@150
          - type: cosine_ndcg@200
            value: 0.5346148440105628
            name: Cosine Ndcg@200
          - type: cosine_mrr@1
            value: 0.7302631578947368
            name: Cosine Mrr@1
          - type: cosine_mrr@20
            value: 0.8341772399749373
            name: Cosine Mrr@20
          - type: cosine_mrr@50
            value: 0.8343338815789473
            name: Cosine Mrr@50
          - type: cosine_mrr@100
            value: 0.8343905966424682
            name: Cosine Mrr@100
          - type: cosine_mrr@150
            value: 0.8343905966424682
            name: Cosine Mrr@150
          - type: cosine_mrr@200
            value: 0.8343905966424682
            name: Cosine Mrr@200
          - type: cosine_map@1
            value: 0.7302631578947368
            name: Cosine Map@1
          - type: cosine_map@20
            value: 0.3434603918412553
            name: Cosine Map@20
          - type: cosine_map@50
            value: 0.23779270403918282
            name: Cosine Map@50
          - type: cosine_map@100
            value: 0.21161540263537876
            name: Cosine Map@100
          - type: cosine_map@150
            value: 0.22899252179487295
            name: Cosine Map@150
          - type: cosine_map@200
            value: 0.24784282323083537
            name: Cosine Map@200
          - type: cosine_map@500
            value: 0.298154972004029
            name: Cosine Map@500

Job-Skill matching fine-tuned BAAI/bge-large-en-v1.5

Top-performing model on TalentCLEF 2025 Task B. Use it to match job titles to skill sets, in either direction.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-large-en-v1.5
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': True}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
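
The printout above lists three stages: a BERT encoder, pooling with `pooling_mode_cls_token: True`, and a `Normalize()` layer. As a rough illustration of the last two stages only, here is a minimal sketch in plain Python on toy 4-dimensional token vectors (the real model uses 1024 dimensions). The function names `cls_pool` and `l2_normalize` are hypothetical and are not part of the Sentence Transformers API.

```python
import math

def cls_pool(token_embeddings):
    """Return the first token's vector (the [CLS] position) as the sentence embedding."""
    return list(token_embeddings[0])

def l2_normalize(vector):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)

# Toy encoder output: 3 tokens with 4 dimensions each (made-up values).
tokens = [
    [3.0, 0.0, 4.0, 0.0],  # [CLS]
    [1.0, 1.0, 1.0, 1.0],
    [0.5, 2.0, 0.0, 1.0],
]

sentence_embedding = l2_normalize(cls_pool(tokens))
print(sentence_embedding)
```

Because the final vector has unit length, the dot product of two such embeddings equals their cosine similarity.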

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("pj-mathematician/JobSkillBGE-large-en-v1.5")
# Run inference
sentences = [
    'For roles such as import/export manager, graduate export manager, senior export manager, and other related positions in meat and meat products, the key skills include a strong understanding of international trade regulations, meat product knowledge, customs compliance, and excellent negotiation and communication skills to manage global supply chains effectively. Additionally, proficiency in relevant trade software and languages can be highly beneficial.',
    'Job roles such as Performance Analyst, Quality Assurance Engineer, and Test Manager require skills in conducting performance measurement and organizing or managing conversion testing to ensure software and systems meet performance standards and function correctly in real-world scenarios.',
    'Job roles that require skills such as managing staff, coordinating employees, and performing HR activities include Human Resources Managers, Team Leaders, Supervisors, and Department Heads, all of whom are responsible for overseeing personnel, implementing HR policies, and ensuring efficient team operations.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 1024)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
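
For job title to skill matching, one plausible workflow is to embed a job description once and rank candidate skill descriptions by cosine similarity. The sketch below shows only the ranking step, in plain Python on made-up 3-dimensional vectors; in practice the vectors would come from `model.encode`. The helper names `cosine` and `rank_candidates` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_candidates(query_vec, candidate_vecs, top_k=None):
    """Return (candidate_index, score) pairs sorted by descending cosine similarity."""
    scored = [(idx, cosine(query_vec, vec)) for idx, vec in enumerate(candidate_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]  # top_k=None keeps the full ranking

# Toy stand-ins for one job embedding and three skill embeddings.
job_vec = [1.0, 0.0, 0.0]
skill_vecs = [
    [0.0, 1.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.5, 0.5, 0.0],
]

for idx, score in rank_candidates(job_vec, skill_vecs):
    print(idx, round(score, 4))
```

For large candidate pools, a vectorized or approximate nearest-neighbor search would likely be preferable to this loop.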

Evaluation

Metrics

Information Retrieval

Metric Value
cosine_accuracy@1 0.7303
cosine_accuracy@20 0.9934
cosine_accuracy@50 0.9967
cosine_accuracy@100 1.0
cosine_accuracy@150 1.0
cosine_accuracy@200 1.0
cosine_precision@1 0.7303
cosine_precision@20 0.4998
cosine_precision@50 0.3918
cosine_precision@100 0.3112
cosine_precision@150 0.2652
cosine_precision@200 0.2322
cosine_recall@1 0.0102
cosine_recall@20 0.1337
cosine_recall@50 0.2541
cosine_recall@100 0.3948
cosine_recall@150 0.4963
cosine_recall@200 0.5721
cosine_ndcg@1 0.7303
cosine_ndcg@20 0.5385
cosine_ndcg@50 0.4499
cosine_ndcg@100 0.4428
cosine_ndcg@150 0.4895
cosine_ndcg@200 0.5346
cosine_mrr@1 0.7303
cosine_mrr@20 0.8342
cosine_mrr@50 0.8343
cosine_mrr@100 0.8344
cosine_mrr@150 0.8344
cosine_mrr@200 0.8344
cosine_map@1 0.7303
cosine_map@20 0.3435
cosine_map@50 0.2378
cosine_map@100 0.2116
cosine_map@150 0.229
cosine_map@200 0.2478
cosine_map@500 0.2982
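
The table above reports accuracy, precision, recall, and MRR at several cutoffs. The sketch below shows how these quantities are usually defined for a single query, assuming the standard information-retrieval definitions (the reported values are averages over all queries). The function name `metrics_at_k` and the toy ranking are illustrative only.

```python
def metrics_at_k(ranked_ids, relevant_ids, k):
    """Compute accuracy@k, precision@k, recall@k and MRR@k for one query."""
    top_k = ranked_ids[:k]
    hits = [doc_id in relevant_ids for doc_id in top_k]
    num_hits = sum(hits)
    # Rank (1-based) of the first relevant item in the top k, if any.
    first_hit_rank = next((rank for rank, hit in enumerate(hits, start=1) if hit), None)
    return {
        "accuracy": 1.0 if num_hits > 0 else 0.0,
        "precision": num_hits / k,
        "recall": num_hits / len(relevant_ids),
        "mrr": 1.0 / first_hit_rank if first_hit_rank else 0.0,
    }

# Toy example: five ranked skills, four of the corpus skills are relevant.
ranked = ["s7", "s2", "s9", "s4", "s1"]
relevant = {"s2", "s4", "s5", "s8"}

print(metrics_at_k(ranked, relevant, k=1))
print(metrics_at_k(ranked, relevant, k=5))
```

Under these definitions, recall@1 can never exceed 1 divided by the number of relevant items, so a high precision@1 (0.7303) alongside a low recall@1 (0.0102) is what one would expect when each query has many relevant skills.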

Training Details

Training Dataset

Unnamed Dataset

  • Size: 114,699 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
    • anchor (string): min 43 tokens, mean 65.45 tokens, max 116 tokens
    • positive (string): min 34 tokens, mean 55.34 tokens, max 162 tokens
  • Samples:
    • anchor: A technical director or any of its synonyms requires a strong blend of technical expertise and leadership skills, including the ability to oversee technical operations, manage teams, and ensure the successful execution of technical projects while maintaining operational efficiency and innovation.
      positive: Job roles that require promoting health and safety include occupational health and safety specialists, safety managers, and public health educators, all of whom work to ensure safe and healthy environments in workplaces and communities.
    • anchor: A technical director or any of its synonyms requires a strong blend of technical expertise and leadership skills, including the ability to oversee technical operations, manage teams, and ensure the successful execution of technical projects while maintaining operational efficiency and innovation.
      positive: Job roles that require organizing rehearsals include directors, choreographers, and conductors in theater, dance, and music ensembles, who must efficiently plan and schedule practice sessions to prepare performers for a successful final performance.
    • anchor: A technical director or any of its synonyms requires a strong blend of technical expertise and leadership skills, including the ability to oversee technical operations, manage teams, and ensure the successful execution of technical projects while maintaining operational efficiency and innovation.
      positive: Job roles such as Health and Safety Managers, Environmental Health Officers, and Risk Management Specialists often require the skill of negotiating health and safety issues with third parties to ensure compliance and protection standards are met across different organizations and sites.
  • Loss: CachedGISTEmbedLoss with these parameters:
    {'guide': SentenceTransformer(
      (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: BertModel 
      (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
      (2): Normalize()
    ), 'temperature': 0.01, 'mini_batch_size': 32, 'margin_strategy': 'absolute', 'margin': 0.0}
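
The parameters above describe a guided contrastive loss: a smaller guide model scores the in-batch pairs, and negatives that the guide rates above the true pair (minus `margin`, under the `absolute` strategy) are excluded before a temperature-scaled cross-entropy is applied. The following is a deliberately simplified sketch of that idea on anchor-to-positive similarities only. It is not the library implementation, which also handles gradient caching in mini-batches and may include further terms; the function name `gist_style_loss` and all similarity values are made up for illustration.

```python
import math

def gist_style_loss(model_sim, guide_sim, temperature=0.01, margin=0.0):
    """Mean cross-entropy over rows, skipping negatives the guide flags as likely positives.

    model_sim[i][j] and guide_sim[i][j] are cosine similarities between anchor i
    and positive j; the true pair for anchor i sits on the diagonal (j == i).
    """
    losses = []
    for i, (row, guide_row) in enumerate(zip(model_sim, guide_sim)):
        threshold = guide_row[i] - margin  # 'absolute' margin strategy
        kept = [
            row[j] / temperature
            for j in range(len(row))
            if j == i or guide_row[j] <= threshold  # always keep the true pair
        ]
        positive_logit = row[i] / temperature
        max_logit = max(kept)  # subtract the max for a stable log-sum-exp
        log_sum_exp = max_logit + math.log(sum(math.exp(x - max_logit) for x in kept))
        losses.append(log_sum_exp - positive_logit)
    return sum(losses) / len(losses)

# Toy batch of 3 pairs: the guide thinks positive 1 also fits anchor 0.
model_sim = [[0.80, 0.78, 0.10], [0.20, 0.90, 0.15], [0.05, 0.10, 0.85]]
guide_sim = [[0.70, 0.75, 0.10], [0.20, 0.80, 0.10], [0.10, 0.10, 0.90]]
no_guidance = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # flags nothing

print("with guide masking:", gist_style_loss(model_sim, guide_sim))
print("without masking:   ", gist_style_loss(model_sim, no_guidance))
```

In this toy batch the masked loss is lower because the hard "negative" for anchor 0 is treated as a probable false negative and removed from the denominator.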
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 128
  • gradient_accumulation_steps: 2
  • num_train_epochs: 5
  • warmup_ratio: 0.05
  • log_on_each_node: False
  • fp16: True
  • dataloader_num_workers: 4
  • ddp_find_unused_parameters: True
  • batch_sampler: no_duplicates

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 128
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 2
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 5
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.05
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: False
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: True
  • dataloader_num_workers: 4
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • tp_size: 0
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: True
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss full_en_cosine_ndcg@200
-1 -1 - 0.4784
0.0011 1 9.119 -
0.1116 100 4.1469 -
0.2232 200 2.5294 0.5362
0.3348 300 2.3611 -
0.4464 400 2.192 0.5318
0.5580 500 2.0338 -
0.6696 600 1.9009 0.5383
0.7812 700 1.8404 -
0.8929 800 1.7692 0.5352
1.0045 900 1.6921 -
1.1161 1000 1.3861 0.5368
1.2277 1100 1.3863 -
1.3393 1200 1.3546 0.5259
1.4509 1300 1.373 -
1.5625 1400 1.3364 0.5303
1.6741 1500 1.2876 -
1.7857 1600 1.3094 0.5323
1.8973 1700 1.2784 -
2.0089 1800 1.2204 0.5330
2.1205 1900 0.9617 -
2.2321 2000 1.0004 0.5277
2.3438 2100 0.9694 -
2.4554 2200 0.9843 0.5356
2.5670 2300 0.9743 -
2.6786 2400 0.9252 0.5320
2.7902 2500 0.9272 -
2.9018 2600 0.9279 0.5333
3.0134 2700 0.857 -
3.125 2800 0.7313 0.5300
3.2366 2900 0.7103 -
3.3482 3000 0.7187 0.5319
3.4598 3100 0.7067 -
3.5714 3200 0.7157 0.5369
3.6830 3300 0.7113 -
3.7946 3400 0.7013 0.5341
3.9062 3500 0.6903 -
4.0179 3600 0.6462 0.5335
4.1295 3700 0.5162 -
4.2411 3800 0.524 0.5352
4.3527 3900 0.5303 -
4.4643 4000 0.5269 0.5341
4.5759 4100 0.4824 -
4.6875 4200 0.5222 0.5342
4.7991 4300 0.5104 -
4.9107 4400 0.5002 0.5346
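
In the log above the training loss falls steadily while full_en_cosine_ndcg@200 stays within a narrow band after the first evaluation. The short sketch below, using the evaluation values copied from the table, finds the best logged step and compares it with the last one. Which checkpoint was published is not stated here, so this is only a way to read the log.

```python
baseline_ndcg = 0.4784  # evaluation before training (row with step -1)

# (step, full_en_cosine_ndcg@200) pairs copied from the training log.
eval_log = [
    (200, 0.5362), (400, 0.5318), (600, 0.5383), (800, 0.5352),
    (1000, 0.5368), (1200, 0.5259), (1400, 0.5303), (1600, 0.5323),
    (1800, 0.5330), (2000, 0.5277), (2200, 0.5356), (2400, 0.5320),
    (2600, 0.5333), (2800, 0.5300), (3000, 0.5319), (3200, 0.5369),
    (3400, 0.5341), (3600, 0.5335), (3800, 0.5352), (4000, 0.5341),
    (4200, 0.5342), (4400, 0.5346),
]

best_step, best_score = max(eval_log, key=lambda row: row[1])
last_step, last_score = eval_log[-1]
scores = [score for _, score in eval_log]

print(f"best: step {best_step}, ndcg@200 {best_score:.4f}")
print(f"last: step {last_step}, ndcg@200 {last_score:.4f}")
print(f"gain over baseline at last step: {last_score - baseline_ndcg:.4f}")
print(f"spread during training: {max(scores) - min(scores):.4f}")
```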

Framework Versions

  • Python: 3.11.11
  • Sentence Transformers: 4.1.0
  • Transformers: 4.51.2
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.6.0
  • Datasets: 3.5.0
  • Tokenizers: 0.21.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}