|
+ deepspeed |
|
[rank3]:[W528 18:45:01.855919414 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 3] using GPU 3 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id. |
|
[rank7]:[W528 18:45:01.899177527 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 7] using GPU 7 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id. |
|
[rank4]:[W528 18:45:01.904355763 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 4] using GPU 4 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id. |
|
[rank6]:[W528 18:45:01.907498871 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 6] using GPU 6 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id. |
|
[rank2]:[W528 18:45:01.930201749 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 2] using GPU 2 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id. |
|
[rank1]:[W528 18:45:01.952624365 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 1] using GPU 1 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id. |
|
[rank0]:[W528 18:45:01.981560238 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 0] using GPU 0 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id. |
|
[rank5]:[W528 18:45:01.982132308 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 5] using GPU 5 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id. |
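|
The fix the warning suggests, as a minimal sketch (assumes a deepspeed/torchrun-style launcher that exports LOCAL_RANK, and PyTorch 2.2+ for the device_id argument):

    import os
    import torch
    import torch.distributed as dist

    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Binding the process group to a device up front removes the rank-to-GPU
    # guesswork in later collectives such as barrier().
    dist.init_process_group(backend="nccl", device_id=torch.device(f"cuda:{local_rank}"))

    # Or pin the device per call:
    dist.barrier(device_ids=[local_rank])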
|
loading configuration file /aifs4su/hansirui_1st/models/Qwen1.5-0.5B/config.json |
|
Model config Qwen2Config { |
|
"_name_or_path": "/aifs4su/hansirui_1st/models/Qwen1.5-0.5B", |
|
"architectures": [ |
|
"Qwen2ForCausalLM" |
|
], |
|
"attention_dropout": 0.0, |
|
"bos_token_id": 151643, |
|
"eos_token_id": 151643, |
|
"hidden_act": "silu", |
|
"hidden_size": 1024, |
|
"initializer_range": 0.02, |
|
"intermediate_size": 2816, |
|
"max_position_embeddings": 32768, |
|
"max_window_layers": 21, |
|
"model_type": "qwen2", |
|
"num_attention_heads": 16, |
|
"num_hidden_layers": 24, |
|
"num_key_value_heads": 16, |
|
"rms_norm_eps": 1e-06, |
|
"rope_scaling": null, |
|
"rope_theta": 1000000.0, |
|
"sliding_window": 32768, |
|
"tie_word_embeddings": true, |
|
"torch_dtype": "bfloat16", |
|
"transformers_version": "4.49.0", |
|
"use_cache": true, |
|
"use_sliding_window": false, |
|
"vocab_size": 151936 |
|
} |
|
loading weights file /aifs4su/hansirui_1st/models/Qwen1.5-0.5B/model.safetensors |
|
Will use torch_dtype=torch.bfloat16 as defined in model's config object |
|
Instantiating Qwen2ForCausalLM model under default dtype torch.bfloat16. |
|
Detected DeepSpeed ZeRO-3: activating zero.init() for this model |
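|
zero.init() fires because a ZeRO-3 DeepSpeed config is live when from_pretrained() runs; a minimal sketch of that wiring, assuming the transformers HfDeepSpeedConfig integration (the ds_config values here are illustrative):

    from transformers import AutoModelForCausalLM
    from transformers.integrations import HfDeepSpeedConfig

    ds_config = {
        "train_micro_batch_size_per_gpu": 1,
        "zero_optimization": {"stage": 3},
        "bf16": {"enabled": True},
    }

    # Keep this object alive before loading: transformers detects the ZeRO-3
    # config and builds the model under deepspeed.zero.Init(), sharding
    # parameters across ranks instead of materializing a full copy per process.
    dschf = HfDeepSpeedConfig(ds_config)
    model = AutoModelForCausalLM.from_pretrained("/aifs4su/hansirui_1st/models/Qwen1.5-0.5B")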
|
Generate config GenerationConfig { |
|
"bos_token_id": 151643, |
|
"eos_token_id": 151643 |
|
} |
|
Sliding Window Attention is enabled but not implemented for `sdpa`; unexpected results may be encountered. |
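|
The warning is spurious for this run, since the config above sets "use_sliding_window": false; if sliding-window attention were actually needed, one option is an attention backend that implements it. A sketch, assuming flash-attn is installed:

    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(
        "/aifs4su/hansirui_1st/models/Qwen1.5-0.5B",
        attn_implementation="flash_attention_2",  # sdpa has no sliding-window support
    )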
|
All model checkpoint weights were used when initializing Qwen2ForCausalLM. |
|
All the weights of Qwen2ForCausalLM were initialized from the model checkpoint at /aifs4su/hansirui_1st/models/Qwen1.5-0.5B. |
|
If your task is similar to the task the model of the checkpoint was trained on, you can already use Qwen2ForCausalLM for predictions without further training. |
|
loading configuration file /aifs4su/hansirui_1st/models/Qwen1.5-0.5B/generation_config.json |
|
Generate config GenerationConfig { |
|
"bos_token_id": 151643, |
|
"eos_token_id": 151643, |
|
"max_new_tokens": 2048 |
|
} |
|
loading file vocab.json |
|
loading file merges.txt |
|
loading file tokenizer.json |
|
loading file added_tokens.json |
|
loading file special_tokens_map.json |
|
loading file tokenizer_config.json |
|
loading file chat_template.jinja |
|
|
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. |
|
/home/hansirui_1st/jiayi/resist/setting3/safe_rlhf/models/pretrained.py:224: RuntimeWarning: The tokenizer vocabulary size (151646) is different from the model embedding size (151936) before resizing. |
|
resize_tokenizer_embedding(tokenizer=tokenizer, model=model) |
|
You are resizing the embedding layer without providing a `pad_to_multiple_of` parameter. This means that the new embedding dimension will be 151646. This might induce some performance reduction as *Tensor Cores* will not be available. For more details about this, or help on choosing the correct value for resizing, refer to this guide: https://docs.nvidia.com/deeplearning/performance/dl-performance-matrix-multiplication/index.html#requirements-tc |
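|
The rounding the warning asks for is one argument away; a minimal sketch using the standard transformers API (the multiple 64 is an assumption, any Tensor Core friendly multiple works):

    # 151646 rounds up to 151680 (= 2370 * 64), keeping the embedding matrix
    # dimensions friendly to Tensor Cores instead of stopping at 151646.
    model.resize_token_embeddings(len(tokenizer), pad_to_multiple_of=64)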
|
Using /home/hansirui_1st/.cache/torch_extensions/py311_cu124 as PyTorch extensions root... |
|
Detected CUDA files, patching ldflags |
|
Emitting ninja build file /home/hansirui_1st/.cache/torch_extensions/py311_cu124/fused_adam/build.ninja... |
|
/aifs4su/hansirui_1st/miniconda3/envs/by-align/lib/python3.11/site-packages/torch/utils/cpp_extension.py:2059: UserWarning: TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation. |
|
If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST']. |
|
warnings.warn( |
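|
The env var the warning points at can be set before the first JIT build; a sketch (the arch values are an assumption, match them to your GPUs):

    import os

    # Compile fused_adam only for the architectures actually present,
    # e.g. 8.0 for A100 or 9.0 for H100, instead of every visible arch.
    os.environ["TORCH_CUDA_ARCH_LIST"] = "8.0;9.0"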
|
Building extension module fused_adam... |
|
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) |
|
Loading extension module fused_adam... |
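|
The extension that just built and loaded backs DeepSpeed's fused Adam optimizer; a minimal usage sketch (the lr mirrors the train/lr reported in the wandb summary below):

    from deepspeed.ops.adam import FusedAdam

    # First use JIT-compiles fused_adam into the torch_extensions cache above;
    # subsequent runs load the cached build.
    optimizer = FusedAdam(model.parameters(), lr=1e-5)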
|
wandb: Using wandb-core as the SDK backend. Please refer to https://wandb.me/wandb-core for more information. |
|
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`. |
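|
Transformers flips the flag automatically, as logged above; doing it explicitly looks like this sketch (model is the Qwen2ForCausalLM loaded earlier):

    model.gradient_checkpointing_enable()
    # Checkpointing recomputes activations, so the KV cache is dead weight
    # during training; disabling it silences the warning.
    model.config.use_cache = False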
|
wandb: Currently logged in as: xtom to https://api.wandb.ai. Use `wandb login --relogin` to force relogin |
|
wandb: Tracking run with wandb version 0.19.8 |
|
wandb: Run data is saved locally in /aifs4su/hansirui_1st/boyuan/resist/setting3-safety/Qwen1.5-0.5B/Qwen1.5-0.5B-s3-Q1-2k/wandb/run-20250528_184512-1slp0ya4 |
|
wandb: Run `wandb offline` to turn off syncing. |
|
wandb: Syncing run qwen-0.5b-s3-Q1-2k |
|
wandb: ⭐️ View project at https://wandb.ai/xtom/Inverse_Alignment |
|
wandb: 🚀 View run at https://wandb.ai/xtom/Inverse_Alignment/runs/1slp0ya4 |
|
Training 1/1 epoch: 0%|          | 0/63 [00:00<?, ?it/s]
Training 1/1 epoch (loss 2.2122): 2%|█         | 1/63 [00:06<06:43, 6.50s/it]
Training 1/1 epoch (loss 2.1626): 3%|█         | 2/63 [00:09<04:23, 4.33s/it]
Training 1/1 epoch (loss 2.2434): 5%|█         | 3/63 [00:09<02:30, 2.51s/it]
Training 1/1 epoch (loss 2.2022): 6%|█         | 4/63 [00:09<01:37, 1.65s/it]
Training 1/1 epoch (loss 2.1616): 8%|█         | 5/63 [00:10<01:07, 1.17s/it]
Training 1/1 epoch (loss 2.1868): 10%|█         | 6/63 [00:10<00:52, 1.08it/s]
Training 1/1 epoch (loss 2.1862): 11%|██        | 7/63 [00:11<00:41, 1.36it/s]
Training 1/1 epoch (loss 2.3248): 13%|██        | 8/63 [00:11<00:35, 1.56it/s]
Training 1/1 epoch (loss 2.0679): 14%|██        | 9/63 [00:11<00:29, 1.83it/s]
Training 1/1 epoch (loss 1.9464): 16%|██        | 10/63 [00:12<00:25, 2.06it/s]
Training 1/1 epoch (loss 2.0644): 17%|██        | 11/63 [00:12<00:23, 2.26it/s]
Training 1/1 epoch (loss 2.1023): 19%|██        | 12/63 [00:13<00:23, 2.13it/s]
Training 1/1 epoch (loss 2.0310): 21%|███       | 13/63 [00:13<00:23, 2.11it/s]
Training 1/1 epoch (loss 2.1667): 22%|███       | 14/63 [00:14<00:22, 2.18it/s]
Training 1/1 epoch (loss 1.9624): 24%|███       | 15/63 [00:14<00:21, 2.20it/s]
Training 1/1 epoch (loss 2.1279): 25%|███       | 16/63 [00:15<00:23, 2.03it/s]
Training 1/1 epoch (loss 2.0294): 27%|███       | 17/63 [00:15<00:22, 2.05it/s]
Training 1/1 epoch (loss 1.9367): 29%|███       | 18/63 [00:15<00:20, 2.16it/s]
Training 1/1 epoch (loss 1.9497): 30%|███       | 19/63 [00:16<00:20, 2.15it/s]
Training 1/1 epoch (loss 1.9627): 32%|████      | 20/63 [00:16<00:20, 2.11it/s]
Training 1/1 epoch (loss 1.9898): 33%|████      | 21/63 [00:17<00:18, 2.21it/s]
Training 1/1 epoch (loss 2.1427): 35%|████      | 22/63 [00:17<00:17, 2.38it/s]
Training 1/1 epoch (loss 1.9799): 37%|████      | 23/63 [00:18<00:16, 2.37it/s]
Training 1/1 epoch (loss 2.0389): 38%|████      | 24/63 [00:18<00:16, 2.41it/s]
Training 1/1 epoch (loss 1.9805): 40%|████      | 25/63 [00:18<00:15, 2.47it/s]
Training 1/1 epoch (loss 2.0551): 41%|█████     | 26/63 [00:19<00:14, 2.52it/s]
Training 1/1 epoch (loss 2.0303): 43%|█████     | 27/63 [00:19<00:15, 2.38it/s]
Training 1/1 epoch (loss 1.8327): 44%|█████     | 28/63 [00:20<00:14, 2.40it/s]
Training 1/1 epoch (loss 2.0123): 46%|█████     | 29/63 [00:20<00:13, 2.50it/s]
Training 1/1 epoch (loss 1.9087): 48%|█████     | 30/63 [00:20<00:12, 2.57it/s]
Training 1/1 epoch (loss 2.0823): 49%|█████     | 31/63 [00:21<00:12, 2.53it/s]
Training 1/1 epoch (loss 2.0103): 51%|██████    | 32/63 [00:21<00:12, 2.55it/s]
Training 1/1 epoch (loss 2.0080): 52%|██████    | 33/63 [00:22<00:11, 2.53it/s]
Training 1/1 epoch (loss 1.8523): 54%|██████    | 34/63 [00:22<00:11, 2.59it/s]
Training 1/1 epoch (loss 2.0902): 56%|██████    | 35/63 [00:22<00:11, 2.42it/s]
Training 1/1 epoch (loss 1.9712): 57%|██████    | 36/63 [00:23<00:11, 2.38it/s]
Training 1/1 epoch (loss 2.1606): 59%|██████    | 37/63 [00:23<00:10, 2.39it/s]
Training 1/1 epoch (loss 1.8062): 60%|██████    | 38/63 [00:24<00:11, 2.26it/s]
Training 1/1 epoch (loss 1.9797): 62%|███████   | 39/63 [00:24<00:11, 2.15it/s]
Training 1/1 epoch (loss 1.9174): 63%|███████   | 40/63 [00:25<00:10, 2.12it/s]
Training 1/1 epoch (loss 1.8677): 65%|███████   | 41/63 [00:25<00:09, 2.21it/s]
Training 1/1 epoch (loss 1.9840): 67%|███████   | 42/63 [00:25<00:08, 2.37it/s]
Training 1/1 epoch (loss 2.0601): 68%|███████   | 43/63 [00:26<00:08, 2.45it/s]
Training 1/1 epoch (loss 1.9732): 70%|███████   | 44/63 [00:26<00:07, 2.38it/s]
Training 1/1 epoch (loss 1.9440): 71%|████████  | 45/63 [00:27<00:08, 2.25it/s]
Training 1/1 epoch (loss 1.9001): 73%|████████  | 46/63 [00:27<00:07, 2.41it/s]
Training 1/1 epoch (loss 1.8930): 75%|████████  | 47/63 [00:27<00:06, 2.51it/s]
Training 1/1 epoch (loss 2.0864): 76%|████████  | 48/63 [00:28<00:05, 2.55it/s]
Training 1/1 epoch (loss 1.8801): 78%|████████  | 49/63 [00:28<00:05, 2.55it/s]
Training 1/1 epoch (loss 1.9363): 79%|████████  | 50/63 [00:29<00:05, 2.37it/s]
Training 1/1 epoch (loss 1.9833): 81%|█████████ | 51/63 [00:29<00:04, 2.49it/s]
Training 1/1 epoch (loss 1.9821): 83%|█████████ | 52/63 [00:29<00:04, 2.59it/s]
Training 1/1 epoch (loss 1.8717): 84%|█████████ | 53/63 [00:30<00:03, 2.68it/s]
Training 1/1 epoch (loss 1.9399): 86%|█████████ | 54/63 [00:30<00:03, 2.60it/s]
Training 1/1 epoch (loss 2.0007): 87%|█████████ | 55/63 [00:31<00:03, 2.60it/s]
Training 1/1 epoch (loss 1.7834): 89%|█████████ | 56/63 [00:31<00:02, 2.48it/s]
Training 1/1 epoch (loss 1.9033): 90%|█████████ | 57/63 [00:31<00:02, 2.60it/s]
Training 1/1 epoch (loss 1.9213): 92%|██████████| 58/63 [00:32<00:01, 2.66it/s]
Training 1/1 epoch (loss 1.9180): 94%|██████████| 59/63 [00:32<00:01, 2.66it/s]
Training 1/1 epoch (loss 1.9246): 95%|██████████| 60/63 [00:33<00:01, 2.53it/s]
Training 1/1 epoch (loss 1.9254): 97%|██████████| 61/63 [00:33<00:00, 2.53it/s]
Training 1/1 epoch (loss 2.0030): 98%|██████████| 62/63 [00:33<00:00, 2.53it/s]
Training 1/1 epoch (loss 1.9616): 100%|██████████| 63/63 [00:34<00:00, 1.84it/s] |
|
tokenizer config file saved in /aifs4su/hansirui_1st/boyuan/resist/setting3-safety/Qwen1.5-0.5B/Qwen1.5-0.5B-s3-Q1-2k/tokenizer_config.json |
|
Special tokens file saved in /aifs4su/hansirui_1st/boyuan/resist/setting3-safety/Qwen1.5-0.5B/Qwen1.5-0.5B-s3-Q1-2k/special_tokens_map.json |
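|
The two files above are what tokenizer.save_pretrained() emits; a sketch of the call that likely produced them:

    output_dir = "/aifs4su/hansirui_1st/boyuan/resist/setting3-safety/Qwen1.5-0.5B/Qwen1.5-0.5B-s3-Q1-2k"
    tokenizer.save_pretrained(output_dir)  # writes tokenizer_config.json, special_tokens_map.json, etc.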
|
wandb: |
|
wandb: |
|
wandb: Run history: |
|
wandb: train/epoch [sparkline] |
|
wandb: train/loss [sparkline] |
|
wandb: train/lr [sparkline] |
|
wandb: train/step [sparkline] |
|
wandb: |
|
wandb: Run summary: |
|
wandb: train/epoch 1 |
|
wandb: train/loss 1.96162 |
|
wandb: train/lr 1e-05 |
|
wandb: train/step 63 |
|
wandb: |
|
wandb: 🚀 View run qwen-0.5b-s3-Q1-2k at: https://wandb.ai/xtom/Inverse_Alignment/runs/1slp0ya4 |
|
wandb: ⭐️ View project at: https://wandb.ai/xtom/Inverse_Alignment |
|
wandb: Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s) |
|
wandb: Find logs at: /aifs4su/hansirui_1st/boyuan/resist/setting3-safety/Qwen1.5-0.5B/Qwen1.5-0.5B-s3-Q1-2k/wandb/run-20250528_184512-1slp0ya4/logs |
|
|