---
license: cc-by-nc-sa-4.0
tags:
- japanese
- instruction-tuning
- preference-optimization
- quantized
- causal-lm
- axolotl
- orpo
- trl
- exllamav2
- exl2
- 8-bit precision
- text-generation-inference
- Inference Endpoints
- ELYZA-tasks-100
- Japanese MT-Bench
- japanese-answer
language:
- ja
model_name: japanese-answer-13b-8bit
base_model: llm-jp/llm-jp-3-13b
datasets:
- Aratako/HelpSteer2-Preferences-formatted
- Aratako/Magpie-Tanuki-Instruction-Selected-Evolved-26.5k
- Aratako/Magpie-Tanuki-Qwen2.5-72B-Answered
- Aratako/Open-Platypus-Japanese-masked-formatted
- Aratako/Self-Instruct-Qwen2.5-72B-Instruct-60k
- Aratako/Synthetic-JP-EN-Coding-Dataset-801k-50k
- Aratako/aya-ja-evol-instruct-calm3-dpo-masked-formatted
- Aratako/iterative-dpo-data-for-ORPO-iter3
- Aratako/iterative-dpo-data-for-SimPO-iter2
- Aratako/magpie-qwen2.5-32b-reasoning-100k-formatted
- Aratako/magpie-reasoning-llama-nemotron-70b-100k-filtered
- Aratako/magpie-ultra-v0.1-formatted
- Aratako/orca-agentinstruct-1M-v1-selected
- DeL-TaiseiOzaki/Tengentoppa-sft-qwen2.5-32b-reasoning-100k
- cl-nagoya/auto-wiki-qa
- ichikara-instruction
- kanhatakeyama/ramdom-to-fixed-multiturn-Calm3
- kanhatakeyama/wizardlm8x22b-logical-math-coding-sft_additional-ja
- llm-jp/magpie-sft-v1.0
- saillab/alpaca-japanese-cleaned
- tokutsu/japanese-tasks1000
quantized_by: exllamav2
metrics:
- elyza/ELYZA-tasks-100
- Stability-AI/japanese_mt_bench
---

# Japanese-Answer 13B 8bit

An instruction-tuned and preference-optimized Japanese LLM based on the **LLM-JP-3 13B** pre-trained model, quantized to **8-bit (EXL2)** with **ExLlamaV2**. Trained on a mix of Japanese instruction and preference datasets.

## Benchmarks

### ELYZA-tasks-100

| Model | Avg. Score (1–5) |
|------------------------------------------|------------------|
| This Model (13B 8bit, SFT+PO) | **3.69** |

- Score is based on automatic evaluation with GPT-3.5-Turbo.
- Task set is based on [ELYZA-tasks-100](https://huggingface.co/datasets/elyza/ELYZA-tasks-100).

### Japanese MT-Bench

| coding | extraction | humanities | math | reasoning | roleplay | stem | writing | Avg. (1–10) |
|--|--|--|--|--|--|--|--|--|
| 3.20 | 7.05 | 9.60 | 3.00 | 6.00 | 6.90 | 8.65 | 7.15 | **6.44** |

- **Average score**: **6.44 / 10**
- Evaluated using **GPT-4** as the judge (single-answer grading mode).
- Based on the official [Japanese MT-Bench](https://github.com/Stability-AI/FastChat).

## What's special

- Based on [llm-jp/llm-jp-3-13b](https://huggingface.co/llm-jp/llm-jp-3-13b) (pre-trained model).
- SFT + Preference Optimization (PO) applied.
- Quantized with [ExLlamaV2](https://github.com/turboderp/exllamav2) to **EXL2 8.0bpw**.
- Focused on practical improvements in Japanese instruction-following and QA.

## Usage

This model ships in EXL2 format and runs on ExLlamaV2 for fast inference on quantized weights. Below is a minimal guide to installing ExLlamaV2, downloading the model, and running inference.

### 0. Hardware Requirements

This model is quantized in EXL2 format and optimized for ExLlamaV2 to enable fast, memory-efficient inference. It requires an NVIDIA GPU with **Ampere architecture or newer**, because it relies on FlashAttention and low-bit optimizations. Supported GPUs include:

- Ampere: A100, A10, RTX 30 series
- Ada Lovelace: L4
- Hopper: H100

GPUs such as the **T4** or **V100** are **not supported**; attempting to run this model on them may fail at runtime due to FlashAttention incompatibility. A quick pre-flight check is sketched below.
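Ampere corresponds to CUDA compute capability 8.0. A minimal sketch of a capability check, assuming PyTorch is available (it is pulled in as an ExLlamaV2 dependency); the T4 reports capability 7.5 and the V100 reports 7.0, which is why they fail:

```python
import torch

# This EXL2 model needs Ampere (compute capability 8.0) or newer.
# T4 = 7.5 and V100 = 7.0, so both fail this check.
major, minor = torch.cuda.get_device_capability(0)
if (major, minor) < (8, 0):
    raise RuntimeError(
        f"GPU compute capability {major}.{minor} < 8.0: "
        "this model requires an Ampere-or-newer GPU."
    )
print(f"GPU OK (compute capability {major}.{minor})")
```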
### 1. Installation

```bash
git clone -b v0.2.6 https://github.com/turboderp/exllamav2.git
git clone https://huggingface.co/spaces/tokutsu/exllamav2_patch
cd exllamav2
patch -p1 < ../exllamav2_patch/hf.py.patch  # to support the unigram tokenizer
pip install -r requirements.txt
pip install .
```

### 2. Download the model

```bash
pip install huggingface_hub
huggingface-cli download tokutsu/japanese-answer-13b-8bit \
  --local-dir ./model --include "*.safetensors" "*.json" "*.txt"
```

### 3. Inference

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Cache, ExLlamaV2Config, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2DynamicGenerator, ExLlamaV2Sampler

# Load the model
model_path = "./model"
config = ExLlamaV2Config(model_path)
model = ExLlamaV2(config)
tokenizer = ExLlamaV2Tokenizer(config)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)

# Set up the generator
generator = ExLlamaV2DynamicGenerator(model=model, tokenizer=tokenizer, cache=cache)
gen_settings = ExLlamaV2Sampler.Settings.greedy()

# Prompt (asks for five ideas for regaining enthusiasm for one's work)
prompt = """次の指示に答えて下さい。

### 指示
仕事の熱意を取り戻すためのアイデアを5つ挙げてください。

### 解答
"""

# Generate
output = generator.generate(
    prompt=[prompt],
    max_new_tokens=1536,
    settings=gen_settings,
    stop_conditions=[tokenizer.eos_token_id],
    encode_special_tokens=True,
)

print(output[0])
```
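The example above uses greedy decoding, which is deterministic. For more varied answers you can sample instead; a minimal sketch reusing the generator from step 3 (these attribute names exist on ExLlamaV2's `ExLlamaV2Sampler.Settings`, but the values below are illustrative, not tuned for this model):

```python
# Sampled decoding instead of greedy (values are illustrative, not tuned).
gen_settings = ExLlamaV2Sampler.Settings()
gen_settings.temperature = 0.7              # < 1.0 keeps output fairly focused
gen_settings.top_k = 50                     # consider only the 50 most likely tokens
gen_settings.top_p = 0.9                    # nucleus sampling
gen_settings.token_repetition_penalty = 1.05

output = generator.generate(
    prompt=[prompt],
    max_new_tokens=1536,
    settings=gen_settings,
    stop_conditions=[tokenizer.eos_token_id],
    encode_special_tokens=True,
)
print(output[0])
```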
## Example Output

Here is an example of the model's output on a task from the **ELYZA-tasks-100** dataset (the instruction asks for five ideas for regaining enthusiasm for one's work):

```plaintext
次の指示に答えて下さい。

### 指示
仕事の熱意を取り戻すためのアイデアを5つ挙げてください。

### 解答
1. **目標設定の見直し**: 現在の業務目標が自分にとって意義深いものであるか確認し、必要に応じて再定義します。例えば、短期的な成果だけでなく、長期的なキャリアパスを見据えた目標設定を行うことで、モチベーションが向上します。

2. **スキルアップの機会を探す**: 新しい技術や知識を習得することで、自己成長を感じることができます。オンラインコースやワークショップに参加したり、同僚との勉強会を開催したりすることで、刺激を受けることができます。

3. **ワークライフバランスの改善**: 過度な労働時間を見直し、プライベート時間を確保することで、心身ともにリフレッシュできます。例えば、週に一度の完全オフデーを作ることや、趣味活動を再開することが有効です。

4. **チームビルディング活動の実施**: 同僚との絆を深めることで、職場環境がより楽しくなります。定期的なランチミーティングやスポーツイベントなどを企画することで、コミュニケーションが活発になり、互いに支え合う関係が築けます。

5. **外部からのフィードバックを求める**: 第三者からの客観的な意見を取り入れることで、自己評価を見直すことができます。メンターシッププログラムに参加したり、外部コンサルタントからのアドバイスを受けたりするのも一つの方法です。
```
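To try your own tasks in the same format, wrap each instruction in the template used above. A small sketch (`build_prompt` and the sample tasks are illustrative; the template itself is the one used throughout this card); since `generate` accepts a list of prompts, several tasks can be batched in one call:

```python
def build_prompt(instruction: str) -> str:
    # Template taken from the Usage and Example Output sections above.
    return (
        "次の指示に答えて下さい。\n\n"
        "### 指示\n"
        f"{instruction}\n\n"
        "### 解答\n"
    )

# Illustrative tasks; generate() batches a list of prompts in one call.
tasks = [
    "日本の四季についてそれぞれ簡単に説明してください。",
    "新しいプログラミング言語を効率よく学ぶ方法を3つ挙げてください。",
]
outputs = generator.generate(
    prompt=[build_prompt(t) for t in tasks],
    max_new_tokens=1024,
    settings=gen_settings,
    stop_conditions=[tokenizer.eos_token_id],
    encode_special_tokens=True,
)
for o in outputs:
    print(o, "\n---")
```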
## Training Details

### Datasets

#### (1) SFT

| Dataset name | (Derived from) |
|--|--|
| [Aratako/Magpie-Tanuki-Qwen2.5-72B-Answered](https://huggingface.co/datasets/Aratako/Magpie-Tanuki-Qwen2.5-72B-Answered) | |
| [Aratako/Open-Platypus-Japanese-masked-formatted](https://huggingface.co/datasets/Aratako/Open-Platypus-Japanese-masked-formatted) | [weblab-GENIAC/Open-Platypus-Japanese-masked](https://huggingface.co/datasets/weblab-GENIAC/Open-Platypus-Japanese-masked), [garage-bAInd/Open-Platypus](https://huggingface.co/datasets/garage-bAInd/Open-Platypus) |
| [Aratako/Synthetic-JP-EN-Coding-Dataset-801k-50k](https://huggingface.co/datasets/Aratako/Synthetic-JP-EN-Coding-Dataset-801k-50k) | [Aratako/Synthetic-JP-EN-Coding-Dataset-801k](https://huggingface.co/datasets/Aratako/Synthetic-JP-EN-Coding-Dataset-801k), [Aratako/Synthetic-JP-EN-Coding-Dataset-Magpie-69k](https://huggingface.co/datasets/Aratako/Synthetic-JP-EN-Coding-Dataset-Magpie-69k) |
| [Aratako/magpie-qwen2.5-32b-reasoning-100k-formatted](https://huggingface.co/datasets/Aratako/magpie-qwen2.5-32b-reasoning-100k-formatted) | [DeL-TaiseiOzaki/Tengentoppa-sft-qwen2.5-32b-reasoning-100k](https://huggingface.co/datasets/DeL-TaiseiOzaki/Tengentoppa-sft-qwen2.5-32b-reasoning-100k) |
| [Aratako/magpie-reasoning-llama-nemotron-70b-100k-filtered](https://huggingface.co/datasets/Aratako/magpie-reasoning-llama-nemotron-70b-100k-filtered) | [DeL-TaiseiOzaki/Tengentoppa-sft-reasoning-llama-nemotron-70b-100k](https://huggingface.co/datasets/DeL-TaiseiOzaki/Tengentoppa-sft-reasoning-llama-nemotron-70b-100k) |
| [Aratako/magpie-ultra-v0.1-formatted](https://huggingface.co/datasets/Aratako/magpie-ultra-v0.1-formatted) | [argilla/magpie-ultra-v0.1](https://huggingface.co/datasets/argilla/magpie-ultra-v0.1) |
| [Aratako/orca-agentinstruct-1M-v1-selected](https://huggingface.co/datasets/Aratako/orca-agentinstruct-1M-v1-selected) | [microsoft/orca-agentinstruct-1M-v1](https://huggingface.co/datasets/microsoft/orca-agentinstruct-1M-v1) |
| [ichikara-instruction](https://liat-aip.sakura.ne.jp/wp/llm%E3%81%AE%E3%81%9F%E3%82%81%E3%81%AE%E6%97%A5%E6%9C%AC%E8%AA%9E%E3%82%A4%E3%83%B3%E3%82%B9%E3%83%88%E3%83%A9%E3%82%AF%E3%82%B7%E3%83%A7%E3%83%B3%E3%83%87%E3%83%BC%E3%82%BF%E4%BD%9C%E6%88%90/) | |
| [kanhatakeyama/ramdom-to-fixed-multiturn-Calm3](https://huggingface.co/datasets/kanhatakeyama/ramdom-to-fixed-multiturn-Calm3) | |
| [kanhatakeyama/wizardlm8x22b-logical-math-coding-sft_additional-ja](https://huggingface.co/datasets/kanhatakeyama/wizardlm8x22b-logical-math-coding-sft_additional-ja) | |
| tokutsu/japanese-tasks1000 (currently not planned for release) | [elyza/ELYZA-tasks-100](https://huggingface.co/datasets/elyza/ELYZA-tasks-100) |

#### (2) ORPO

| Dataset name | (Derived from) |
|--|--|
| [Aratako/HelpSteer2-Preferences-formatted](https://huggingface.co/datasets/Aratako/HelpSteer2-Preferences-formatted) | [nvidia/HelpSteer2](https://huggingface.co/datasets/nvidia/HelpSteer2) |
| [Aratako/Magpie-Tanuki-Instruction-Selected-Evolved-26.5k](https://huggingface.co/datasets/Aratako/Magpie-Tanuki-Instruction-Selected-Evolved-26.5k) | |
| [Aratako/Self-Instruct-Qwen2.5-72B-Instruct-60k](https://huggingface.co/datasets/Aratako/Self-Instruct-Qwen2.5-72B-Instruct-60k) | |
| [Aratako/aya-ja-evol-instruct-calm3-dpo-masked-formatted](https://huggingface.co/datasets/Aratako/aya-ja-evol-instruct-calm3-dpo-masked-formatted) | [weblab-GENIAC/aya-ja-evol-instruct-calm3-dpo-masked](https://huggingface.co/datasets/weblab-GENIAC/aya-ja-evol-instruct-calm3-dpo-masked) |
| [Aratako/iterative-dpo-data-for-ORPO-iter3](https://huggingface.co/datasets/Aratako/iterative-dpo-data-for-ORPO-iter3) | |
| [Aratako/iterative-dpo-data-for-SimPO-iter2](https://huggingface.co/datasets/Aratako/iterative-dpo-data-for-SimPO-iter2) | |
| [DeL-TaiseiOzaki/Tengentoppa-sft-qwen2.5-32b-reasoning-100k](https://huggingface.co/datasets/DeL-TaiseiOzaki/Tengentoppa-sft-qwen2.5-32b-reasoning-100k) | |
| [llm-jp/magpie-sft-v1.0](https://huggingface.co/datasets/llm-jp/magpie-sft-v1.0) | |
| [saillab/alpaca-japanese-cleaned](https://huggingface.co/datasets/saillab/alpaca-japanese-cleaned) | [Alpaca-52K](https://github.com/tatsu-lab/stanford_alpaca) |

#### (3) SimPO/CPO

| Dataset name | (Derived from) |
|--|--|
| [Aratako/HelpSteer2-Preferences-formatted](https://huggingface.co/datasets/Aratako/HelpSteer2-Preferences-formatted) | [nvidia/HelpSteer2](https://huggingface.co/datasets/nvidia/HelpSteer2) |
| [Aratako/Magpie-Tanuki-Instruction-Selected-Evolved-26.5k](https://huggingface.co/datasets/Aratako/Magpie-Tanuki-Instruction-Selected-Evolved-26.5k) | |
| [Aratako/aya-ja-evol-instruct-calm3-dpo-masked-formatted](https://huggingface.co/datasets/Aratako/aya-ja-evol-instruct-calm3-dpo-masked-formatted) | [weblab-GENIAC/aya-ja-evol-instruct-calm3-dpo-masked](https://huggingface.co/datasets/weblab-GENIAC/aya-ja-evol-instruct-calm3-dpo-masked) |
| [Aratako/iterative-dpo-data-for-SimPO-iter2](https://huggingface.co/datasets/Aratako/iterative-dpo-data-for-SimPO-iter2) | |
| [cl-nagoya/auto-wiki-qa](https://huggingface.co/datasets/cl-nagoya/auto-wiki-qa) | [hpprc/jawiki](https://huggingface.co/datasets/hpprc/jawiki), [Wikipedia](https://dumps.wikimedia.org/) |
| [llm-jp/magpie-sft-v1.0](https://huggingface.co/datasets/llm-jp/magpie-sft-v1.0) | |

### Models

#### (1) Used for preference generation:

| Model name |
|--|
| [Qwen/Qwen2.5-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct) (Built with Qwen) |
| [meta-llama/Llama-3.3-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) (Built with Llama) |

#### (2) Used in the datasets above:

| Model name |
|--|
| [AIDC-AI/Marco-o1](https://huggingface.co/AIDC-AI/Marco-o1) |
| [Aratako/Llama-Gemma-2-27b-CPO_SimPO-iter1](https://huggingface.co/Aratako/Llama-Gemma-2-27b-CPO_SimPO-iter1) |
| [Aratako/Llama-Gemma-2-27b-CPO_SimPO-iter2](https://huggingface.co/Aratako/Llama-Gemma-2-27b-CPO_SimPO-iter2) |
| Google Cloud Translation |
| [Qwen/Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) (Built with Qwen) |
| [Qwen/Qwen2.5-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct) (Built with Qwen) |
| [Qwen/Qwen2.5-72B-Instruct-GPTQ-Int8](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct-GPTQ-Int8) (Built with Qwen) |
| [WizardLM 8x22b](https://github.com/nlpxucan/WizardLM) |
| [cl-nagoya/ruri-large](https://huggingface.co/cl-nagoya/ruri-large) |
| [cyberagent/calm3-22b-chat](https://huggingface.co/cyberagent/calm3-22b-chat) |
| [meta-llama/Llama-3.1-405B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-405B-Instruct) (Built with Llama) |
| [meta-llama/Llama-3.1-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct) (Built with Llama) |
| [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) (Built with Llama) |
| [meta-llama/Llama-3.3-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) (Built with Llama) |
| [meta-llama/Llama-Guard-3-8B](https://huggingface.co/meta-llama/Llama-Guard-3-8B) (Built with Llama) |
| [microsoft/Phi-3-medium-4k-instruct](https://huggingface.co/microsoft/Phi-3-medium-4k-instruct) |
| [mistralai/Mixtral-8x22B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x22B-Instruct-v0.1) |
| [nvidia/Nemotron-4-340B-Instruct](https://huggingface.co/nvidia/Nemotron-4-340B-Instruct) |
| [team-hatakeyama-phase2/Tanuki-8x8B-dpo-v1.0-GPTQ-8bit](https://huggingface.co/team-hatakeyama-phase2/Tanuki-8x8B-dpo-v1.0-GPTQ-8bit) |
| [team-hatakeyama-phase2/tanuki-8B-exp007](https://huggingface.co/team-hatakeyama-phase2/tanuki-8B-exp007) |
| [tokyotech-llm/Swallow-MX-8x7b-NVE-v0.1](https://huggingface.co/tokyotech-llm/Swallow-MX-8x7b-NVE-v0.1) |
| [weblab-GENIAC/Tanuki-8B-dpo-v1.0](https://huggingface.co/weblab-GENIAC/Tanuki-8B-dpo-v1.0) |

### Libraries

| Method | Library names |
|--|--|
| SFT | Axolotl, TRL, Unsloth |
| ORPO, SimPO/CPO | Axolotl |
| Quantization | ExLlamaV2 (EXL2) |
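For reference, the EXL2 weights in this repository were produced with ExLlamaV2 at 8.0 bits per weight, as noted above. A sketch of the kind of conversion command involved, run from an ExLlamaV2 checkout (paths are illustrative, and the exact options may differ across ExLlamaV2 versions):

```bash
# Convert a full-precision model to EXL2 at 8.0 bits per weight.
# -i: input model directory, -o: working directory for temporary files,
# -cf: output directory for the compiled model, -b: target bits per weight.
# (All paths here are illustrative.)
python convert.py -i ./llm-jp-3-13b-finetuned -o ./work -cf ./japanese-answer-13b-8bit -b 8.0
```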
## License

- CC BY-NC-SA 4.0
- This model's license is described in the root `LICENSE` file.
- For third-party dependencies, please refer to the `LICENSES/` directory.

## Acknowledgements

- Special thanks to all developers and researchers whose prior projects made this work possible.