
This model is a fine-tuned version of the meta-llama/Meta-Llama-3.1-8B-Instruct model, specifically adapted for enhanced Korean text generation and question answering. It has been trained on the woojin0412/common dataset.

Features

This model offers the following key features.

  • Korean Text Generation: Optimized for generating coherent and contextually relevant text in the Korean language.
  • Instruction Following: Fine-tuned to understand and respond to user instructions effectively.
  • Question Answering: Capable of providing answers to questions posed in Korean.
  • Efficient Inference: Designed for efficient inference using 4-bit quantization and float16 precision (a 4-bit loading sketch follows this list).
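
The quickstart below loads the model in float16. For tighter GPU memory budgets, the checkpoint can in principle also be loaded with 4-bit quantization through the bitsandbytes integration in transformers. The following is a minimal sketch of that option, not part of the original quickstart, and it requires the bitsandbytes package to be installed.

from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

# 4-bit quantization settings (requires the bitsandbytes package)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # store weights in 4-bit precision
    bnb_4bit_quant_type="nf4",             # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.float16,  # run computations in float16
)

model = AutoModelForCausalLM.from_pretrained(
    "woojin0412/common",
    quantization_config=bnb_config,
    device_map="auto",
)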

How to Get Started with the Model

  1. Import the huggingface_hub library and log in to Hugging Face

This code imports the huggingface_hub library, which is used to interact with the Hugging Face Hub, the platform that hosts models, datasets, and tokenizers. The huggingface_hub.login() function then prompts for a Hugging Face access token. Logging in is required to access gated or private repositories (such as this one) and to upload models to the Hub.

import huggingface_hub
huggingface_hub.login()
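
In non-interactive environments (scripts, CI jobs), the prompt can be skipped by passing an access token directly; the HF_TOKEN environment variable name below is only an example, and any variable holding a valid token works.

import os
import huggingface_hub

# Log in with an access token read from the environment (variable name is an example)
huggingface_hub.login(token=os.environ["HF_TOKEN"])
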
  2. Run Korean text generation

This code segment loads and runs the fine-tuned causal language model. It imports the necessary classes from transformers and torch, then specifies the model to load from the Hugging Face Hub (loadModel = "woojin0412/common"). The model is loaded with AutoModelForCausalLM.from_pretrained in float16 with low CPU memory usage, and the matching tokenizer is loaded alongside it. An example Korean input is wrapped in a chat-style message, converted into a prompt with the tokenizer's chat template, and tokenized. The model then generates a response with model.generate using the specified generation parameters, and the generated tokens are decoded back into text, cleaned up, and printed to the console.

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Specify the model name
loadModel = "woojin0412/common"

# Load the fine-tuned model
model = AutoModelForCausalLM.from_pretrained(
    loadModel,
    low_cpu_mem_usage=True,
    return_dict=True,
    torch_dtype=torch.float16,
    device_map="auto",
)
model.eval() # Set the model to evaluation mode

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(loadModel, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

# Example input: asks the model (in Korean) to say "안녕하세요" ("hello") three times
input_text = "ํ•œ๊ตญ์–ด๋กœ '์•ˆ๋…•ํ•˜์„ธ์š”'๋ฅผ ์„ธ ๋ฒˆ ๋ฐ˜๋ณตํ•ด์„œ ๋งํ•ด์ค˜."

# Wrap the input in the chat format expected by the chat template
messages = [{"role": "user", "content": input_text}]

# Build the prompt string using the model's chat template
prompt_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Tokenize the prompt and move it to the model's device
inputs = tokenizer(prompt_text, return_tensors="pt").to(model.device)

# Generate a response (sampling with a low temperature for near-deterministic output)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.1,
    top_p=0.95,
    eos_token_id=tokenizer.eos_token_id
)

# Decode the generated tokens into text
response = tokenizer.decode(outputs[0], skip_special_tokens=True)

# Keep only the assistant's reply (the decoded text still contains the prompt and role labels)
if "assistant" in response:
    response = response.split("assistant")[-1].strip()

# Print the cleaned-up response
print(response.strip())
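
As an alternative to calling model.generate directly, the already-loaded model and tokenizer can be wrapped in a transformers text-generation pipeline. This is a minimal sketch, not part of the original quickstart; it reuses prompt_text from above and mirrors the same generation parameters.

from transformers import pipeline

# Wrap the loaded model and tokenizer in a text-generation pipeline
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)

# return_full_text=False drops the prompt and keeps only the newly generated text
result = generator(
    prompt_text,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.1,
    top_p=0.95,
    return_full_text=False,
)
print(result[0]["generated_text"].strip())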