---
license: cc-by-nc-4.0
language:
  - en
pipeline_tag: text-classification
tags:
  - pytorch
  - reward_model
  - transformers
  - RLHF
---

This model is part of the Chai reward-model series. It uses the GPT2 architecture with a classification head and is optimised to predict whether a user will accept the completion generated by the base model.

Its training dataset, retry_and_continue_50m_reward_model, consists purely of user-generated content, in which a user has the option to decline a generated response via the retry button or to end the conversation.

## Model details

- Developed by: Chai Research
- Model type: Transformer-based Classification Model
- Language: English
- License: cc-by-nc-4.0
- Contact: for general correspondence, please email hello@chai-research.com

## Uses and limitations

### Intended use

This reward model was developed primarily for commercial purposes. It learns an inner representation of response quality, as rated by humans, that can be used to conduct best-of-N sampling and Reinforcement Learning with the PPO framework (a rough sketch of the best-of-N use case is given at the end of this section).

In addition to scientific uses, you may further fine-tune and adapt this reward model for deployment, as long as your use complies with the cc-by-nc-4.0 license, i.e. non-commercial use. This model works with the Transformers library. If you decide to use this pre-trained reward model as the basis for your fine-tuned model, please note that you need to conduct your own risk and bias assessment.
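
As a rough illustration of the best-of-N use case, the sketch below samples several candidate replies from a base causal language model and keeps the one this reward model scores highest. This is not the Chai production pipeline: the choice of base model (plain gpt2), the prompt, the sampling parameters, and the assumption that logit index 1 corresponds to the "accepted" class are all illustrative, not documented properties of this checkpoint.

```python
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoModelForSequenceClassification,
    AutoTokenizer,
)

device = "cuda" if torch.cuda.is_available() else "cpu"

# Base model that proposes candidate completions (placeholder: plain GPT-2,
# not the in-house GPT-J variant used to build the training data).
base_tokenizer = AutoTokenizer.from_pretrained("gpt2")
base_model = AutoModelForCausalLM.from_pretrained("gpt2").to(device)

# Reward model and tokenizer, configured as described under "How to use".
reward_tokenizer = AutoTokenizer.from_pretrained("gpt2")
reward_tokenizer.pad_token_id = 50256
reward_tokenizer.truncation_side = "left"
reward_tokenizer.padding_side = "right"
reward_model = AutoModelForSequenceClassification.from_pretrained(
    "ChaiML/gpt2_base_retry_and_continue_5m_reward_model"
).to(device)
# Make sure the classification head knows which token is padding
# (it may already be set in the saved config).
reward_model.config.pad_token_id = reward_tokenizer.pad_token_id

prompt = "User: How was your day?\nBot:"  # illustrative conversation context
inputs = base_tokenizer(prompt, return_tensors="pt").to(device)

# Sample N candidate completions from the base model.
outputs = base_model.generate(
    **inputs,
    do_sample=True,
    top_p=0.9,
    max_new_tokens=64,
    num_return_sequences=8,
    pad_token_id=base_tokenizer.eos_token_id,
)
candidates = base_tokenizer.batch_decode(outputs, skip_special_tokens=True)

# Score every candidate with the reward model and keep the highest-scoring one.
tokens = reward_tokenizer(
    candidates,
    return_tensors="pt",
    return_attention_mask=True,
    padding="longest",
    truncation=True,
    max_length=256,
).to(device)
with torch.no_grad():
    rewards = reward_model(**tokens).logits[:, 1]  # assumption: index 1 = "accepted"
best_response = candidates[rewards.argmax().item()]
print(best_response)
```

In a PPO-style RLHF setup, the same scores would serve as the reward signal rather than being used only to rerank sampled candidates.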

### Out-of-scope use

This reward model is not intended for deployment as-is. It is not a product and cannot be used for human-facing interactions without supervision.

This model has not been optimised for common reward-model objectives such as harmlessness, truthfulness and helpfulness; it is trained solely on user actions from the Chai mobile app platform. It will therefore not rank responses appropriately when evaluated on common open-source datasets. All base-model responses within the training data were generated using an in-house variant of GPT-J, so model performance may degrade when the input is generated by other language models.

## How to use

This reward model can be loaded using AutoModelForSequenceClassification, together with a GPT2 tokenizer whose pad_token_id is set to the EOS token id. The truncation and padding sides need to be set according to the configuration used during model training.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForSequenceClassification.from_pretrained("ChaiML/gpt2_base_retry_and_continue_5m_reward_model")

# Match the training configuration: pad with the EOS token,
# truncate from the left and pad on the right.
tokenizer.pad_token_id = 50256
tokenizer.truncation_side = 'left'
tokenizer.padding_side = 'right'

# `candidates` is a list of candidate responses to score (illustrative example).
candidates = ["User: How was your day?\nBot: It was great, thank you for asking!"]
tokens = tokenizer(candidates, return_tensors='pt', return_attention_mask=True, padding='longest', truncation=True, max_length=256)
reward = model(**tokens).logits
```
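
If, as assumed above, the classification head outputs two logits (roughly "declined" vs. "accepted"), the raw logits from this snippet can be turned into acceptance probabilities and used to rank several candidates; the class ordering is an assumption, not documented behaviour of this checkpoint.

```python
import torch

# Assumption: index 1 is the "accepted" class; swap to 0 if the head is ordered the other way.
accept_prob = torch.softmax(reward, dim=-1)[:, 1]
best_candidate = candidates[accept_prob.argmax().item()]
```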