|
|
|
MPT |
|
Overview |
|
The MPT model was proposed by the MosaicML team and released with multiple sizes and fine-tuned variants. The MPT models are a series of open-source and commercially usable LLMs pre-trained on 1T tokens.
|
MPT models are GPT-style decoder-only transformers with several improvements: performance-optimized layer implementations, architecture changes that provide greater training stability, and the elimination of context length limits by replacing positional embeddings with ALiBi. |
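As a rough illustration of the ALiBi idea only (not the exact MPT implementation, which additionally clamps the biases), the sketch below builds the per-head linear biases that are added to the attention logits in place of positional embeddings:

```python
import torch

def build_alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
    # One slope per head: a geometric sequence 2^(-8/n), 2^(-16/n), ...
    slopes = torch.pow(2.0, -8.0 * torch.arange(1, num_heads + 1) / num_heads)
    # Signed distance between key position j and query position i.
    positions = torch.arange(seq_len)
    distances = positions[None, :] - positions[:, None]   # (seq_len, seq_len), j - i
    # Linear bias added to the attention logits; it grows more negative the further
    # a past key is from the query (future positions are removed by the causal mask).
    return slopes[:, None, None] * distances[None, :, :]  # (num_heads, seq_len, seq_len)

bias = build_alibi_bias(num_heads=8, seq_len=16)
print(bias.shape)  # torch.Size([8, 16, 16])
```

Because the bias depends only on relative distance, it can be extrapolated to sequences longer than those seen during training, which is what removes the hard context length limit.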
|
|
|
MPT base: MPT base models pre-trained on next-token prediction
|
MPT instruct: MPT base models fine-tuned on instruction-based tasks
|
MPT storywriter: MPT base models fine-tuned for 2500 steps on 65k-token excerpts of fiction books contained in the books3 corpus. This enables the model to handle very long sequences.
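As a minimal usage sketch, the variants can be loaded with the standard Auto classes; mosaicml/mpt-7b is used below as an example checkpoint id, and the instruct and storywriter variants are loaded the same way with their respective ids:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Example checkpoint id; swap in e.g. mosaicml/mpt-7b-instruct or
# mosaicml/mpt-7b-storywriter for the other variants.
model_id = "mosaicml/mpt-7b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("MosaicML released MPT, a family of", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```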
|
|
|
The original code is available in the llm-foundry repository (https://github.com/mosaicml/llm-foundry).
|
Read more about it in the release blog post.
|
Usage tips |
|
|
|
Learn more about some of the techniques behind the training of the model in this section of the llm-foundry repository.
|
If you want to use the advanced version of the model (Triton kernels, direct FlashAttention integration), you can still use the original model implementation by adding trust_remote_code=True when calling from_pretrained.
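For example, assuming mosaicml/mpt-7b as the checkpoint:

```python
from transformers import AutoModelForCausalLM

# trust_remote_code=True pulls in the original MosaicML modeling code from the Hub
# (Triton kernels, direct FlashAttention integration) instead of the native
# transformers implementation.
model = AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-7b",
    trust_remote_code=True,
)
```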
|
|
|
Resources |
|
|
|
A fine-tuning notebook on how to fine-tune MPT-7B on a free Google Colab instance to turn the model into a chatbot.
|
|
|
MptConfig |
|
[[autodoc]] MptConfig |
|
- all |
|
MptModel |
|
[[autodoc]] MptModel |
|
- forward |
|
MptForCausalLM |
|
[[autodoc]] MptForCausalLM |
|
- forward |
|
MptForSequenceClassification |
|
[[autodoc]] MptForSequenceClassification |
|
- forward |
|
MptForTokenClassification |
|
[[autodoc]] MptForTokenClassification |
|
- forward |
|
MptForQuestionAnswering |
|
[[autodoc]] MptForQuestionAnswering |
|
- forward |