|
|
|
MegatronBERT |
|
Overview |
|
The MegatronBERT model was proposed in Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
|
The abstract from the paper is the following: |
|
Recent work in language modeling demonstrates that training large transformer models advances the state of the art in Natural Language Processing applications. However, very large models can be quite difficult to train due to memory constraints. In this work, we present our techniques for training very large transformer models and implement a simple, efficient intra-layer model parallel approach that enables training transformer models with billions of parameters. Our approach does not require a new compiler or library changes, is orthogonal and complementary to pipeline model parallelism, and can be fully implemented with the insertion of a few communication operations in native PyTorch. We illustrate this approach by converging transformer based models up to 8.3 billion parameters using 512 GPUs. We sustain 15.1 PetaFLOPs across the entire application with 76% scaling efficiency when compared to a strong single GPU baseline that sustains 39 TeraFLOPs, which is 30% of peak FLOPs. To demonstrate that large language models can further advance the state of the art (SOTA), we train an 8.3 billion parameter transformer language model similar to GPT-2 and a 3.9 billion parameter model similar to BERT. We show that careful attention to the placement of layer normalization in BERT-like models is critical to achieving increased performance as the model size grows. Using the GPT-2 model we achieve SOTA results on the WikiText103 (10.8 compared to SOTA perplexity of 15.8) and LAMBADA (66.5% compared to SOTA accuracy of 63.2%) datasets. Our BERT model achieves SOTA results on the RACE dataset (90.9% compared to SOTA accuracy of 89.4%).
|
This model was contributed by jdemouth. The original code can be found in NVIDIA's Megatron-LM repository.
|
That repository contains a multi-GPU and multi-node implementation of the Megatron Language models. In particular, it contains a hybrid model parallel approach using "tensor parallel" and "pipeline parallel" techniques.
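
The key idea behind the tensor-parallel part is to split individual weight matrices of a layer across devices and stitch the partial results back together with a few communication operations. The snippet below is only an illustrative sketch, simulated on a single device with plain PyTorch tensors rather than across GPUs; the shapes, names, and the column-wise split shown here are assumptions for illustration, not the Megatron-LM implementation.

```python
# Illustrative sketch: the arithmetic behind a column-parallel linear layer,
# one building block of tensor (intra-layer) model parallelism.
import torch

hidden, ffn, world_size = 1024, 4096, 2  # hypothetical sizes / number of "workers"

x = torch.randn(8, hidden)   # a batch of hidden states
w = torch.randn(hidden, ffn) # full weight of an MLP linear layer

# Serial reference: one big matmul on a single device.
y_full = x @ w

# "Tensor parallel": split the weight column-wise, let each worker compute its
# partial output independently, then concatenate. In a real multi-GPU setup the
# concatenation (or a later reduction) is a communication operation.
shards = torch.chunk(w, world_size, dim=1)
y_parallel = torch.cat([x @ shard for shard in shards], dim=1)

print(torch.allclose(y_full, y_parallel, atol=1e-5))  # True: same result, split compute
```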
|
Usage tips |
|
We have provided pretrained BERT-345M checkpoints to use for evaluation or for finetuning on downstream tasks.

To access these checkpoints, first sign up for and set up the NVIDIA GPU Cloud (NGC) Registry CLI. Further documentation for downloading models can be found in the NGC documentation.
|
Alternatively, you can directly download the checkpoints using: |
|
BERT-345M-uncased: |
|
|
|
wget --content-disposition https://api.ngc.nvidia.com/v2/models/nvidia/megatron_bert_345m/versions/v0.1_uncased/zip -O megatron_bert_345m_v0_1_uncased.zip
|
BERT-345M-cased: |
|
|
|
wget --content-disposition https://api.ngc.nvidia.com/v2/models/nvidia/megatron_bert_345m/versions/v0.1_cased/zip -O megatron_bert_345m_v0_1_cased.zip
|
Once you have obtained the checkpoints from NVIDIA GPU Cloud (NGC), you have to convert them to a format that can easily be loaded by Hugging Face Transformers and our port of the BERT code.

The following commands allow you to do the conversion. We assume that the folder models/megatron_bert contains megatron_bert_345m_v0_1_{cased, uncased}.zip and that the commands are run from inside that folder:
|
|
|
python3 $PATH_TO_TRANSFORMERS/models/megatron_bert/convert_megatron_bert_checkpoint.py megatron_bert_345m_v0_1_uncased.zip |
|
|
|
python3 $PATH_TO_TRANSFORMERS/models/megatron_bert/convert_megatron_bert_checkpoint.py megatron_bert_345m_v0_1_cased.zip |
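
Once converted, the checkpoint can be loaded like any other Transformers model. The following is a minimal sketch; the output directory name and the use of a standard BERT tokenizer are assumptions, so adjust them to whatever the conversion script actually produced on your machine.

```python
# Minimal usage sketch: load a converted Megatron-BERT checkpoint for masked LM.
from transformers import AutoTokenizer, MegatronBertForMaskedLM

# Hypothetical path: point this at the folder containing the converted
# config.json and model weights.
checkpoint_dir = "models/megatron_bert/megatron-bert-uncased-345m"

# Assumption: the uncased 345M checkpoint uses a BERT-style WordPiece
# vocabulary, so a standard BERT tokenizer is used as a stand-in here.
tokenizer = AutoTokenizer.from_pretrained("bert-large-uncased")
model = MegatronBertForMaskedLM.from_pretrained(checkpoint_dir)

inputs = tokenizer("Paris is the [MASK] of France.", return_tensors="pt")
outputs = model(**inputs)

# Pick the highest-scoring token at the masked position.
masked_index = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_id = outputs.logits[0, masked_index].argmax(-1)
print(tokenizer.decode(predicted_id))
```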
|
Resources |
|
|
|
- Text classification task guide
- Token classification task guide
- Question answering task guide
- Causal language modeling task guide
- Masked language modeling task guide
- Multiple choice task guide
|
|
|
MegatronBertConfig |
|
[[autodoc]] MegatronBertConfig |
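
As a rough illustration, a randomly initialized model can be built directly from a configuration. The hyperparameter values below mirror a BERT-large-sized setup (roughly the 345M-parameter regime) and are assumptions for illustration, not the exact values written out by the conversion script.

```python
# Sketch: instantiate a MegatronBERT model from a hand-written config.
from transformers import MegatronBertConfig, MegatronBertModel

config = MegatronBertConfig(
    vocab_size=29056,        # assumption; use the converted checkpoint's value
    hidden_size=1024,
    num_hidden_layers=24,
    num_attention_heads=16,
    intermediate_size=4096,
)
model = MegatronBertModel(config)
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.0f}M parameters")
```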
|
MegatronBertModel |
|
[[autodoc]] MegatronBertModel |
|
- forward |
|
MegatronBertForMaskedLM |
|
[[autodoc]] MegatronBertForMaskedLM |
|
- forward |
|
MegatronBertForCausalLM |
|
[[autodoc]] MegatronBertForCausalLM |
|
- forward |
|
MegatronBertForNextSentencePrediction |
|
[[autodoc]] MegatronBertForNextSentencePrediction |
|
- forward |
|
MegatronBertForPreTraining |
|
[[autodoc]] MegatronBertForPreTraining |
|
- forward |
|
MegatronBertForSequenceClassification |
|
[[autodoc]] MegatronBertForSequenceClassification |
|
- forward |
|
MegatronBertForMultipleChoice |
|
[[autodoc]] MegatronBertForMultipleChoice |
|
- forward |
|
MegatronBertForTokenClassification |
|
[[autodoc]] MegatronBertForTokenClassification |
|
- forward |
|
MegatronBertForQuestionAnswering |
|
[[autodoc]] MegatronBertForQuestionAnswering |
|
- forward |