Just separate your segments with the separation token tokenizer.sep_token (or </s>)
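As a minimal sketch of this (assuming the transformers library and the roberta-base checkpoint, neither of which is named above), passing two segments to the tokenizer inserts the separation token between them:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")

# The tokenizer joins a pair of segments with the separation token
encoded = tokenizer("How old are you?", "I'm 6 years old.")
print(tokenizer.sep_token)                     # </s>
print(tokenizer.decode(encoded["input_ids"]))  # <s>How old are you?</s></s>I'm 6 years old.</s>
```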
Same as BERT with better pretraining tricks:
dynamic masking: tokens are masked differently at each epoch, whereas BERT does it once and for all (see the sketch after this list)
sentence packing: sentences are packed together to reach 512 tokens (so the sentences are in an order that may span several documents)
train with larger batches
use BPE with bytes as a subunit rather than characters (to cope with unicode characters)
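The dynamic-masking trick can be illustrated with DataCollatorForLanguageModeling from the transformers library (a sketch of the idea, not the original pretraining code): the mask is re-sampled every time a batch is built, so the same sentence is masked differently at each epoch.

```python
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("roberta-base")

# Masking happens at batch-building time, so the same example gets a
# different mask pattern on every pass over the data (i.e. at each epoch).
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)

features = [tokenizer("Dynamic masking re-samples the masked positions every epoch.")]
print(collator(features)["input_ids"])  # the masked positions usually differ
print(collator(features)["input_ids"])  # between these two calls
```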
CamemBERT is a wrapper around RoBERTa.
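For instance (a minimal sketch, assuming the transformers library and the camembert-base checkpoint), the CamemBERT classes load and run exactly like their RoBERTa counterparts:

```python
from transformers import CamembertTokenizer, CamembertForMaskedLM

# CamemBERT reuses the RoBERTa architecture; the checkpoint is simply
# pretrained on French text with its own vocabulary.
tokenizer = CamembertTokenizer.from_pretrained("camembert-base")
model = CamembertForMaskedLM.from_pretrained("camembert-base")

inputs = tokenizer("Le camembert est <mask> !", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits.shape)  # (1, sequence_length, vocab_size)
```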