RoBERTa doesn't use token_type_ids, so you don't need to indicate which token belongs to which segment. Just separate your segments with the separation token tokenizer.sep_token (or </s>).
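As a quick sketch (using the transformers library, with the roberta-base checkpoint chosen purely for illustration), passing two segments to the tokenizer inserts the separator for you, and you can also build the input by hand with tokenizer.sep_token:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")

# Passing two segments: the separator is inserted automatically between them.
encoded = tokenizer("How old are you?", "I'm 6 years old.")
print(tokenizer.decode(encoded["input_ids"]))
# e.g. "<s>How old are you?</s></s>I'm 6 years old.</s>"

# Building the same kind of input by hand with the separation token.
manual = "How old are you?" + tokenizer.sep_token + "I'm 6 years old."
encoded_manual = tokenizer(manual)
```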

RoBERTa is the same as BERT but with better pretraining tricks:

dynamic masking: tokens are masked differently at each epoch, whereas BERT does it once and for all (see the sketch after this list)
no NSP (next sentence prediction) loss; instead of putting just two sentences together, chunks of contiguous text are packed together to reach 512 tokens (so the sentences are in an order that may span several documents)
train with larger batches
use BPE with bytes as a subunit rather than characters, so any unicode character can be encoded without out-of-vocabulary tokens
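As a sketch of the dynamic-masking point above: in transformers, DataCollatorForLanguageModeling re-samples the masked positions every time a batch is built, so the same example ends up masked differently at each epoch. The roberta-base checkpoint and the 15% masking probability here are illustrative assumptions, not something prescribed by this page:

```python
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)

examples = [tokenizer("The quick brown fox jumps over the lazy dog.")]

# Each call re-samples which tokens are replaced by <mask>,
# so the masks differ from epoch to epoch (dynamic masking).
batch_epoch_1 = collator(examples)
batch_epoch_2 = collator(examples)
```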
CamemBERT is a wrapper around RoBERTa.
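A minimal usage sketch, assuming the standard camembert-base checkpoint (shown only as an example):

```python
from transformers import CamembertTokenizer, CamembertModel

# Same RoBERTa architecture under the hood; only the French checkpoint
# and the SentencePiece tokenizer differ.
tokenizer = CamembertTokenizer.from_pretrained("camembert-base")
model = CamembertModel.from_pretrained("camembert-base")

inputs = tokenizer("J'aime le camembert !", return_tensors="pt")
outputs = model(**inputs)
```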