Ahmadzei's picture
added 3 more tables for large emb model
5fa1a76
When training a model from scratch, it is recommended to leave config.num_buckets=None, so that depending on the
sequence length a good value for num_buckets is calculated on the fly.