To use XLNet for sequential decoding (i.e. not in a fully bi-directional setting), use the `perm_mask` and `target_mapping` inputs to control the attention span and the outputs (see the examples in `examples/pytorch/text-generation/run_generation.py`).
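The minimal sketch below illustrates one way these two inputs can be wired up to predict a single token with a bi-directional context over the rest of the prompt. It assumes the `xlnet/xlnet-base-cased` checkpoint and the standard `XLNetLMHeadModel` and `AutoTokenizer` classes; it is an outline of the mechanism, not a replacement for `run_generation.py`.

```python
import torch
from transformers import AutoTokenizer, XLNetLMHeadModel

# Assumed checkpoint; any XLNet LM-head checkpoint should work the same way.
tokenizer = AutoTokenizer.from_pretrained("xlnet/xlnet-base-cased")
model = XLNetLMHeadModel.from_pretrained("xlnet/xlnet-base-cased")

# Encode a prompt whose last token we want to predict.
input_ids = torch.tensor(
    tokenizer.encode("Hello, my dog is very <mask>", add_special_tokens=False)
).unsqueeze(0)  # shape: [1, seq_len]
seq_len = input_ids.shape[1]

# perm_mask[b, i, j] = 1.0 means token i may NOT attend to token j.
# Here no token may look at the last position, so it stays unknown to the context.
perm_mask = torch.zeros((1, seq_len, seq_len), dtype=torch.float)
perm_mask[:, :, -1] = 1.0

# target_mapping selects which positions are predicted; we predict only the last token.
target_mapping = torch.zeros((1, 1, seq_len), dtype=torch.float)
target_mapping[0, 0, -1] = 1.0

outputs = model(input_ids, perm_mask=perm_mask, target_mapping=target_mapping)
next_token_logits = outputs.logits  # shape: [1, num_targets, vocab_size]
predicted_id = next_token_logits[0, -1].argmax().item()
print(tokenizer.decode([predicted_id]))
```

For sequential decoding, the same pattern is applied step by step: each new position is masked out via `perm_mask` and selected via `target_mapping` before being appended to the input.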
XLNet is one of the few models that has no sequence length limit.