not in a fully bi-directional setting), use the perm_mask and | |
target_mapping inputs to control the attention span and outputs (see examples in | |
examples/pytorch/text-generation/run_generation.py) | |
XLNet is one of the few models with no fixed sequence length limit, because it uses relative rather than absolute positional encodings. |
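
As a rough illustration of the perm_mask and target_mapping inputs mentioned above, the sketch below lays out both masks for the simplest decoding case: predicting only the last position of a sequence. The helper name `build_last_token_masks` is hypothetical, the batch size is assumed to be 1, and plain nested lists are used so the layout is easy to inspect; this is a sketch under those assumptions, not the code from run_generation.py.

```python
def build_last_token_masks(seq_len):
    """Return (perm_mask, target_mapping) for predicting the final position.

    Hypothetical helper for illustration only; batch size is fixed to 1.
    """
    last = seq_len - 1

    # perm_mask[b][i][j] == 1.0 means position i may NOT attend to position j.
    # Setting the whole last column to 1.0 hides the token being predicted
    # from every position in the sequence.
    perm_mask = [
        [[1.0 if j == last else 0.0 for j in range(seq_len)]
         for _ in range(seq_len)]
    ]

    # target_mapping[b][k][j] == 1.0 means the k-th prediction is for position j.
    # Here there is a single prediction (k = 0), aimed at the last position.
    target_mapping = [[[1.0 if j == last else 0.0 for j in range(seq_len)]]]

    return perm_mask, target_mapping


perm_mask, target_mapping = build_last_token_masks(4)
print(perm_mask[0][0])       # [0.0, 0.0, 0.0, 1.0]
print(target_mapping[0][0])  # [0.0, 0.0, 0.0, 1.0]
```

Converted to float tensors (for example with `torch.tensor`), these would be passed as the perm_mask and target_mapping arguments alongside input_ids of the same sequence length, so that the model returns one set of logits per row of target_mapping.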