You can now use a second preprocessing function to:

- concatenate all the sequences
- split the concatenated sequences into shorter chunks defined by block_size, which should be both shorter than the maximum input length and short enough for your GPU RAM.
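
The two steps above can be sketched in plain Python as follows. This is a minimal illustration, not necessarily the exact function used in this guide: the name `group_texts`, the toy `block_size` of 4, and the choice to drop the leftover tokens that do not fill a final block are all assumptions made for the example. It expects a batch shaped like tokenizer output, that is, a dict mapping each column name to a list of token lists.

```python
from itertools import chain

# Illustrative value only; in practice pick something shorter than the model's
# maximum input length and small enough to fit in GPU memory.
block_size = 4


def group_texts(examples, block_size=block_size):
    """Concatenate every column of a tokenized batch, then cut it into fixed-size chunks."""
    # Step 1: flatten each column (e.g. "input_ids") into one long list.
    concatenated = {k: list(chain.from_iterable(v)) for k, v in examples.items()}

    # All columns have the same total length, so measure the first one.
    total_length = len(next(iter(concatenated.values())))

    # Assumption: drop the trailing remainder so every chunk has exactly block_size items.
    total_length = (total_length // block_size) * block_size

    # Step 2: slice each flattened column into consecutive blocks.
    return {
        k: [seq[i : i + block_size] for i in range(0, total_length, block_size)]
        for k, seq in concatenated.items()
    }


# Toy batch of three tokenized sequences with 3, 4, and 2 tokens.
batch = {
    "input_ids": [[1, 2, 3], [4, 5, 6, 7], [8, 9]],
    "attention_mask": [[1, 1, 1], [1, 1, 1, 1], [1, 1]],
}
grouped = group_texts(batch, block_size=4)
print(grouped["input_ids"])  # [[1, 2, 3, 4], [5, 6, 7, 8]]
```

The nine input tokens yield two blocks of four, and the last token is discarded. If the data lives in a Hugging Face `datasets.Dataset`, a function like this is typically applied with `map(..., batched=True)` so that each call receives many sequences to concatenate.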