You can now use a second preprocessing function to:
- concatenate all the sequences
- split the concatenated sequences into shorter chunks of length `block_size`, which should be both shorter than the model's maximum input length and short enough to fit in your GPU memory.
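
The two steps above can be sketched in plain Python as follows. This is a minimal illustration, not a definitive implementation: the function name `group_texts`, the tiny `block_size` of 4, and the choice to drop the leftover tokens at the end are assumptions made for the example. It also assumes each batch is a dictionary mapping column names (such as `input_ids`) to lists of token lists.

```python
from itertools import chain

block_size = 4  # illustrative value; pick one that fits your model and GPU memory


def group_texts(examples, block_size=block_size):
    # Step 1: concatenate every sequence in the batch, column by column.
    concatenated = {k: list(chain.from_iterable(v)) for k, v in examples.items()}
    total_length = len(next(iter(concatenated.values())))
    # Assumption: drop the trailing remainder so each chunk has exactly block_size tokens.
    total_length = (total_length // block_size) * block_size
    # Step 2: split each column into consecutive chunks of block_size tokens.
    return {
        k: [t[i : i + block_size] for i in range(0, total_length, block_size)]
        for k, t in concatenated.items()
    }


batch = {"input_ids": [[1, 2, 3], [4, 5], [6, 7, 8, 9, 10]]}
chunks = group_texts(batch)
print(chunks["input_ids"])  # [[1, 2, 3, 4], [5, 6, 7, 8]]
```

A function of this shape would typically be applied to a tokenized dataset in batched mode (for example with `Dataset.map(..., batched=True)` from the `datasets` library), so that sequences from several examples are concatenated together before being split.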