When working with approximate models, however, we typically have a constraint on the number of tokens the model can process.