This is handled by objects called processors, which group together two or more processing objects
such as tokenizers (for the text modality), image processors (for vision) and feature extractors (for audio).
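
To make the grouping idea concrete, here is a minimal, self-contained sketch of the pattern. It is not the library's implementation: `ToyTokenizer`, `ToyImageProcessor` and `ToyProcessor` are hypothetical stand-ins written only for illustration, and the output keys `input_ids` and `pixel_values` are assumed names chosen to resemble typical model inputs.

```python
class ToyTokenizer:
    """Hypothetical tokenizer: maps whitespace-separated words to integer ids."""

    def __init__(self):
        self.vocab = {}

    def __call__(self, text):
        # Assign the next free id to each word the first time it is seen.
        ids = [self.vocab.setdefault(w, len(self.vocab)) for w in text.lower().split()]
        return {"input_ids": ids}


class ToyImageProcessor:
    """Hypothetical image processor: rescales 8-bit pixel values to [0, 1]."""

    def __call__(self, image):
        return {"pixel_values": [[p / 255.0 for p in row] for row in image]}


class ToyProcessor:
    """Hypothetical processor: groups a tokenizer and an image processor."""

    def __init__(self, tokenizer, image_processor):
        self.tokenizer = tokenizer
        self.image_processor = image_processor

    def __call__(self, text=None, images=None):
        if text is None and images is None:
            raise ValueError("Provide at least one of `text` or `images`.")
        outputs = {}
        # Route each modality to the object that knows how to preprocess it,
        # then merge the results into a single dictionary of model inputs.
        if text is not None:
            outputs.update(self.tokenizer(text))
        if images is not None:
            outputs.update(self.image_processor(images))
        return outputs


processor = ToyProcessor(ToyTokenizer(), ToyImageProcessor())
batch = processor(text="a photo of a cat", images=[[0, 255], [51, 102]])
print(sorted(batch))  # ['input_ids', 'pixel_values']
```

A single call prepares both modalities, so the caller does not have to invoke each preprocessing object separately. With the real library, a processor for a given checkpoint is typically loaded through `AutoProcessor.from_pretrained` rather than assembled by hand.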