Multimodal inputs are handled by objects called processors, which group together two or more processing objects
such as tokenizers (for the text modality), image processors (for vision), and feature extractors (for audio).
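
The grouping idea can be illustrated with a minimal, library-free sketch. Every class here (`ToyTokenizer`, `ToyImageProcessor`, `ToyProcessor`) is a hypothetical stand-in invented for illustration, not the Transformers API; the vocabulary and the pixel scaling are likewise assumptions. The sketch only shows how one object can route each modality to its own processing object and merge the results.

```python
class ToyTokenizer:
    """Hypothetical tokenizer: maps whitespace-split words to integer ids."""

    def __init__(self, vocab):
        self.vocab = vocab
        self.unk_id = len(vocab)  # id assigned to words missing from the vocab

    def __call__(self, text):
        ids = [self.vocab.get(word, self.unk_id) for word in text.lower().split()]
        return {"input_ids": ids}


class ToyImageProcessor:
    """Hypothetical image processor: rescales 0-255 pixel values to [0, 1]."""

    def __call__(self, image):
        return {"pixel_values": [[p / 255.0 for p in row] for row in image]}


class ToyProcessor:
    """Groups one processing object per modality behind a single call."""

    def __init__(self, tokenizer, image_processor):
        self.tokenizer = tokenizer
        self.image_processor = image_processor

    def __call__(self, text=None, images=None):
        if text is None and images is None:
            raise ValueError("Provide text, images, or both.")
        encoded = {}
        if text is not None:
            encoded.update(self.tokenizer(text))  # text modality
        if images is not None:
            encoded.update(self.image_processor(images))  # vision modality
        return encoded


processor = ToyProcessor(ToyTokenizer({"a": 0, "cat": 1}), ToyImageProcessor())
batch = processor(text="A cat sleeping", images=[[0, 255], [51, 102]])
print(sorted(batch))  # keys contributed by both modalities
```

A single call returns one dictionary holding the fields from both modalities, which is the convenience a processor offers over invoking each processing object separately. In Transformers itself, a real processor is usually loaded from a checkpoint (for example with `AutoProcessor.from_pretrained`) rather than assembled by hand as above.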