[OwlViTProcessor] wraps [OwlViTImageProcessor] and [CLIPTokenizer] into a single instance to both encode the text and prepare the images.