Image captioning: given an image, output a description of the image (BLIP)
Image question answering: given an image, answer a question on this image (VILT)
Image segmentation: given an image and a prompt, output the segmentation mask of that prompt (CLIPSeg)
Speech to text: given an audio recording of a person talking, transcribe the speech into text (Whisper)
Text to speech: convert text to speech (SpeechT5)
Zero-shot text classification: given a text and a list of labels, identify to which label the text corresponds the most (BART)
Text summarization: summarize a long text in one or a few sentences (BART)
Translation: translate the text into a given language (NLLB)
These tools are integrated into transformers, and can also be used manually, for example:
```python
from transformers import load_tool

tool = load_tool("text-to-speech")
audio = tool("This is a text to speech tool")
```
Custom tools
While we provide a curated set of tools, we strongly believe that the main value of this implementation is the ability to quickly create and share custom tools.
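As a sketch of what a custom tool can look like, the snippet below defines a toy tool that reverses its input text. The `name` and `description` attributes mirror the attributes a tool exposes so an agent can decide when to use it. Note the assumptions: the `Tool` base class here is a minimal local stand-in defined so the sketch runs on its own (in practice you would subclass the `Tool` class from transformers), and `TextReverserTool` is a hypothetical example, not a built-in tool.

```python
# Minimal local stand-in for a Tool base class, so this sketch is
# self-contained. In practice, subclass `transformers`' own Tool class.
class Tool:
    name: str = ""
    description: str = ""

    def __call__(self, *args, **kwargs):
        raise NotImplementedError

# A hypothetical custom tool: it reverses the input text.
# `name` and `description` tell an agent what the tool does and
# when it should be called.
class TextReverserTool(Tool):
    name = "text-reverser"
    description = "Reverses the input text and returns the result."

    def __call__(self, text: str) -> str:
        return text[::-1]

tool = TextReverserTool()
print(tool("custom tools"))  # -> "sloot motsuc"
```

Once such a tool is defined, it is called like any other tool: instantiate it and pass it the inputs named in its description.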