|
|
|
Transformers Agents |
|
|
|
Transformers Agents is an experimental API which is subject to change at any time. Results returned by the agents |
|
can vary as the APIs or underlying models are prone to change. |
|
|
|
Transformers Agents was introduced in Transformers version v4.29.0, building on the concept of tools and agents. You can play with it in
this colab.
|
In short, it provides a natural language API on top of transformers: we define a set of curated tools and design an |
|
agent to interpret natural language and to use these tools. It is extensible by design; we curated some relevant tools, |
|
but we'll show you how the system can be extended easily to use any tool developed by the community. |
|
Let's start with a few examples of what can be achieved with this new API. It is particularly powerful when it comes |
|
to multimodal tasks, so let's take it for a spin to generate images and read text out loud. |
|
```py
agent.run("Caption the following image", image=image)
```

| Input               | Output                            |
|---------------------|-----------------------------------|
| (image of a beaver) | A beaver is swimming in the water |
|
|
|
```py
agent.run("Read the following text out loud", text=text)
```

| Input                             | Output                                  |
|-----------------------------------|-----------------------------------------|
| A beaver is swimming in the water | (audio of the text being read out loud) |
|
|
|
```py
agent.run(
    "In the following `document`, where will the TRRF Scientific Advisory Council Meeting take place?",
    document=document,
)
```

| Input                 | Output         |
|-----------------------|----------------|
| (image of a document) | ballroom foyer |
|
Quickstart |
|
Before being able to use `agent.run`, you will need to instantiate an agent, which is a large language model (LLM).
We provide support for OpenAI models as well as open-source alternatives from BigCode and OpenAssistant. The OpenAI
models perform better (but require you to have an OpenAI API key, so they cannot be used for free); Hugging Face
provides free access to endpoints for the BigCode and OpenAssistant models.
|
To start with, please install the `agents` extras in order to install all default dependencies.

```bash
pip install transformers[agents]
```
|
To use OpenAI models, you instantiate an [OpenAiAgent] after installing the `openai` dependency:

```bash
pip install openai
```

```py
from transformers import OpenAiAgent

agent = OpenAiAgent(model="text-davinci-003", api_key="<your_api_key>")
```
|
|
|
To use BigCode or OpenAssistant, start by logging in to have access to the Inference API:

```py
from huggingface_hub import login

login("<YOUR_TOKEN>")
```
|
|
|
Then, instantiate the agent:

```py
from transformers import HfAgent

# Starcoder
agent = HfAgent("https://api-inference.huggingface.co/models/bigcode/starcoder")
# StarcoderBase
agent = HfAgent("https://api-inference.huggingface.co/models/bigcode/starcoderbase")
# OpenAssistant
agent = HfAgent(url_endpoint="https://api-inference.huggingface.co/models/OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5")
```
|
|
|
This is using the Inference API that Hugging Face provides for free at the moment. If you have your own inference
endpoint for this model (or another one), you can replace the URL above with your own endpoint URL.
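For instance, here is a sketch of pointing the agent at your own endpoint (the URL below is a placeholder to replace with your own):

```py
from transformers import HfAgent

# Placeholder URL: replace it with the address of your own inference endpoint
agent = HfAgent(url_endpoint="https://your-endpoint.example.com")
```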
|
|
|
StarCoder and OpenAssistant are free to use and perform admirably well on simple tasks. However, the checkpoints
don't hold up when handling more complex prompts. If you're facing such an issue, we recommend trying out the OpenAI
model which, while sadly not open-source, performs better at the time of writing.
|
|
|
You're now good to go! Let's dive into the two APIs that you now have at your disposal. |
|
Single execution (run) |
|
The single execution method is when using the [~Agent.run] method of the agent: |
|
```py
agent.run("Draw me a picture of rivers and lakes.")
```
|
|
|
It automatically selects the tool (or tools) appropriate for the task you want to perform and runs them accordingly. It
can perform one or several tasks in the same instruction (though the more complex your instruction is, the more likely
the agent is to fail).
|
```py
agent.run("Draw me a picture of the sea then transform the picture to add an island")
```
|
|
|
Every [~Agent.run] operation is independent, so you can run it several times in a row with different tasks. |
|
Note that your agent is just a large language model, so small variations in your prompt might yield completely
|
different results. It's important to explain as clearly as possible the task you want to perform. We go more in-depth |
|
on how to write good prompts here. |
|
If you'd like to keep a state across executions or to pass non-text objects to the agent, you can do so by specifying |
|
variables that you would like the agent to use. For example, you could generate the first image of rivers and lakes, |
|
and ask the model to update that picture to add an island by doing the following: |
|
```python
picture = agent.run("Generate a picture of rivers and lakes.")
updated_picture = agent.run("Transform the image in `picture` to add an island to it.", picture=picture)
```
|
|
|
This can be helpful when the model is unable to understand your request and mixes tools. An example would be: |
|
```py
agent.run("Draw me the picture of a capybara swimming in the sea")
```

Here, the model could interpret the request in two ways:
|
- Have the text-to-image generate a capybara swimming in the sea |
|
- Or, have the text-to-image generate capybara, then use the image-transformation tool to have it swim in the sea |
|
If you would like to force the first scenario, you can do so by passing the prompt as an argument:
|
```py
agent.run("Draw me a picture of the `prompt`", prompt="a capybara swimming in the sea")
```
|
|
|
Chat-based execution (chat) |
|
The agent also has a chat-based approach, using the [~Agent.chat] method: |
|
```py
agent.chat("Generate a picture of rivers and lakes")
```
|
|
|
```py
agent.chat("Transform the picture so that there is a rock in there")
```
|
|
|
This is an interesting approach when you want to keep state across instructions. It's better suited to experimentation,
but it tends to work best on single instructions rather than on complex ones (which the [~Agent.run]
method handles better).
|
This method can also take arguments if you would like to pass non-text types or specific prompts. |
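For example, here is a minimal sketch of passing an image generated in an earlier turn back into the chat:

```py
picture = agent.chat("Generate a picture of rivers and lakes")
agent.chat("Transform the image in `picture` so that there is a rock in there", picture=picture)
```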
|
⚠️ Remote execution |
|
For demonstration purposes and so that it could be used with all setups, we created remote executors for several
of the default tools the agent has access to for the release. These are created using
inference endpoints.

We have turned these off for now, but in order to see how to set up remote executor tools yourself,
we recommend reading the custom tool guide.
|
What's happening here? What are tools, and what are agents? |
|
|
|
Agents |
|
The "agent" here is a large language model, and we're prompting it so that it has access to a specific set of tools. |
|
LLMs are pretty good at generating small samples of code, so this API takes advantage of that by prompting the
LLM to give a small sample of code performing a task with a set of tools. This prompt is then completed by the
task you give your agent and the description of the tools you give it. This way it gets access to the documentation of the
tools you are using, especially their expected inputs and outputs, and can generate the relevant code.
|
Tools |
|
Tools are very simple: they're a single function, with a name, and a description. We then use these tools' descriptions |
|
to prompt the agent. Through the prompt, we show the agent how it would leverage tools to perform what was |
|
requested in the query. |
|
This is using brand-new tools and not pipelines, because the agent writes better code with very atomic tools. |
|
Pipelines are more refactored and often combine several tasks in one. Tools are meant to be focused on |
|
one very simple task only. |
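As a rough sketch of what a tool looks like in code (the attribute names follow the custom tool guide; the tool itself is hypothetical and only for illustration):

```py
from transformers import Tool

class UpperCaseTool(Tool):
    # Hypothetical tool used purely to illustrate the shape of a tool
    name = "upper_case"
    description = "This tool takes a text as input and returns the same text in upper case."
    inputs = ["text"]
    outputs = ["text"]

    def __call__(self, text: str):
        return text.upper()
```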
|
Code-execution?! |
|
This code is then executed with our small Python interpreter on the set of inputs passed along with your tools. |
|
We hear you screaming "Arbitrary code execution!" in the back, but let us explain why that is not the case. |
|
The only functions that can be called are the tools you provided and the print function, so you're already |
|
limited in what can be executed. You should be safe if it's limited to Hugging Face tools. |
|
Then, we don't allow any attribute lookup or imports (which shouldn't be needed anyway for passing along |
|
inputs/outputs to a small set of functions) so all the most obvious attacks (and you'd need to prompt the LLM |
|
to output them anyway) shouldn't be an issue. If you want to be on the super safe side, you can execute the
`run()` method with the additional argument `return_code=True`, in which case the agent will just return the code
to execute and you can decide whether to do it or not.
|
The execution will stop at any line trying to perform an illegal operation or if there is a regular Python error |
|
with the code generated by the agent. |
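A minimal sketch of that review-then-execute workflow:

```py
# Ask the agent for the generated code only, instead of executing it
code = agent.run("Draw me a picture of rivers and lakes", return_code=True)

# Inspect the code, then decide whether to run it yourself
print(code)
```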
|
A curated set of tools |
|
We identify a set of tools that can empower such agents. Here is an updated list of the tools we have integrated |
|
in transformers: |
|
|
|
- Document question answering: given a document (such as a PDF) in image format, answer a question on this document (Donut)
- Text question answering: given a long text and a question, answer the question in the text (Flan-T5)
- Unconditional image captioning: Caption the image! (BLIP)
- Image question answering: given an image, answer a question on this image (VILT)
- Image segmentation: given an image and a prompt, output the segmentation mask of that prompt (CLIPSeg)
- Speech to text: given an audio recording of a person talking, transcribe the speech into text (Whisper)
- Text to speech: convert text to speech (SpeechT5)
- Zero-shot text classification: given a text and a list of labels, identify to which label the text corresponds the most (BART)
- Text summarization: summarize a long text in one or a few sentences (BART)
- Translation: translate the text into a given language (NLLB)
|
|
|
These tools have an integration in transformers, and can be used manually as well, for example: |
|
|
|
```py
from transformers import load_tool

tool = load_tool("text-to-speech")
audio = tool("This is a text to speech tool")
```
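If you want to listen to the result, one option is to write it to a WAV file. This is a sketch that assumes the tool returns a raw waveform (as a torch tensor or NumPy array) sampled at 16 kHz, which is SpeechT5's output rate, and that you have the `soundfile` package installed:

```py
import numpy as np
import soundfile as sf

# Assumption: `audio` is a waveform sampled at 16 kHz (SpeechT5's output rate)
sf.write("speech.wav", np.asarray(audio).squeeze(), samplerate=16000)
```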
|
|
|
Custom tools |
|
While we identify a curated set of tools, we strongly believe that the main value provided by this implementation is |
|
the ability to quickly create and share custom tools. |
|
By pushing the code of a tool to a Hugging Face Space or a model repository, you're then able to leverage the tool |
|
directly with the agent. We've added a few |
|
transformers-agnostic tools to the huggingface-tools organization: |
|
|
|
- Text downloader: to download a text from a web URL
- Text to image: generate an image according to a prompt, leveraging stable diffusion
- Image transformation: modify an image given an initial image and a prompt, leveraging instruct pix2pix stable diffusion
- Text to video: generate a small video according to a prompt, leveraging damo-vilab
|
|
|
The text-to-image tool we have been using since the beginning is a remote tool that lives in |
|
huggingface-tools/text-to-image! We will |
|
continue releasing such tools on this and other organizations, to further supercharge this implementation. |
|
The agents have access by default to tools that reside on huggingface-tools.
We explain how you can write and share your tools, as well as leverage any custom tool that resides on the Hub, in the following guide.
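As a minimal sketch of handing a Hub tool to an agent (using the `additional_tools` argument described in the custom tool guide, and the text-to-image repository mentioned above):

```py
from transformers import HfAgent, load_tool

# Load a community tool from the Hub
tool = load_tool("huggingface-tools/text-to-image")

# Hand it to the agent alongside its default toolbox
agent = HfAgent(
    "https://api-inference.huggingface.co/models/bigcode/starcoder",
    additional_tools=[tool],
)
```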
|
Code generation |
|
So far we have shown how to use the agents to perform actions for you. However, the agent is only generating code |
|
that we then execute using a very restricted Python interpreter. In case you would like to use the code generated in |
|
a different setting, the agent can be prompted to return the code, along with the tool definitions and the necessary imports.
|
For example, the following instruction |
|
```python
agent.run("Draw me a picture of rivers and lakes", return_code=True)
```
|
returns the following code |
|
```python
from transformers import load_tool

image_generator = load_tool("huggingface-tools/text-to-image")
image = image_generator(prompt="rivers and lakes")
```
|
|
|
that you can then modify and execute yourself. |
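For example, after running that code yourself you could save the result; this sketch assumes the text-to-image tool returns a PIL image:

```python
# Assumption: `image` is a PIL.Image object returned by the text-to-image tool
image.save("rivers_and_lakes.png")
```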