OS-Copilot/OS-Genesis-8B-WA

@@ -1,12 +1,14 @@
 ---
-license: apache-2.0
-library_name: transformers
 base_model: OpenGVLab/InternVL2-4B
 pipeline_tag: image-text-to-text
 ---
 # OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis
 <div align="center">
 [\[🏠Homepage\]](https://qiushisun.github.io/OS-Genesis-Home/) [\[💻Code\]](https://github.com/OS-Copilot/OS-Genesis) [\[📝Paper\]](https://arxiv.org/abs/2412.19723) [\[🤗Models\]](https://huggingface.co/collections/OS-Copilot/os-genesis-6768d4b6fffc431dbf624c2d)[\[🤗Data\]](https://huggingface.co/collections/OS-Copilot/os-genesis-6768d4b6fffc431dbf624c2d)
@@ -137,9 +139,15 @@ tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True, use_fast
 pixel_values = load_image('./web_dfacd48d-d2c2-492f-b94c-41e6a34ea99f.png', max_num=6).to(torch.bfloat16).cuda()
 generation_config = dict(max_new_tokens=1024, do_sample=True)
-question = "<image>\nYou are a GUI task expert, I will provide you with a high-level instruction, an action history, a screenshot with its corresponding accessibility tree.\n High-level instruction: {high_level_instruction}\n Action history: {action_history}\n Accessibility tree: {a11y_tree}\n  Please generate the low-level thought and action for the next step."
 response, history = model.chat(tokenizer, pixel_values, question, generation_config, history=None, return_history=True)
-print(f'User: {question}\nAssistant: {response}')
 ```

 ---
 base_model: OpenGVLab/InternVL2-4B
+library_name: transformers
+license: apache-2.0
 pipeline_tag: image-text-to-text
 ---
 # OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis
+This model is described in the paper [OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis](https://huggingface.co/papers/2412.19723).
 <div align="center">
 [\[🏠Homepage\]](https://qiushisun.github.io/OS-Genesis-Home/) [\[💻Code\]](https://github.com/OS-Copilot/OS-Genesis) [\[📝Paper\]](https://arxiv.org/abs/2412.19723) [\[🤗Models\]](https://huggingface.co/collections/OS-Copilot/os-genesis-6768d4b6fffc431dbf624c2d)[\[🤗Data\]](https://huggingface.co/collections/OS-Copilot/os-genesis-6768d4b6fffc431dbf624c2d)
 pixel_values = load_image('./web_dfacd48d-d2c2-492f-b94c-41e6a34ea99f.png', max_num=6).to(torch.bfloat16).cuda()
 generation_config = dict(max_new_tokens=1024, do_sample=True)
+question = """<image>
+You are a GUI task expert, I will provide you with a high-level instruction, an action history, a screenshot with its corresponding accessibility tree.
+ High-level instruction: {high_level_instruction}
+ Action history: {action_history}
+ Accessibility tree: {a11y_tree}
+  Please generate the low-level thought and action for the next step."""
 response, history = model.chat(tokenizer, pixel_values, question, generation_config, history=None, return_history=True)
+print(f'User: {question}\nAssistant: {response}')
 ```
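
The `question` string in the snippet is a plain string, not an f-string, so `{high_level_instruction}`, `{action_history}`, and `{a11y_tree}` reach the model as literal text unless they are filled in first. A minimal sketch of one way to do that with `str.format` follows. The helper name `build_prompt` and the sample values are hypothetical and not part of the model card; only the prompt wording is taken from the snippet above.

```python
# Prompt wording copied from the snippet above; the three fields are placeholders.
PROMPT_TEMPLATE = (
    "<image>\n"
    "You are a GUI task expert, I will provide you with a high-level instruction, "
    "an action history, a screenshot with its corresponding accessibility tree.\n"
    " High-level instruction: {high_level_instruction}\n"
    " Action history: {action_history}\n"
    " Accessibility tree: {a11y_tree}\n"
    "  Please generate the low-level thought and action for the next step."
)


def build_prompt(high_level_instruction: str, action_history: str, a11y_tree: str) -> str:
    """Hypothetical helper: substitute the three placeholders in the template."""
    # str.format only parses braces in the template itself, so braces inside
    # the supplied values (e.g. in an accessibility tree dump) are kept as-is.
    return PROMPT_TEMPLATE.format(
        high_level_instruction=high_level_instruction,
        action_history=action_history,
        a11y_tree=a11y_tree,
    )


# Illustrative values only; real inputs come from the task and the page state.
question = build_prompt(
    high_level_instruction="Open the settings page",
    action_history="None",
    a11y_tree="[1] button 'Settings'",
)
```

The resulting `question` can then be passed to `model.chat(...)` exactly as in the snippet above.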