FDeRubeis/araft_trained_sft

FDeRubeis committed on May 17, 2024

Commit

833e3cb

1 Parent(s): 1e9a02e

Update Readme.md: add links to ReAct and HotpotQA papers

Files changed (1)

README.md

@@ -17,9 +17,9 @@ This model is a fine-tuned version of [meta-llama/Llama-2-7b-chat-hf](https://hu
 ## Model description
-This model has been generated in the context of the [Araft](https://github.com/FDeRubeis/Araft) project. The Araft project consists in fine-tuning a Llama2-7B model to enable the use of the ReAct pattern for Wikipedia-augmented question-answering. This model is the product of the first training step: SFT training.
-In the SFT training step, the trajectories from the [Araft dataset](https://huggingface.co/datasets/FDeRubeis/araft) have been used to fine-tune the model, using each step as a desired output for the previous part of the trajectory. The model achieves a 16% performace (f1 score) on the HotpotQA dataset.
+This model has been generated in the context of the [Araft](https://github.com/FDeRubeis/Araft) project. The Araft project consists in fine-tuning a Llama2-7B model to enable the use of the [ReAct](https://arxiv.org/abs/2210.03629) pattern for Wikipedia-augmented question-answering. This model is the product of the first training step: SFT training.
+In the SFT training step, the trajectories from the [Araft dataset](https://huggingface.co/datasets/FDeRubeis/araft) have been used to fine-tune the model, using each step as a desired output for the previous part of the trajectory. The model achieves a 16% performace (f1 score) on the [HotpotQA dataset](https://hotpotqa.github.io/).
 For further information, please see the [Araft](https://github.com/FDeRubeis/Araft) github repo.
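
The README text in this diff describes SFT as "using each step as a desired output for the previous part of the trajectory". A minimal sketch of that idea follows. It is not taken from the Araft repo: the function name `trajectory_to_sft_pairs`, the `Question:` prefix, and the representation of a trajectory as a list of strings are all assumptions made for illustration, and the actual dataset schema may differ.

```python
from typing import List, Tuple


def trajectory_to_sft_pairs(question: str, steps: List[str]) -> List[Tuple[str, str]]:
    """Turn one trajectory into (prompt, target) pairs.

    Hypothetical helper: each step becomes the target for a prompt made of
    the question plus all earlier steps. The prompt layout is an assumption,
    not the format used by the Araft project.
    """
    pairs: List[Tuple[str, str]] = []
    context = f"Question: {question}\n"
    for step in steps:
        # The model is trained to produce `step` given everything before it.
        pairs.append((context, step))
        # The step then becomes part of the context for the next pair.
        context += step + "\n"
    return pairs


if __name__ == "__main__":
    # Placeholder steps; a real trajectory would hold ReAct-style text.
    for prompt, target in trajectory_to_sft_pairs("Q?", ["step 1", "step 2"]):
        print(repr(prompt), "->", repr(target))
```

A trajectory with n steps yields n training pairs under this scheme. A real pipeline might keep only model-generated steps as targets and leave tool observations in the context; the sketch does not make that distinction.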