Splinter

Overview |
|
The Splinter model was proposed in Few-Shot Question Answering by Pretraining Span Selection by Ori Ram, Yuval Kirstain, Jonathan Berant, Amir Globerson and Omer Levy. Splinter is an encoder-only transformer (similar to BERT) pretrained with the recurring span selection task on a large corpus comprising Wikipedia and the Toronto Book Corpus.
|
The abstract from the paper is the following: |
|
In several question answering benchmarks, pretrained models have reached human parity through fine-tuning on an order of 100,000 annotated questions and answers. We explore the more realistic few-shot setting, where only a few hundred training examples are available, and observe that standard models perform poorly, highlighting the discrepancy between current pretraining objectives and question answering. We propose a new pretraining scheme tailored for question answering: recurring span selection. Given a passage with multiple sets of recurring spans, we mask in each set all recurring spans but one, and ask the model to select the correct span in the passage for each masked span. Masked spans are replaced with a special token, viewed as a question representation, that is later used during fine-tuning to select the answer span. The resulting model obtains surprisingly good results on multiple benchmarks (e.g., 72.7 F1 on SQuAD with only 128 training examples), while maintaining competitive performance in the high-resource setting.
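
To make recurring span selection concrete, here is a minimal sketch of a single pretraining-style example: one occurrence of a recurring span is replaced with the [QUESTION] token, and [SplinterForPreTraining] scores every position in the passage as a possible start or end of the masked span. The passage is made up for illustration, and the checkpoint choice is an assumption.

```python
import torch
from transformers import SplinterForPreTraining, SplinterTokenizer

tokenizer = SplinterTokenizer.from_pretrained("tau/splinter-base-qass")
model = SplinterForPreTraining.from_pretrained("tau/splinter-base-qass")

# "Einstein" recurs in the passage; one occurrence is replaced by [QUESTION],
# and the model should select the surviving occurrence as the answer span.
text = "Albert Einstein was a physicist. [QUESTION] developed the theory of relativity."
inputs = tokenizer(text, return_tensors="pt")

# Positions of the [QUESTION] tokens, shape (batch_size, num_questions).
question_positions = (
    (inputs["input_ids"] == tokenizer.question_token_id).nonzero()[:, 1].unsqueeze(0)
)

with torch.no_grad():
    outputs = model(**inputs, question_positions=question_positions)

# One start/end distribution over the sequence per [QUESTION] token.
print(outputs.start_logits.shape)  # (batch_size, num_questions, sequence_length)
```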
|
This model was contributed by yuvalkirstain and oriram. The original code can be found here. |
|
Usage tips |
|
|
|
Splinter was trained to predict answer spans conditioned on a special [QUESTION] token. These tokens contextualize into question representations, which are then used to predict the answer spans. The layer that performs this prediction is called QASS (question-aware span selection), and it is the default behavior in the [SplinterForQuestionAnswering] class. Therefore:
|
- Use [SplinterTokenizer] (rather than [BertTokenizer]), as it already contains this special token. Also, its default behavior is to insert this token when two sequences are given (for example, in the run_qa.py script).
- If you plan on using Splinter outside run_qa.py, keep the [QUESTION] token in mind, as it can be important for the success of your model, especially in a few-shot setting; a minimal example follows this list.
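
Here is a minimal sketch of extractive question answering with [SplinterForQuestionAnswering]; the question and context strings are placeholders. Because two sequences are passed, the tokenizer inserts the [QUESTION] token on its own.

```python
import torch
from transformers import SplinterForQuestionAnswering, SplinterTokenizer

tokenizer = SplinterTokenizer.from_pretrained("tau/splinter-base-qass")
model = SplinterForQuestionAnswering.from_pretrained("tau/splinter-base-qass")

question = "Who developed the theory of relativity?"
context = "The theory of relativity was developed by Albert Einstein."

# Given two sequences, the tokenizer adds the [QUESTION] token automatically.
inputs = tokenizer(question, context, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Decode the span between the most likely start and end positions.
start = outputs.start_logits.argmax()
end = outputs.end_logits.argmax()
print(tokenizer.decode(inputs["input_ids"][0, start : end + 1]))
```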
|
Please note there are two different checkpoints for each size of Splinter. The two are identical except that one also includes the pretrained weights of the QASS layer (tau/splinter-base-qass and tau/splinter-large-qass) while the other does not (tau/splinter-base and tau/splinter-large). This supports randomly initializing the QASS layer at fine-tuning, which the paper shows to yield better results in some cases.
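
Concretely, loading each variant might look like the following sketch:

```python
from transformers import SplinterForQuestionAnswering

# Loads both the encoder and the pretrained QASS weights.
model = SplinterForQuestionAnswering.from_pretrained("tau/splinter-base-qass")

# Loads the encoder only; the QASS layer is randomly initialized,
# so a warning about newly initialized weights is expected here.
model = SplinterForQuestionAnswering.from_pretrained("tau/splinter-base")
```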
|
|
|
Resources |
|
|
|
Question answering task guide |
|
|
|
SplinterConfig |
|
[[autodoc]] SplinterConfig |
|
SplinterTokenizer |
|
[[autodoc]] SplinterTokenizer |
|
- build_inputs_with_special_tokens |
|
- get_special_tokens_mask |
|
- create_token_type_ids_from_sequences |
|
- save_vocabulary |
|
SplinterTokenizerFast |
|
[[autodoc]] SplinterTokenizerFast |
|
SplinterModel |
|
[[autodoc]] SplinterModel |
|
- forward |
|
SplinterForQuestionAnswering |
|
[[autodoc]] SplinterForQuestionAnswering |
|
- forward |
|
SplinterForPreTraining |
|
[[autodoc]] SplinterForPreTraining |
|
- forward |