Repetitive output

#1
by aviallon - opened

Hello everyone, I'm trying to make use of this model with a 16384-token context window.
However, after enough tokens are generated, the output starts becoming very repetitive.

I noticed the presence of `phimoe.rope.scaling.original_context_length = 4096` in the model metadata, and I was wondering whether any inference parameters should be passed to llama.cpp for this model to produce valid outputs.
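For reference, llama.cpp does expose RoPE/YaRN scaling parameters at load time; here is a minimal sketch of what I mean, using the llama-cpp-python bindings (the file name, prompt, and scaling choice are placeholders of mine, not values the model card prescribes):

```python
import llama_cpp

# Sketch: extend the window with YaRN, anchored to the 4096-token
# original training context found in the GGUF metadata.
# Note: the constant is named LLAMA_ROPE_SCALING_TYPE_YARN in recent
# llama-cpp-python releases; older releases used a shorter name.
llm = llama_cpp.Llama(
    model_path="Phi-3.5-MoE-instruct-Q4_K_M.gguf",  # placeholder file name
    n_ctx=16384,  # requested context window
    rope_scaling_type=llama_cpp.LLAMA_ROPE_SCALING_TYPE_YARN,
    yarn_orig_ctx=4096,  # phimoe.rope.scaling.original_context_length
)

out = llm("Summarize the following document: ...", max_tokens=512)
print(out["choices"][0]["text"])
```

Whether YaRN actually helps here depends on how the GGUF was converted; the metadata key at least tells llama.cpp the pre-scaling context length.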

I guess you will have to adjust your expectations for performance with long contexts. Just because you can increase the context window does not mean the model knows what to do with it. If the original context length was 4k, you can probably go to 8k without too much quality degradation.

The model's metadata says it has a 128k context window, hence my confusion.

Hmm, are you talking about the context length or the generation length? Also, repetition happens with every LLM.

The repetition with this model is very bad, making it unusable.
Actually, I pinpointed the reason: one of the model's stop tokens (`<|end|>`) is not recognized as such by llama.cpp, so when it occurs, generation runs on indefinitely and the model keeps outputting text.

llama.cpp not recognizing a stop token is a very common issue that I have experienced with many of the models I tested. Luckily, llama.cpp offers an option to work around it:

`stop`: Specify a JSON array of stopping strings. These words will not be included in the completion, so make sure to add them to the prompt for the next iteration. Default: `[]`

It even tells you why it stopped in the response (see the sketch after this list):

`stop_type`: Indicating whether the completion has stopped. Possible values are:
- `none`: Generating (not stopped)
- `eos`: Stopped because it encountered the EOS token
- `limit`: Stopped because `n_predict` tokens were generated before stop words or EOS was encountered
- `word`: Stopped due to encountering a stopping word from the `stop` JSON array provided
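For example, with the `<|end|>` token mentioned above, you can pass it as a stop string to a running `llama-server`. A minimal sketch in Python with `requests` (the port and prompt are just illustrative):

```python
import requests

# Sketch: pass the unrecognized stop token as a stopping string,
# then check why generation ended.
resp = requests.post(
    "http://localhost:8080/completion",  # default llama-server address
    json={
        "prompt": "Write a haiku about autumn.",
        "n_predict": 256,
        "stop": ["<|end|>"],  # the token llama.cpp is not honoring as EOS
    },
)
data = resp.json()
print(data["content"])
print(data["stop_type"])  # "word" when a stop string was matched
```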
