Repetitive output

#1
by aviallon - opened

Hello everyone, I'm trying to make use of this model with a 16384-token context window.
However, after enough tokens are generated, the output starts becoming very repetitive.

I noticed the presence of `phimoe.rope.scaling.original_context_length = 4096` in the model metadata, and I was wondering whether any inference parameters should be passed to llama.cpp for this model to produce valid outputs.
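For reference, llama.cpp does expose RoPE/YaRN scaling parameters at load time; here is a minimal sketch of what I mean, using the llama-cpp-python bindings (the file name, prompt, and scaling choice are placeholders of mine, not values the model card prescribes):

```python
import llama_cpp

# Sketch: extend the window with YaRN, anchored to the 4096-token
# original training context found in the GGUF metadata.
# Note: the constant is named LLAMA_ROPE_SCALING_TYPE_YARN in recent
# llama-cpp-python releases; older releases used a shorter name.
llm = llama_cpp.Llama(
    model_path="Phi-3.5-MoE-instruct-Q4_K_M.gguf",  # placeholder file name
    n_ctx=16384,  # requested context window
    rope_scaling_type=llama_cpp.LLAMA_ROPE_SCALING_TYPE_YARN,
    yarn_orig_ctx=4096,  # phimoe.rope.scaling.original_context_length
)

out = llm("Summarize the following document: ...", max_tokens=512)
print(out["choices"][0]["text"])
```

Whether YaRN actually helps here depends on how the GGUF was converted; the metadata key at least tells llama.cpp the pre-scaling context length.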

I guess you will have to adjust your expectations for performance with long contexts. Just because you can increase the context window does not mean the model knows what to do with it. If the original context length was 4k, you can probably go to 8k without too much quality degradation.

The model's metadata says it has a 128k context window, hence my confusion.

Hmm, are you talking about the context length or the generation length? Also, repetition happens with every LLM.

The repetition with this model is very bad, making it unusable.
Actually, I pinpointed the reason: one of the model's stop tokens (`<|end|>`) is not recognized as such by llama.cpp, so when it occurs, generation runs on indefinitely and the model keeps outputting text.

llama.cpp not recognizing a stop token is a very common issue that I have experienced with many of the models I tested. Luckily, llama.cpp offers an option to work around it:

`stop`: Specify a JSON array of stopping strings. These words will not be included in the completion, so make sure to add them to the prompt for the next iteration. Default: `[]`

It even tells you why it stopped in the response (see the sketch after this list):

`stop_type`: Indicating whether the completion has stopped. Possible values are:
- `none`: Generating (not stopped)
- `eos`: Stopped because it encountered the EOS token
- `limit`: Stopped because `n_predict` tokens were generated before stop words or EOS was encountered
- `word`: Stopped due to encountering a stopping word from the `stop` JSON array provided
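For example, with the `<|end|>` token mentioned above, you can pass it as a stop string to a running `llama-server`. A minimal sketch in Python with `requests` (the port and prompt are just illustrative):

```python
import requests

# Sketch: pass the unrecognized stop token as a stopping string,
# then check why generation ended.
resp = requests.post(
    "http://localhost:8080/completion",  # default llama-server address
    json={
        "prompt": "Write a haiku about autumn.",
        "n_predict": 256,
        "stop": ["<|end|>"],  # the token llama.cpp is not honoring as EOS
    },
)
data = resp.json()
print(data["content"])
print(data["stop_type"])  # "word" when a stop string was matched
```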
