5fa1a76
1
The mask predictions are generated by combining the pixel-embeddings with the final decoder hidden states.