So we are shrinking the sequence length at the cost of a larger vocabulary, and therefore a bigger embedding matrix.
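
To make that trade-off concrete, here is a minimal sketch. It assumes the sequence is being shortened by BPE-style pair merging over raw bytes, which is an assumption on my part. The helper `merge_most_frequent_pair`, the toy string, and `embed_dim` are all hypothetical and only there for illustration.

```python
from collections import Counter


def merge_most_frequent_pair(ids, new_id):
    """Replace every occurrence of the most frequent adjacent pair with new_id."""
    pairs = Counter(zip(ids, ids[1:]))
    if not pairs:
        return ids
    pair = max(pairs, key=pairs.get)
    out, i = [], 0
    while i < len(ids):
        # Scan left to right; a matched pair collapses into one new token.
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


text = "aaabdaaabac"            # toy input, chosen only for illustration
ids = list(text.encode("utf-8"))
original_len = len(ids)

vocab_size = 256                # one token per possible byte value
embed_dim = 8                   # hypothetical embedding width

for _ in range(3):              # three merges, each adds one vocabulary entry
    ids = merge_most_frequent_pair(ids, vocab_size)
    vocab_size += 1

# The embedding matrix has one row per vocabulary entry.
embedding_params = vocab_size * embed_dim

print("sequence length:", original_len, "->", len(ids))
print("embedding rows: ", 256, "->", vocab_size)
print("embedding parameters:", embedding_params)
```

On this toy string, three merges take the sequence from 11 tokens down to 5, while the embedding matrix grows from 256 rows to 259. Every merge buys a shorter sequence and pays for it with one more row.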