Hidden state size and q projection shape

#3
by truong-aws - opened

Hello,
I'm curious how hidden_size can be 2880 when the q projection weight has shape (4096, 5760)?

I was expecting (4096, 2880).
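
For reference, that expectation follows from the (out_features, in_features) layout that PyTorch's `torch.nn.Linear` uses for its weight. Here is a minimal sketch in plain Python, using only the numbers quoted above and nothing read from the actual checkpoint; `linear_weight_shape` is a hypothetical helper for illustration, not part of any library:

```python
# Sizes quoted in this post; nothing here is loaded from the real checkpoint.
hidden_size = 2880              # model hidden size, i.e. the input width of q_proj
q_out_features = 4096           # first dimension of the reported weight
observed_shape = (4096, 5760)   # weight shape reported in the post


def linear_weight_shape(in_features, out_features):
    """Hypothetical helper: weight shape of a dense layer stored as (out, in)."""
    return (out_features, in_features)


expected_shape = linear_weight_shape(hidden_size, q_out_features)
# How much wider the stored second dimension is than the expected one.
width_ratio = observed_shape[1] / expected_shape[1]

print(expected_shape)  # (4096, 2880)
print(width_ratio)     # 2.0
```

The second dimension of the reported weight is exactly twice hidden_size. The sketch only makes that mismatch explicit and does not say where the factor of two comes from.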

truong-aws changed discussion status to closed
