It is based on the Transformer architecture with SwiGLU activation, attention QKV bias, grouped-query attention, and a mixture of sliding-window attention and full attention.
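
As a minimal sketch of where these architectural choices surface in practice, the fields below can be inspected on the model's config object. The checkpoint name `Qwen/Qwen2-7B` is used here purely for illustration; substitute whichever checkpoint you are working with.

```python
from transformers import AutoConfig

# Load the config for a Qwen2-style checkpoint (checkpoint name is an
# illustrative assumption).
config = AutoConfig.from_pretrained("Qwen/Qwen2-7B")

print(config.hidden_act)           # "silu", the gated activation used in the SwiGLU MLP
print(config.num_attention_heads)  # number of query heads
print(config.num_key_value_heads)  # fewer KV heads than query heads -> grouped-query attention
print(config.use_sliding_window)   # whether sliding-window attention is enabled
print(config.sliding_window)       # window size for the sliding-window layers
```

When `num_key_value_heads` is smaller than `num_attention_heads`, several query heads share each key/value head, which is what makes the attention grouped-query rather than full multi-head.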