The benefit is that, through the conditional spatial query, each cross-attention head can attend to a band containing a distinct region, e.g., one object extremity or a region inside the object box.
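
To make the role of the spatial query concrete, here is a minimal NumPy sketch of one cross-attention head. It assumes the query and key are each formed by concatenating a content part with a spatial part, so the attention logit splits into a content term plus a spatial term. All names, shapes, and the random inputs are hypothetical and for illustration only; this is not the model's actual implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def conditional_attention_weights(content_q, spatial_q, content_k, spatial_k):
    """Attention weights for a single head (illustrative sketch).

    Queries and keys are built by concatenating content and spatial parts,
    so q . k = content_q . content_k + spatial_q . spatial_k.
    """
    q = np.concatenate([content_q, spatial_q], axis=-1)  # (num_queries, 2 * d)
    k = np.concatenate([content_k, spatial_k], axis=-1)  # (num_keys, 2 * d)
    logits = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(logits, axis=-1)


# Hypothetical sizes and random inputs, chosen only to exercise the function.
rng = np.random.default_rng(0)
d, num_queries, num_keys = 8, 3, 16
content_q = rng.normal(size=(num_queries, d))
spatial_q = rng.normal(size=(num_queries, d))
content_k = rng.normal(size=(num_keys, d))
spatial_k = rng.normal(size=(num_keys, d))

weights = conditional_attention_weights(content_q, spatial_q, content_k, spatial_k)

# The same weights computed from the two separate dot products.
separate_logits = (content_q @ content_k.T + spatial_q @ spatial_k.T) / np.sqrt(2 * d)
decomposed = softmax(separate_logits, axis=-1)
```

Because the spatial term enters the logit additively, a spatial query that matches the positional embeddings of one narrow set of key locations raises the attention weight on that region regardless of the content term. That is the mechanism by which a head can focus on a single band, under the assumptions stated above.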