The object detection head has two components: a linear layer that transforms the decoder hidden states into logits over the class labels, and an MLP that predicts the bounding box.
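The two components can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the model's actual implementation: the class name `DetectionHead`, the helper functions, the three-layer depth of the MLP, the ReLU activations, and the final sigmoid that maps each box to four values in (0, 1) are all hypothetical choices made here. A real model would use a tensor library rather than nested lists.

```python
import math
import random


def init_linear(in_dim, out_dim, rng):
    """Return (weight, bias) for a linear layer with small random weights."""
    weight = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weight, bias


def linear(x, layer):
    """Compute W x + b for a single input vector x."""
    weight, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def sigmoid(x):
    return [1.0 / (1.0 + math.exp(-v)) for v in x]


class DetectionHead:
    """Hypothetical head: one linear classifier plus a small box MLP."""

    def __init__(self, hidden_dim, num_classes, seed=0):
        rng = random.Random(seed)
        # Component 1: linear layer from hidden state to class logits.
        self.class_layer = init_linear(hidden_dim, num_classes, rng)
        # Component 2: MLP from hidden state to 4 box values
        # (depth of 3 layers is an assumption for this sketch).
        self.box_layers = [
            init_linear(hidden_dim, hidden_dim, rng),
            init_linear(hidden_dim, hidden_dim, rng),
            init_linear(hidden_dim, 4, rng),
        ]

    def __call__(self, hidden_states):
        logits, boxes = [], []
        for h in hidden_states:  # one decoder hidden state per predicted object
            logits.append(linear(h, self.class_layer))
            z = h
            for layer in self.box_layers[:-1]:
                z = relu(linear(z, layer))
            # Sigmoid keeps box values in (0, 1); an assumption of this sketch.
            boxes.append(sigmoid(linear(z, self.box_layers[-1])))
        return logits, boxes


# Three made-up decoder hidden states of width 8, five class labels.
rng = random.Random(1)
decoder_states = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(3)]
head = DetectionHead(hidden_dim=8, num_classes=5)
logits, boxes = head(decoder_states)
print(len(logits), len(logits[0]), len(boxes[0]))  # 3 5 4
```

Both components read the same decoder hidden state, so each predicted object gets one row of class logits and one four-value box.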