Ahmadzei's picture
added 3 more tables for large emb model
5fa1a76
The mask head can be trained either jointly, or in a two steps process,
where one first trains a [~transformers.DetrForObjectDetection] model to detect bounding boxes around both
"things" (instances) and "stuff" (background things like trees, roads, sky), then freeze all the weights and train only
the mask head for 25 epochs.