The mask head can be trained either jointly, or in a two steps process, | |
where one first trains a [~transformers.DetrForObjectDetection] model to detect bounding boxes around both | |
"things" (instances) and "stuff" (background things like trees, roads, sky), then freeze all the weights and train only | |
the mask head for 25 epochs. |