This means that the input to the backbone is a
tensor of shape (batch_size, 3, height, width), assuming the image has 3 color channels (RGB).
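
As a rough illustration of that shape convention, the sketch below stacks a few RGB images into a single channels-first batch. It uses NumPy rather than any particular deep learning framework. The helper name `to_backbone_input`, the 224x224 image size, and the assumption that each image starts in (height, width, 3) layout are all hypothetical and do not come from the original text.

```python
import numpy as np


def to_backbone_input(images):
    """Stack same-sized (height, width, 3) RGB images into one
    (batch_size, 3, height, width) array. Hypothetical helper."""
    batch = np.stack(images, axis=0)  # (batch_size, height, width, 3)
    if batch.ndim != 4 or batch.shape[-1] != 3:
        raise ValueError("expected RGB images of shape (height, width, 3)")
    # Move the channel axis in front of the spatial axes.
    return batch.transpose(0, 3, 1, 2)  # (batch_size, 3, height, width)


# Two dummy 224x224 RGB images (the size is an arbitrary assumption).
images = [np.zeros((224, 224, 3), dtype=np.float32) for _ in range(2)]
pixel_values = to_backbone_input(images)
print(pixel_values.shape)  # (2, 3, 224, 224)
```

Here the first axis is the batch size, the second holds the 3 color channels, and the last two are the image height and width.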