Visual Question Answering is thus treated as a classification problem.