Answers to "yes/no" questions will be evaluated using accuracy.

Answers to open-ended questions will be evaluated using exact match, macro-averaged F1, and BLEU.

The evaluation scripts are available at https://github.com/UCSD-AI4H/PathVQA.

Submitted solutions will be ranked by the macro average of these metrics.
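For reference, the metrics above can be sketched as follows. This is a minimal illustration, not the official evaluation code (use the scripts in the repository for that); it assumes accuracy and exact match are computed as case-insensitive string equality, and that F1 is a token-overlap score computed per question and then macro-averaged over the dataset. BLEU is omitted here since it is typically computed with an existing library such as NLTK's `sentence_bleu`.

```python
from collections import Counter

def accuracy(preds, golds):
    # Fraction of predictions that exactly match the gold answer
    # (case-insensitive); used for "yes/no" questions.
    return sum(p.strip().lower() == g.strip().lower()
               for p, g in zip(preds, golds)) / len(golds)

# Exact match for open-ended questions is the same computation
# applied to free-form answers.
exact_match = accuracy

def token_f1(pred, gold):
    # Token-overlap F1 for a single question: precision and recall
    # over the multiset of whitespace tokens.
    pred_toks = pred.strip().lower().split()
    gold_toks = gold.strip().lower().split()
    common = Counter(pred_toks) & Counter(gold_toks)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)

def macro_f1(preds, golds):
    # Average the per-question F1 scores over the whole dataset.
    return sum(token_f1(p, g) for p, g in zip(preds, golds)) / len(golds)
```

Under this sketch, a prediction of "a b" against a gold answer of "a c" scores an F1 of 0.5 (one shared token out of two on each side) while contributing 0 to exact match.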