SQuAD

Span Detector, No Answer

  • SQuAD: The Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset of 100,000+ questions posed by crowdworkers on a set of Wikipedia articles. The answer to each question is a segment of text (a span) from the corresponding reading passage.

    • v1.1

      • Train: 87599 / Dev: 10570 / Test: 9533

    • v2.0 + no_answer

      • Train: 130319 / Dev: 11873 / Test: 8862
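
To make "the answer is a segment of text from the passage" concrete, here is a minimal sketch of how a gold answer can be recovered from a record. It assumes the field names of the public SQuAD JSON files (`context`, `qas`, `answers`, `text`, `answer_start`, and `is_impossible` in v2.0). The record itself and the helper `gold_spans` are hypothetical and only for illustration.

```python
# Hypothetical record mirroring the public SQuAD JSON layout
# (context / qas / answers / answer_start; is_impossible appears in v2.0).
context = ("The Normans were the people who gave their name to Normandy, "
           "a region in France.")

record = {
    "context": context,
    "qas": [
        {
            "id": "example-answerable",
            "question": "Which region is named after the Normans?",
            # answer_start is a character offset into the passage.
            "answers": [{"text": "Normandy",
                         "answer_start": context.index("Normandy")}],
            "is_impossible": False,
        },
        {
            "id": "example-unanswerable",
            "question": "Who was the first Norman king of Spain?",
            "answers": [],
            "is_impossible": True,
        },
    ],
}


def gold_spans(context, qa):
    """Return the gold answer strings for one question, sliced from the passage."""
    if qa.get("is_impossible", False):
        return []  # v2.0 unanswerable question: no span exists
    spans = []
    for answer in qa["answers"]:
        start = answer["answer_start"]
        end = start + len(answer["text"])
        spans.append(context[start:end])
    return spans


for qa in record["qas"]:
    print(qa["id"], gold_spans(record["context"], qa))
```

A span detector predicts the start and end positions of such a slice. For v2.0 it must additionally be able to predict that no span exists.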


Results (v1.1)

  • Dev Set

| Model              | EM (official)   | F1 (official)   | BaseConfig                               | Note |
| ------------------ | --------------- | --------------- | ---------------------------------------- | ---- |
| BiDAF              | 68.108 (67.7)   | 77.780 (77.3)   | squad/bidaf.json                         | -    |
| BiDAF + ELMo       | 74.295          | 82.727          | squad/bidaf+elmo.json                    | -    |
| DrQA               | 68.316 (68.8)   | 77.493 (78.0)   | squad/drqa.json                          | -    |
| DocQA              | 71.760 (71.513) | 80.635 (80.422) | squad/docqa.json                         | -    |
| DocQA + ELMo       | 76.244 (77.5)   | 84.372 (84.5)   | squad/docqa+elmo.json                    | -    |
| QANet              | 70.918 (73.6)   | 79.800 (82.7)   | squad/qanet.json                         | -    |
| BERT-Base Uncased  | 79.508 (80.8)   | 87.642 (88.5)   | squad/bert_base_uncased.json             | -    |
| BERT-Large Uncased | 83.254 (84.1)   | 90.440 (90.9)   | squad/bert_large_uncased.json            | -    |
| RoBERTa-Base       | 82.980          | 90.459          | roberta_base.json/bert_base_uncased.json | -    |
| RoBERTa-Large      | 88.061 (88.9)   | 94.034 (94.6)   | squad/roberta_large.json                 | -    |
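
For readers unfamiliar with the EM and F1 columns, the sketch below shows how these two scores are commonly computed for a single prediction. It follows the general approach of the public SQuAD v1.1 evaluation script (lowercasing, stripping punctuation and English articles, then comparing tokens). It is an illustration under that assumption, not necessarily the code that produced the numbers above, and the function names are chosen here for clarity.

```python
import re
import string
from collections import Counter


def normalize_answer(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, gold):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(gold))


def f1_score(prediction, gold):
    """Token-level F1 between the normalized prediction and gold answer."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def best_over_golds(metric, prediction, gold_answers):
    """A question may have several gold answers; the best match counts."""
    return max(metric(prediction, gold) for gold in gold_answers)


golds = ["Normandy", "Normandy, a region in France"]
print(best_over_golds(exact_match, "the Normandy.", golds))
print(best_over_golds(f1_score, "in Normandy France", golds))
```

The dataset-level score is the mean of the per-question values, reported as a percentage, which is the scale used in the table.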

Results (v2.0)

  • Dev Set

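| Model | EM (official) | F1 (official) | BaseConfig                  | Note |
| ----- | ------------- | ------------- | --------------------------- | ---- |
| BiDAF | 62.570        | 65.461        | squad/bidaf_no_answer.json  | -    |
| DocQA | 61.728        | 64.489        | squad/docqa_no_answer.json  | -    |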
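
In v2.0, unanswerable questions change how a single prediction is scored: a model gets credit on such a question only if it also predicts "no answer". The sketch below illustrates that rule, representing "no answer" as an empty string. It is a simplified illustration in which normalization is reduced to lowercasing and whitespace splitting, and `token_f1` and `score_v2` are hypothetical names, not functions from this repository.

```python
from collections import Counter


def token_f1(prediction, gold):
    """Token-level F1; if either side is empty, both must be empty to score."""
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def score_v2(prediction, gold_answers):
    """Return (EM, F1) for one question; an empty gold list means unanswerable."""
    golds = gold_answers or [""]  # unanswerable: the only correct output is ""
    em = max(float(prediction.lower().split() == g.lower().split())
             for g in golds)
    f1 = max(token_f1(prediction, g) for g in golds)
    return em, f1


print(score_v2("", []))                   # abstains on an unanswerable question
print(score_v2("Normandy", []))           # answers an unanswerable question
print(score_v2("", ["Normandy"]))         # abstains on an answerable question
print(score_v2("normandy", ["Normandy"]))
```

Because wrongly abstaining and wrongly answering both score zero, v2.0 numbers are not directly comparable with v1.1 numbers for the same architecture.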