# GLUE

GLUE: The General Language Understanding Evaluation (GLUE) benchmark is a collection of resources for training, evaluating, and analyzing natural language understanding systems.

## Results

### Dev Set
**Base Size**: 12-layer, 768-hidden, 12-heads, 110M parameters
Task (Metric) | Model | CLaF Result | Official Result | BaseConfig
---|---|---|---|---
CoLA (Matthew's Corr) | BERT-Base | 59.393 | 52.1 (Test set) | glue/cola_bert.json
 | MT-DNN (BERT) Base | 54.658 | - | 1. multi_task/bert_base_glue.json 2. fine-tune
 | RoBERTa-Base | 64.828 | 63.6 | glue/cola_roberta.json
MNLI m/mm (Accuracy) | BERT-Base | 83.923/84.306 | 84.6/83.4 (Test set) | glue/mnli{m/mm}_bert.json
 | MT-DNN (BERT) Base | 84.452/84.225 | - | 1. multi_task/bert_base_glue.json 2. fine-tune
 | RoBERTa-Base | 87.305/87.236 | 87.6/- | glue/mnli{m/mm}_roberta.json
MRPC (Accuracy/F1) | BERT-Base | 87.5/91.282 | 88.9 (Test set) | glue/mrpc_bert.json
 | MT-DNN (BERT) Base | 87.5/91.005 | - | 1. multi_task/bert_base_glue.json 2. fine-tune
 | RoBERTa-Base | 88.480/91.681 | 90.2 | glue/mrpc_roberta.json
QNLI (Accuracy) | BERT-Base | 88.521 | 90.5 (Test set) | glue/qnli_bert.json
 | MT-DNN (BERT) Base | - | - | 1. multi_task/bert_base_glue.json 2. fine-tune
 | RoBERTa-Base | 90.823 | 92.8 | glue/qnli_roberta.json
QQP (Accuracy/F1) | BERT-Base | 90.378/87.171 | 71.2 (Test set) | glue/qqp_bert.json
 | MT-DNN (BERT) Base | 91.261/88.219 | - | 1. multi_task/bert_base_glue.json 2. fine-tune
 | RoBERTa-Base | 91.541/88.768 | 91.9 | glue/qqp_roberta.json
RTE (Accuracy) | BERT-Base | 69.314 | 66.4 (Test set) | glue/rte_bert.json
 | MT-DNN (BERT) Base | 79.422 | - | 1. multi_task/bert_base_glue.json 2. fine-tune
 | RoBERTa-Base | 73.646 | 78.7 | glue/rte_roberta.json
SST-2 (Accuracy) | BERT-Base | 92.546 | 93.5 (Test set) | glue/sst_bert.json
 | MT-DNN (BERT) Base | 93.005 | - | 1. multi_task/bert_base_glue.json 2. fine-tune
 | RoBERTa-Base | 94.495 | 94.8 | glue/sst_roberta.json
STS-B (Pearson/Spearman) | BERT-Base | 88.070/87.881 | 85.8 (Test set) | glue/stsb_bert.json
 | MT-DNN (BERT) Base | 88.444/88.807 | - | 1. multi_task/bert_base_glue.json 2. fine-tune
 | RoBERTa-Base | 89.003/89.094 | 91.2 | glue/stsb_roberta.json
WNLI (Accuracy) | BERT-Base | 56.338 | 65.1 (Test set) | glue/wnli_bert.json
 | MT-DNN (BERT) Base | 57.746 | - | 1. multi_task/bert_base_glue.json 2. fine-tune
 | RoBERTa-Base | 60.563 | - | glue/wnli_roberta.json
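
Each entry in the BaseConfig column is a JSON experiment config shipped with CLaF. As a minimal sketch of how such a row might be reproduced (assuming the repository's `train.py` entrypoint, its `--base_config` flag, and that the paths above live under a `base_config/` directory, none of which is stated in this table), a single-task run such as CoLA with BERT-Base could be launched like this; the MT-DNN rows instead train the listed multi-task config first and then fine-tune the resulting model on each task.

```python
# Minimal sketch (entrypoint and flag names are assumptions): launch a CLaF GLUE
# run from one of the BaseConfig files listed in the table above.
import subprocess

# e.g. the CoLA + BERT-Base row; the "base_config/" prefix is an assumption.
base_config = "base_config/glue/cola_bert.json"

# Equivalent to running `python train.py --base_config <path>` from the repo root.
subprocess.run(["python", "train.py", "--base_config", base_config], check=True)
```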
**Large Size**: 24-layer, 1024-hidden, 16-heads, 340M parameters
Task (Metric) | Model | CLaF Result | Official Result | BaseConfig
---|---|---|---|---
CoLA (Matthew's Corr) | BERT-Large | 61.151 | 60.6 | glue/cola_bert.json
 | MT-DNN (BERT) Large | - | 63.5 | 1. multi_task/bert_large_glue.json 2. fine-tune
 | RoBERTa-Large | - | 68.0 | glue/cola_roberta.json
MNLI m/mm (Accuracy) | BERT-Large | - | 86.6/- | glue/mnli{m/mm}_bert.json
 | MT-DNN (BERT) Large | - | 87.1/86.7 | 1. multi_task/bert_large_glue.json 2. fine-tune
 | RoBERTa-Large | - | 90.2/90.2 | glue/mnli{m/mm}_roberta.json
MRPC (Accuracy/F1) | BERT-Large | 87.255/90.845 | 88.0 | glue/mrpc_bert.json
 | MT-DNN (BERT) Large | - | 91.0/87.5 | 1. multi_task/bert_large_glue.json 2. fine-tune
 | RoBERTa-Large | 90.686/93.214 | 90.9 | glue/mrpc_roberta.json
QNLI (Accuracy) | BERT-Large | 90.440 | 92.3 | glue/qnli_bert.json
 | MT-DNN (BERT) Large | - | 87.1/86.7 | 1. multi_task/bert_large_glue.json 2. fine-tune
 | RoBERTa-Large | - | 94.7 | glue/qnli_roberta.json
QQP (Accuracy/F1) | BERT-Large | 91.640/88.745 | 91.3 | glue/qqp_bert.json
 | MT-DNN (BERT) Large | - | 87.1/86.7 | 1. multi_task/bert_large_glue.json 2. fine-tune
 | RoBERTa-Large | 91.848/89.031 | 92.2 | glue/qqp_roberta.json
RTE (Accuracy) | BERT-Large | 69.675 | 70.4 | glue/rte_bert.json
 | MT-DNN (BERT) Large | - | 83.4 | 1. multi_task/bert_large_glue.json 2. fine-tune
 | RoBERTa-Large | 84.838 | 86.6 | glue/rte_roberta.json
SST-2 (Accuracy) | BERT-Large | 93.349 | 93.2 | glue/sst_bert.json
 | MT-DNN (BERT) Large | - | 94.3 | 1. multi_task/bert_large_glue.json 2. fine-tune
 | RoBERTa-Large | 95.642 | 96.4 | glue/sst_roberta.json
STS-B (Pearson/Spearman) | BERT-Large | 90.041/89.735 | 90.0 | glue/stsb_bert.json
 | MT-DNN (BERT) Large | - | 90.7/90.6 | 1. multi_task/bert_large_glue.json 2. fine-tune
 | RoBERTa-Large | 91.980/91.764 | 92.4 | glue/stsb_roberta.json
WNLI (Accuracy) | BERT-Large | 59.155 | - | glue/wnli_bert.json
 | MT-DNN (BERT) Large | - | - | 1. multi_task/bert_large_glue.json 2. fine-tune
 | RoBERTa-Large | - | 91.3 | -