Submodules
- claf.metric.classification.f1(pycm_obj)
- claf.metric.classification.macro_f1(pycm_obj)
- claf.metric.classification.macro_precision(pycm_obj)
- claf.metric.classification.macro_recall(pycm_obj)
- claf.metric.classification.precision(pycm_obj)
- claf.metric.classification.recall(pycm_obj)
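Every function in this module takes a pycm ConfusionMatrix (pycm_obj). As a rough illustration of that relationship, the sketch below shows how a macro-averaged F1 can be read off such an object; the helper name and the exact aggregation are assumptions for this example, not the claf implementation.

```python
# Illustration only: deriving a macro F1 from a pycm ConfusionMatrix.
# The actual claf.metric.classification helpers may compute this differently.
from statistics import mean

from pycm import ConfusionMatrix


def macro_f1_sketch(pycm_obj):
    # class_stat["F1"] is a dict of per-class F1 scores; pycm reports
    # undefined values as the string "None", so skip those.
    per_class = pycm_obj.class_stat["F1"]
    values = [v for v in per_class.values() if v != "None"]
    return mean(values)


actual = [0, 1, 1, 2, 2, 2]
predicted = [0, 1, 2, 2, 1, 2]
cm = ConfusionMatrix(actual_vector=actual, predict_vector=predicted)
print(macro_f1_sketch(cm))  # average of the per-class F1 values
```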
Official evaluation script for v1.1 of the SQuAD dataset.
- claf.metric.squad_v1_official.evaluate(dataset, predictions)
- claf.metric.squad_v1_official.exact_match_score(prediction, ground_truth)
- claf.metric.squad_v1_official.f1_score(prediction, ground_truth)
- claf.metric.squad_v1_official.metric_max_over_ground_truths(metric_fn, prediction, ground_truths)
- claf.metric.squad_v1_official.normalize_answer(s)
  Lower text and remove punctuation, articles and extra whitespace.
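normalize_answer is shared by the exact-match and F1 metrics, while metric_max_over_ground_truths scores a prediction against every reference answer and keeps the best result. A sketch of that standard logic follows; the claf module tracks the official v1.1 script, so the code below is an illustration rather than a verbatim copy.

```python
import re
import string


def normalize_answer(s):
    """Lower text and remove punctuation, articles and extra whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in set(string.punctuation))
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def exact_match_score(prediction, ground_truth):
    return normalize_answer(prediction) == normalize_answer(ground_truth)


def metric_max_over_ground_truths(metric_fn, prediction, ground_truths):
    # A prediction gets credit for the best match among all reference answers.
    return max(metric_fn(prediction, gt) for gt in ground_truths)


print(exact_match_score("The Eiffel Tower!", "eiffel tower"))  # True
```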
Official evaluation script for SQuAD version 2.0. In addition to basic functionality, it also computes additional statistics and plots precision-recall curves if an additional na_prob.json file is provided. This file is expected to map question IDs to the model's predicted probability that a question is unanswerable.
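The expected shape of that file is a flat mapping; something like the following, where the question IDs and probabilities are invented for illustration.

```python
# Placeholder content for na_prob.json: question id -> probability that the
# question is unanswerable. Ids and values here are made up.
na_probs = {
    "question-id-1": 0.03,  # model is fairly sure an answer exists
    "question-id-2": 0.91,  # model leans towards "no answer"
}
```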
- claf.metric.squad_v2_official.apply_no_ans_threshold(scores, na_probs, qid_to_has_ans, na_prob_thresh)
- claf.metric.squad_v2_official.compute_exact(a_gold, a_pred)
- claf.metric.squad_v2_official.compute_f1(a_gold, a_pred)
- claf.metric.squad_v2_official.evaluate(dataset, na_probs, preds, na_prob_thresh=1.0)
- claf.metric.squad_v2_official.find_all_best_thresh(main_eval, preds, exact_raw, f1_raw, na_probs, qid_to_has_ans)
- claf.metric.squad_v2_official.find_best_thresh(preds, scores, na_probs, qid_to_has_ans)
- claf.metric.squad_v2_official.get_raw_scores(dataset, preds)
- claf.metric.squad_v2_official.get_tokens(s)
- claf.metric.squad_v2_official.histogram_na_prob(na_probs, qid_list, image_dir, name)
- claf.metric.squad_v2_official.main()
- claf.metric.squad_v2_official.make_eval_dict(exact_scores, f1_scores, qid_list=None)
- claf.metric.squad_v2_official.make_precision_recall_eval(scores, na_probs, num_true_pos, qid_to_has_ans, out_image=None, title=None)
- claf.metric.squad_v2_official.make_qid_to_has_ans(dataset)
- claf.metric.squad_v2_official.merge_eval(main_eval, new_eval, prefix)
- claf.metric.squad_v2_official.normalize_answer(s)
  Lower text and remove punctuation, articles and extra whitespace.
- claf.metric.squad_v2_official.parse_args()
- claf.metric.squad_v2_official.plot_pr_curve(precisions, recalls, out_image, title)
- claf.metric.squad_v2_official.run_precision_recall_analysis(main_eval, exact_raw, f1_raw, na_probs, qid_to_has_ans, out_image_dir)
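The no-answer threshold is the part specific to v2.0: a raw EM/F1 score is kept only when the model's no-answer probability stays below the threshold; otherwise the question is scored as if the model abstained, which is correct exactly when the question really has no answer. The sketch below illustrates that thresholding step and is an assumption about the logic, not the claf source.

```python
def apply_no_ans_threshold_sketch(scores, na_probs, qid_to_has_ans, na_prob_thresh):
    # scores: qid -> raw EM or F1 score for the predicted answer text
    # na_probs: qid -> predicted probability that the question is unanswerable
    # qid_to_has_ans: qid -> whether the gold data actually contains an answer
    new_scores = {}
    for qid, score in scores.items():
        if na_probs[qid] > na_prob_thresh:
            # Model abstains: scores 1.0 only if the question truly has no answer.
            new_scores[qid] = float(not qid_to_has_ans[qid])
        else:
            new_scores[qid] = score
    return new_scores
```

Roughly, find_best_thresh then sweeps candidate thresholds drawn from the observed na_probs values and keeps the one that maximizes the resulting score, and find_all_best_thresh records the best thresholds for both exact match and F1 in main_eval.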
Official evaluation script for the WikiSQL dataset.
- claf.metric.wikisql_official.count_lines(fname)
- claf.metric.wikisql_official.evaluate(labels, predictions, db_path, ordered=True)
  labels and predictions are dictionaries of the form {data_uid: sql_data, …}
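A usage sketch for the WikiSQL evaluator, assuming the labels and predictions are stored as the {data_uid: sql_data, …} mapping described above; the file names and database path below are placeholders.

```python
import json

from claf.metric.wikisql_official import evaluate

# Placeholder paths: the JSON files hold {data_uid: sql_data, ...} mappings,
# and db_path points at the WikiSQL SQLite database used to run the queries.
with open("dev_labels.json") as f:
    labels = json.load(f)
with open("dev_predictions.json") as f:
    predictions = json.load(f)

# ordered=True matches the default shown in the signature above.
result = evaluate(labels, predictions, db_path="data/wikisql/dev.db", ordered=True)
print(result)
```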