claf.metric package

Submodules

claf.metric.classification module

claf.metric.classification.f1(pycm_obj)[source]
claf.metric.classification.macro_f1(pycm_obj)[source]
claf.metric.classification.macro_precision(pycm_obj)[source]
claf.metric.classification.macro_recall(pycm_obj)[source]
claf.metric.classification.precision(pycm_obj)[source]
claf.metric.classification.recall(pycm_obj)[source]
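
The classification metrics all take a pycm confusion-matrix object. Below is a minimal usage sketch, assuming the metrics accept a pycm.ConfusionMatrix built from gold and predicted label vectors; the labels are illustrative and the exact return format (float vs. per-class dict) is an assumption:

    from pycm import ConfusionMatrix

    from claf.metric.classification import (
        f1, macro_f1, macro_precision, macro_recall, precision, recall,
    )

    # Build a pycm confusion matrix from gold and predicted label vectors.
    gold = ["positive", "negative", "negative", "neutral", "positive"]
    pred = ["positive", "negative", "neutral", "neutral", "negative"]
    pycm_obj = ConfusionMatrix(actual_vector=gold, predict_vector=pred)

    # Scores derived from the same confusion matrix: plain metrics and
    # their macro-averaged counterparts.
    print(f1(pycm_obj), precision(pycm_obj), recall(pycm_obj))
    print(macro_f1(pycm_obj), macro_precision(pycm_obj), macro_recall(pycm_obj))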

claf.metric.squad_v1_official module

Official evaluation script for v1.1 of the SQuAD dataset.

claf.metric.squad_v1_official.evaluate(dataset, predictions)[source]
claf.metric.squad_v1_official.exact_match_score(prediction, ground_truth)[source]
claf.metric.squad_v1_official.f1_score(prediction, ground_truth)[source]
claf.metric.squad_v1_official.metric_max_over_ground_truths(metric_fn, prediction, ground_truths)[source]
claf.metric.squad_v1_official.normalize_answer(s)[source]

Lowercase the text and remove punctuation, articles, and extra whitespace.
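
A minimal usage sketch for these functions, with an illustrative prediction and ground-truth answers; each metric is computed against every ground truth and the maximum is kept:

    from claf.metric.squad_v1_official import (
        exact_match_score,
        f1_score,
        metric_max_over_ground_truths,
        normalize_answer,
    )

    prediction = "the Norman dynasty"
    ground_truths = ["Normans", "The Normans", "Norman dynasty"]

    # Answers are normalized (lowercased; punctuation, articles, and extra
    # whitespace removed) before comparison.
    print(normalize_answer("The  Normans!"))  # "normans"

    # Take the best score over all ground-truth answers.
    em = metric_max_over_ground_truths(exact_match_score, prediction, ground_truths)
    f1 = metric_max_over_ground_truths(f1_score, prediction, ground_truths)
    print(em, f1)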

claf.metric.squad_v2_official module

Official evaluation script for SQuAD version 2.0.

In addition to basic functionality, we also compute additional statistics and plot precision-recall curves if an na_prob.json file is provided. This file is expected to map question IDs to the model's predicted probability that a question is unanswerable.
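
The expected shape of that file, with illustrative question IDs and probabilities, looks roughly like this:

    import json

    # Each key is a question ID from the dataset; each value is the model's
    # predicted probability that the question is unanswerable (values illustrative).
    na_probs = {
        "56ddde6b9a695914005b9628": 0.07,
        "5ad39d53604f3c001a3fe8d1": 0.93,
    }

    with open("na_prob.json", "w") as f:
        json.dump(na_probs, f)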

claf.metric.squad_v2_official.apply_no_ans_threshold(scores, na_probs, qid_to_has_ans, na_prob_thresh)[source]
claf.metric.squad_v2_official.compute_exact(a_gold, a_pred)[source]
claf.metric.squad_v2_official.compute_f1(a_gold, a_pred)[source]
claf.metric.squad_v2_official.evaluate(dataset, na_probs, preds, na_prob_thresh=1.0)[source]
claf.metric.squad_v2_official.find_all_best_thresh(main_eval, preds, exact_raw, f1_raw, na_probs, qid_to_has_ans)[source]
claf.metric.squad_v2_official.find_best_thresh(preds, scores, na_probs, qid_to_has_ans)[source]
claf.metric.squad_v2_official.get_raw_scores(dataset, preds)[source]
claf.metric.squad_v2_official.get_tokens(s)[source]
claf.metric.squad_v2_official.histogram_na_prob(na_probs, qid_list, image_dir, name)[source]
claf.metric.squad_v2_official.main()[source]
claf.metric.squad_v2_official.make_eval_dict(exact_scores, f1_scores, qid_list=None)[source]
claf.metric.squad_v2_official.make_precision_recall_eval(scores, na_probs, num_true_pos, qid_to_has_ans, out_image=None, title=None)[source]
claf.metric.squad_v2_official.make_qid_to_has_ans(dataset)[source]
claf.metric.squad_v2_official.merge_eval(main_eval, new_eval, prefix)[source]
claf.metric.squad_v2_official.normalize_answer(s)[source]

Lowercase the text and remove punctuation, articles, and extra whitespace.

claf.metric.squad_v2_official.parse_args()[source]
claf.metric.squad_v2_official.plot_pr_curve(precisions, recalls, out_image, title)[source]
claf.metric.squad_v2_official.run_precision_recall_analysis(main_eval, exact_raw, f1_raw, na_probs, qid_to_has_ans, out_image_dir)[source]
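
A sketch of how the listed helpers compose into an evaluation, mirroring the flow of the official script; the file names are illustrative, and the shapes of dataset, preds, and na_probs follow the SQuAD 2.0 conventions assumed here:

    import json

    from claf.metric.squad_v2_official import (
        apply_no_ans_threshold,
        get_raw_scores,
        make_eval_dict,
        make_qid_to_has_ans,
        merge_eval,
    )

    # dataset: the "data" list of a SQuAD 2.0 JSON file,
    # preds: {question_id: answer_string}, na_probs: {question_id: probability}.
    with open("dev-v2.0.json") as f:
        dataset = json.load(f)["data"]
    with open("predictions.json") as f:
        preds = json.load(f)
    with open("na_prob.json") as f:
        na_probs = json.load(f)

    qid_to_has_ans = make_qid_to_has_ans(dataset)  # question_id -> has a gold answer

    # Raw per-question exact-match and F1 scores against the gold answers.
    exact_raw, f1_raw = get_raw_scores(dataset, preds)

    # Treat a prediction as "no answer" wherever the predicted unanswerable
    # probability exceeds the threshold.
    na_prob_thresh = 1.0
    exact = apply_no_ans_threshold(exact_raw, na_probs, qid_to_has_ans, na_prob_thresh)
    f1 = apply_no_ans_threshold(f1_raw, na_probs, qid_to_has_ans, na_prob_thresh)

    # Aggregate overall metrics, then add the answerable-only breakdown.
    main_eval = make_eval_dict(exact, f1)
    has_ans_qids = [qid for qid, has_ans in qid_to_has_ans.items() if has_ans]
    merge_eval(main_eval, make_eval_dict(exact, f1, qid_list=has_ans_qids), "HasAns")
    print(main_eval)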

claf.metric.wikisql_official module

Official evaluation script for the WikiSQL dataset.

claf.metric.wikisql_official.count_lines(fname)[source]
claf.metric.wikisql_official.evaluate(labels, predictions, db_path, ordered=True)[source]

labels and predictions: dictionaries of the form {data_uid: sql_data, …}
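
A minimal call sketch, assuming labels and predictions have already been built as such dictionaries, that sql_data follows the WikiSQL query format, and that db_path points to the corresponding WikiSQL SQLite database; all file paths below are illustrative:

    import json

    from claf.metric.wikisql_official import evaluate

    # Both files hold {data_uid: sql_data, ...} mappings (names illustrative).
    with open("labels.json") as f:
        labels = json.load(f)
    with open("predictions.json") as f:
        predictions = json.load(f)

    # `ordered` is assumed to follow the official WikiSQL evaluation flag
    # (whether the order of conditions matters for the exact match).
    results = evaluate(labels, predictions, db_path="data/wikisql/dev.db", ordered=True)
    print(results)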

Module contents