claf.data.reader package

Submodules

class claf.data.reader.base.DataReader(file_paths, dataset_obj)[source]

Bases: object

DataReader Base Class

  • Args:

    file_paths: dictionary of (‘train’ and ‘valid’) file paths
    dataset_obj: Dataset object (claf.data.dataset.base)

convert_to_dataset(datas, vocab, helpers=None)[source]

Batch to Dataset

filter_texts(dataset)[source]
read()[source]

Read data with the concrete DataReader for each type

read_one_example(inputs)[source]
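
The base class follows a template-method pattern: read() iterates over the raw data and delegates per-example conversion to read_one_example(), which each concrete reader overrides. A minimal standalone sketch of that pattern (hypothetical code for illustration, not the actual claf implementation):

```python
class MinimalDataReader:
    """Sketch of the DataReader template: read() loops over raw records,
    subclasses implement read_one_example() for their task's format."""

    def __init__(self, file_paths):
        # e.g. {"train": "train.json", "valid": "valid.json"}
        self.file_paths = file_paths

    def read(self):
        datasets = {}
        for data_type, path in self.file_paths.items():
            raw_examples = self._load(path)  # format-specific loading
            datasets[data_type] = [self.read_one_example(ex) for ex in raw_examples]
        return datasets

    def _load(self, path):
        raise NotImplementedError

    def read_one_example(self, inputs):
        raise NotImplementedError


class ToyReader(MinimalDataReader):
    """Toy concrete reader: tokenizes a sequence and keeps its label."""

    def _load(self, path):
        # stand-in for real file I/O
        return [{"sequence": "hello world", "class": "greeting"}]

    def read_one_example(self, inputs):
        return {"tokens": inputs["sequence"].split(), "label": inputs["class"]}
```
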
class claf.data.reader.cola.CoLAReader(file_paths, tokenizers, sequence_max_length=None)[source]

Bases: claf.data.reader.seq_cls.SeqClsReader

CoLA DataReader

  • Args:

    file_paths: .json file paths (train and dev)
    tokenizers: defined tokenizers config (word)

CLASS_DATA = [0, 1]
class claf.data.reader.seq_cls.SeqClsReader(file_paths, tokenizers, sequence_max_length=None, class_key='class')[source]

Bases: claf.data.reader.base.DataReader

DataReader for Sequence Classification

  • Args:

    file_paths: .json file paths (train and dev)
    tokenizers: defined tokenizers config (word)

  • Kwargs:

    class_key: name of the label in .json file to use for classification

CLASS_DATA = None
read_one_example(inputs)[source]

inputs keys: sequence
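
The class_key kwarg selects which field of each .json record is treated as the classification label. A hypothetical sketch of that lookup and of building a label index (illustrative names, not claf's actual code):

```python
def extract_label(example, class_key="class"):
    """Pull the classification label out of one .json record,
    using the configurable class_key field name."""
    if class_key not in example:
        raise KeyError(f"example has no '{class_key}' field")
    return example[class_key]


def build_class_index(examples, class_key="class"):
    """Map each distinct label to an integer id, in sorted order."""
    labels = sorted({extract_label(ex, class_key) for ex in examples})
    return {label: idx for idx, label in enumerate(labels)}
```
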

class claf.data.reader.squad.SQuADReader(file_paths, lang_code, tokenizers, context_max_length=None)[source]

Bases: claf.data.reader.base.DataReader

SQuAD DataReader

  • Args:

    file_paths: .json file paths (train and dev)
    tokenizers: defined tokenizers config (char/word)

read_one_example(inputs)[source]

inputs keys: question, context
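
SQuAD's .json nests paragraphs under articles, and each example the reader sees is a (question, context) pair plus answer spans. A sketch of flattening that nesting (field names follow the public SQuAD format; this is not claf's internal code):

```python
def flatten_squad(squad_dict):
    """Flatten SQuAD's article -> paragraph -> qa nesting into
    flat {question, context, answers} examples."""
    examples = []
    for article in squad_dict["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                examples.append({
                    "question": qa["question"],
                    "context": context,
                    "answers": [a["text"] for a in qa["answers"]],
                })
    return examples
```
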

class claf.data.reader.wikisql.WikiSQLReader(file_paths, tokenizers, context_max_length=None, is_test=None)[source]

Bases: claf.data.reader.base.DataReader

WikiSQL DataReader (http://arxiv.org/abs/1709.00103)

  • Args:

    file_paths: .json file paths (train and dev)
    tokenizers: defined tokenizers config (char/word)

get_coditions_value_position(question, values)[source]
load_data(sql_path, table_path, data_type=None)[source]
read_one_example(inputs)[source]

inputs keys: question, column, db_path, table_id
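
get_coditions_value_position locates each WHERE-clause value inside the tokenized question so the model can predict it as a span. A hypothetical token-level search illustrating the idea (not the actual implementation):

```python
def find_value_span(question_tokens, value_tokens):
    """Return (start, end) token indices of value_tokens inside
    question_tokens, or None if the value does not appear."""
    n, m = len(question_tokens), len(value_tokens)
    for start in range(n - m + 1):
        if question_tokens[start:start + m] == value_tokens:
            return (start, start + m - 1)
    return None
```
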

Module contents

class claf.data.reader.MultiTaskBertReader(file_paths, tokenizers, batch_sizes=[], readers=[])[source]

Bases: claf.data.reader.base.DataReader

DataReader for Multi-Task using BERT

  • Args:

    file_paths: .json file paths (train and dev)
    tokenizers: defined tokenizers config (subword)

  • Kwargs:

    class_key: name of the label in .json file to use for classification

CLASS_DATA = None
make_data_reader(config_dict)[source]
make_task_by_reader(name, data_reader, helper)[source]
read_one_example(inputs)[source]
class claf.data.reader.RegressionBertReader(file_paths, tokenizers, sequence_max_length=None, label_key='score', cls_token='[CLS]', sep_token='[SEP]', input_type='bert', is_test=False)[source]

Bases: claf.data.reader.base.DataReader

Regression DataReader for BERT

  • Args:

    file_paths: .tsv file paths (train and dev)
    tokenizers: defined tokenizers config

METRIC_KEY = None
read_one_example(inputs)[source]

inputs keys: sequence_a and sequence_b

class claf.data.reader.STSBBertReader(file_paths, tokenizers, sequence_max_length=None, cls_token='[CLS]', sep_token='[SEP]', input_type='bert', is_test=False)[source]

Bases: claf.data.reader.bert.regression.RegressionBertReader

STS-B (Semantic Textual Similarity Benchmark) DataReader for BERT

  • Args:

    file_paths: .tsv file paths (train and dev)
    tokenizers: defined tokenizers config

METRIC_KEY = 'pearson_spearman_corr'
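
STS-B is conventionally scored by averaging Pearson and Spearman correlation between predicted and gold similarity scores; the averaging convention here follows the GLUE benchmark and is an assumption, not taken from claf's code. A stdlib-only sketch:

```python
def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)


def ranks(xs):
    """Rank-transform values (ties not averaged: fine for distinct values)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for rank, i in enumerate(order):
        r[i] = rank
    return r


def pearson_spearman_corr(preds, golds):
    """GLUE-style STS-B score: mean of Pearson and Spearman
    (Spearman = Pearson on the rank-transformed values)."""
    p = pearson(preds, golds)
    s = pearson(ranks(preds), ranks(golds))
    return (p + s) / 2
```
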
class claf.data.reader.SeqClsReader(file_paths, tokenizers, sequence_max_length=None, class_key='class')[source]

Bases: claf.data.reader.base.DataReader

DataReader for Sequence Classification

  • Args:

    file_paths: .json file paths (train and dev)
    tokenizers: defined tokenizers config (word)

  • Kwargs:

    class_key: name of the label in .json file to use for classification

CLASS_DATA = None
read_one_example(inputs)[source]

inputs keys: sequence

class claf.data.reader.CoLAReader(file_paths, tokenizers, sequence_max_length=None)[source]

Bases: claf.data.reader.seq_cls.SeqClsReader

CoLA DataReader

  • Args:

    file_paths: .json file paths (train and dev)
    tokenizers: defined tokenizers config (word)

CLASS_DATA = [0, 1]
class claf.data.reader.SeqClsBertReader(file_paths, tokenizers, sequence_max_length=None, class_key='class', cls_token='[CLS]', sep_token='[SEP]', input_type='bert', is_test=False)[source]

Bases: claf.data.reader.base.DataReader

DataReader for Sequence (Single and Pair) Classification using BERT

  • Args:

    file_paths: .json file paths (train and dev)
    tokenizers: defined tokenizers config (subword)

  • Kwargs:

    class_key: name of the label in .json file to use for classification

CLASS_DATA = None
METRIC_KEY = None
read_one_example(inputs)[source]

inputs keys: sequence_a and sequence_b
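
For pair classification the reader packs both sequences into a single BERT input using the special tokens from the constructor (cls_token, sep_token). A sketch of that packing, with word-level tokens standing in for subwords (hypothetical helper; real readers truncate pairs more carefully):

```python
def pack_bert_pair(tokens_a, tokens_b=None, cls_token="[CLS]",
                   sep_token="[SEP]", sequence_max_length=None):
    """Build [CLS] a [SEP] (b [SEP]) with matching segment ids,
    truncating to sequence_max_length if set."""
    tokens = [cls_token] + tokens_a + [sep_token]
    segment_ids = [0] * len(tokens)
    if tokens_b:
        tokens += tokens_b + [sep_token]
        segment_ids += [1] * (len(tokens_b) + 1)
    if sequence_max_length is not None:
        tokens = tokens[:sequence_max_length]
        segment_ids = segment_ids[:sequence_max_length]
    return tokens, segment_ids
```
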

class claf.data.reader.CoLABertReader(file_paths, tokenizers, sequence_max_length=None, cls_token='[CLS]', sep_token='[SEP]', input_type='bert', is_test=False)[source]

Bases: claf.data.reader.bert.seq_cls.SeqClsBertReader

CoLA DataReader for BERT

  • Args:

    file_paths: .tsv file paths (train and dev)
    tokenizers: defined tokenizers config

CLASS_DATA = [0, 1]
METRIC_KEY = 'matthews_corr'
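
CoLA is scored with the Matthews correlation coefficient, which stays informative under class imbalance. The formula below is the standard binary MCC; the function name is illustrative and not claf's:

```python
def matthews_corr(preds, golds):
    """Binary MCC from confusion counts:
    (tp*tn - fp*fn) / sqrt((tp+fp)(tp+fn)(tn+fp)(tn+fn))."""
    tp = sum(1 for p, g in zip(preds, golds) if p == 1 and g == 1)
    tn = sum(1 for p, g in zip(preds, golds) if p == 0 and g == 0)
    fp = sum(1 for p, g in zip(preds, golds) if p == 1 and g == 0)
    fn = sum(1 for p, g in zip(preds, golds) if p == 0 and g == 1)
    denom = ((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)) ** 0.5
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom
```
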
class claf.data.reader.MRPCBertReader(file_paths, tokenizers, sequence_max_length=None, cls_token='[CLS]', sep_token='[SEP]', input_type='bert', is_test=False)[source]

Bases: claf.data.reader.bert.seq_cls.SeqClsBertReader

MRPC DataReader for BERT

  • Args:

    file_paths: .tsv file paths (train and dev)
    tokenizers: defined tokenizers config

CLASS_DATA = [0, 1]
METRIC_KEY = 'f1'
class claf.data.reader.MNLIBertReader(file_paths, tokenizers, sequence_max_length=None, cls_token='[CLS]', sep_token='[SEP]', input_type='bert', is_test=False)[source]

Bases: claf.data.reader.bert.seq_cls.SeqClsBertReader

MNLI DataReader for BERT

  • Args:

    file_paths: .tsv file paths (train and dev)
    tokenizers: defined tokenizers config

CLASS_DATA = ['contradiction', 'entailment', 'neutral']
METRIC_KEY = 'accuracy'
class claf.data.reader.QNLIBertReader(file_paths, tokenizers, sequence_max_length=None, cls_token='[CLS]', sep_token='[SEP]', input_type='bert', is_test=False)[source]

Bases: claf.data.reader.bert.seq_cls.SeqClsBertReader

QNLI DataReader for BERT

  • Args:

    file_paths: .tsv file paths (train and dev)
    tokenizers: defined tokenizers config

CLASS_DATA = ['entailment', 'not_entailment']
METRIC_KEY = 'accuracy'
class claf.data.reader.QQPBertReader(file_paths, tokenizers, sequence_max_length=None, cls_token='[CLS]', sep_token='[SEP]', input_type='bert', is_test=False)[source]

Bases: claf.data.reader.bert.seq_cls.SeqClsBertReader

Quora Question Pairs DataReader for BERT

  • Args:

    file_paths: .tsv file paths (train and dev)
    tokenizers: defined tokenizers config

CLASS_DATA = [0, 1]
METRIC_KEY = 'f1'
class claf.data.reader.RTEBertReader(file_paths, tokenizers, sequence_max_length=None, cls_token='[CLS]', sep_token='[SEP]', input_type='bert', is_test=False)[source]

Bases: claf.data.reader.bert.seq_cls.SeqClsBertReader

RTE (Recognizing Textual Entailment) DataReader for BERT

  • Args:

    file_paths: .tsv file paths (train and dev)
    tokenizers: defined tokenizers config

CLASS_DATA = ['entailment', 'not_entailment']
METRIC_KEY = 'accuracy'
class claf.data.reader.SSTBertReader(file_paths, tokenizers, sequence_max_length=None, cls_token='[CLS]', sep_token='[SEP]', input_type='bert', is_test=False)[source]

Bases: claf.data.reader.bert.seq_cls.SeqClsBertReader

SST DataReader for BERT

  • Args:

    file_paths: .tsv file paths (train and dev)
    tokenizers: defined tokenizers config

CLASS_DATA = [0, 1]
METRIC_KEY = 'accuracy'
class claf.data.reader.WNLIBertReader(file_paths, tokenizers, sequence_max_length=None, cls_token='[CLS]', sep_token='[SEP]', input_type='bert', is_test=False)[source]

Bases: claf.data.reader.bert.seq_cls.SeqClsBertReader

WNLI (Winograd NLI) DataReader for BERT

  • Args:

    file_paths: .tsv file paths (train and dev)
    tokenizers: defined tokenizers config

CLASS_DATA = [0, 1]
METRIC_KEY = 'accuracy'
class claf.data.reader.SQuADReader(file_paths, lang_code, tokenizers, context_max_length=None)[source]

Bases: claf.data.reader.base.DataReader

SQuAD DataReader

  • Args:

    file_paths: .json file paths (train and dev)
    tokenizers: defined tokenizers config (char/word)

read_one_example(inputs)[source]

inputs keys: question, context

class claf.data.reader.SQuADBertReader(file_paths, lang_code, tokenizers, max_seq_length=384, context_stride=128, max_question_length=64, cls_token='[CLS]', sep_token='[SEP]')[source]

Bases: claf.data.reader.base.DataReader

SQuAD DataReader for BERT

  • Args:

    file_paths: .json file paths (train and dev)
    tokenizers: defined tokenizers config (char/word)

METRIC_KEY = 'f1'
read_one_example(inputs)[source]

inputs keys: question, context
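
When a context exceeds the token budget implied by max_seq_length, SQuAD-style BERT readers split it into overlapping windows, advancing by context_stride tokens so an answer span near a boundary still fits entirely inside some window. A sketch of that windowing (hypothetical helper, not the actual reader code):

```python
def stride_context(context_tokens, max_context_length, context_stride):
    """Return (start_offset, window_tokens) sliding windows that
    overlap by max_context_length - context_stride tokens."""
    windows = []
    start = 0
    while True:
        windows.append((start, context_tokens[start:start + max_context_length]))
        if start + max_context_length >= len(context_tokens):
            break  # last window reaches the end of the context
        start += context_stride
    return windows
```
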

class claf.data.reader.TokClsBertReader(file_paths, tokenizers, lang_code=None, sequence_max_length=None, tag_key='tags', cls_token='[CLS]', sep_token='[SEP]', ignore_tag_idx=-1)[source]

Bases: claf.data.reader.base.DataReader

DataReader for Token Classification using BERT

  • Args:

    file_paths: .json file paths (train and dev)
    tokenizers: defined tokenizers config (subword)

  • Kwargs:

    lang_code: language code; set to ‘ko’ if using a BERT model trained on mecab-tokenized data
    tag_key: name of the label in the .json file to use for classification
    ignore_tag_idx: prediction results whose ground-truth idx equals this number are ignored

read_one_example(inputs)[source]

inputs keys: sequence
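
Subword tokenization splits one labeled word into several pieces; a common BERT NER convention is that only the first piece keeps the word's tag while continuation pieces receive ignore_tag_idx, so they are excluded from the loss and metrics. A hypothetical sketch of that alignment (illustrative code, not claf's implementation):

```python
def align_tags_to_subwords(words, tags, tokenize, ignore_tag_idx=-1):
    """Label the first subword of each word with the word's tag;
    continuation subwords get ignore_tag_idx."""
    subwords, subword_tags = [], []
    for word, tag in zip(words, tags):
        pieces = tokenize(word)
        subwords.extend(pieces)
        subword_tags.append(tag)
        subword_tags.extend([ignore_tag_idx] * (len(pieces) - 1))
    return subwords, subword_tags
```
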

class claf.data.reader.CoNLL2003BertReader(file_paths, tokenizers, sequence_max_length=None, cls_token='[CLS]', sep_token='[SEP]', ignore_tag_idx=-1)[source]

Bases: claf.data.reader.bert.tok_cls.TokClsBertReader

CoNLL2003 for BERT

  • Args:

    file_paths: file paths (train and dev)

  • Kwargs:

    ignore_tag_idx: prediction results that have this number as ground-truth idx are ignored

class claf.data.reader.WikiSQLReader(file_paths, tokenizers, context_max_length=None, is_test=None)[source]

Bases: claf.data.reader.base.DataReader

WikiSQL DataReader (http://arxiv.org/abs/1709.00103)

  • Args:

    file_paths: .json file paths (train and dev)
    tokenizers: defined tokenizers config (char/word)

get_coditions_value_position(question, values)[source]
load_data(sql_path, table_path, data_type=None)[source]
read_one_example(inputs)[source]

inputs keys: question, column, db_path, table_id