claf.data.reader package
Subpackages
Submodules
class claf.data.reader.base.DataReader(file_paths, dataset_obj)
Bases: object

DataReader Base Class

- Args:
  file_paths: dictionary consisting of 'train' and 'valid' file paths
  dataset_obj: Dataset Object (claf.data.dataset.base)
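The file_paths dictionary can be sketched as follows. This is a minimal illustration assuming the 'train' and 'valid' keys from the docstring above; the paths themselves are hypothetical.

```python
# Illustrative only: the shape of the file_paths argument described above.
# The 'train' and 'valid' keys come from the docstring; the paths are made up.
file_paths = {
    "train": "data/train.json",
    "valid": "data/valid.json",
}

# A reader looks up each split by key:
train_path = file_paths["train"]
assert train_path == "data/train.json"
```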
class claf.data.reader.cola.CoLAReader(file_paths, tokenizers, sequence_max_length=None)
Bases: claf.data.reader.seq_cls.SeqClsReader

CoLA DataReader

- Args:
  file_paths: .json file paths (train and dev)
  tokenizers: defined tokenizers config (word)

CLASS_DATA = [0, 1]
class claf.data.reader.seq_cls.SeqClsReader(file_paths, tokenizers, sequence_max_length=None, class_key='class')
Bases: claf.data.reader.base.DataReader

DataReader for Sequence Classification

- Args:
  file_paths: .json file paths (train and dev)
  tokenizers: defined tokenizers config (word)
- Kwargs:
  class_key: name of the label in the .json file to use for classification

CLASS_DATA = None
class claf.data.reader.squad.SQuADReader(file_paths, lang_code, tokenizers, context_max_length=None)
Bases: claf.data.reader.base.DataReader

SQuAD DataReader

- Args:
  file_paths: .json file paths (train and dev)
  tokenizers: defined tokenizers config (char/word)
class claf.data.reader.wikisql.WikiSQLReader(file_paths, tokenizers, context_max_length=None, is_test=None)
Bases: claf.data.reader.base.DataReader

WikiSQL DataReader (http://arxiv.org/abs/1709.00103)

- Args:
  file_paths: .json file paths (train and dev)
  tokenizers: defined tokenizers config (char/word)
Module contents
class claf.data.reader.MultiTaskBertReader(file_paths, tokenizers, batch_sizes=[], readers=[])
Bases: claf.data.reader.base.DataReader

DataReader for Multi-Task Learning using BERT

- Args:
  file_paths: .json file paths (train and dev)
  tokenizers: defined tokenizers config (subword)
- Kwargs:
  class_key: name of the label in the .json file to use for classification

CLASS_DATA = None
class claf.data.reader.RegressionBertReader(file_paths, tokenizers, sequence_max_length=None, label_key='score', cls_token='[CLS]', sep_token='[SEP]', input_type='bert', is_test=False)
Bases: claf.data.reader.base.DataReader

Regression DataReader for BERT

- Args:
  file_paths: .tsv file paths (train and dev)
  tokenizers: defined tokenizers config

METRIC_KEY = None
class claf.data.reader.STSBBertReader(file_paths, tokenizers, sequence_max_length=None, cls_token='[CLS]', sep_token='[SEP]', input_type='bert', is_test=False)
Bases: claf.data.reader.bert.regression.RegressionBertReader

STS-B (Semantic Textual Similarity Benchmark) DataReader for BERT

- Args:
  file_paths: .tsv file paths (train and dev)
  tokenizers: defined tokenizers config

METRIC_KEY = 'pearson_spearman_corr'
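STS-B predicts a continuous similarity score, and METRIC_KEY = 'pearson_spearman_corr' indicates the task is scored with Pearson and Spearman correlation between predicted and gold scores. As a pure-Python illustration of the Pearson half (claf's actual metric implementation may differ), a minimal sketch:

```python
# Minimal Pearson correlation between predicted and gold similarity
# scores, for illustration only; claf's metric code may differ.
import math

def pearson_corr(xs, ys):
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Perfectly linearly related scores give a correlation of 1.0
assert abs(pearson_corr([0.0, 2.5, 5.0], [1.0, 2.0, 3.0]) - 1.0) < 1e-9
```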
class claf.data.reader.SeqClsReader(file_paths, tokenizers, sequence_max_length=None, class_key='class')
Bases: claf.data.reader.base.DataReader

DataReader for Sequence Classification

- Args:
  file_paths: .json file paths (train and dev)
  tokenizers: defined tokenizers config (word)
- Kwargs:
  class_key: name of the label in the .json file to use for classification

CLASS_DATA = None
class claf.data.reader.CoLAReader(file_paths, tokenizers, sequence_max_length=None)
Bases: claf.data.reader.seq_cls.SeqClsReader

CoLA DataReader

- Args:
  file_paths: .json file paths (train and dev)
  tokenizers: defined tokenizers config (word)

CLASS_DATA = [0, 1]
class claf.data.reader.SeqClsBertReader(file_paths, tokenizers, sequence_max_length=None, class_key='class', cls_token='[CLS]', sep_token='[SEP]', input_type='bert', is_test=False)
Bases: claf.data.reader.base.DataReader

DataReader for Sequence (Single and Pair) Classification using BERT

- Args:
  file_paths: .json file paths (train and dev)
  tokenizers: defined tokenizers config (subword)
- Kwargs:
  class_key: name of the label in the .json file to use for classification

CLASS_DATA = None
METRIC_KEY = None
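The class_key and CLASS_DATA pieces above fit together roughly as follows. This is a sketch, not claf's actual code: class_key names the label field in each .json example, and CLASS_DATA (set on subclasses) fixes the label set. The example fields and label names here are hypothetical.

```python
# Sketch (not claf's actual code): class_key selects the label field in a
# .json example; CLASS_DATA maps labels to integer ids. All names below
# are hypothetical.
class_key = "class"
CLASS_DATA = ["negative", "positive"]  # hypothetical label set

example = {"sequence": "a readable sentence", "class": "positive"}

label = example[class_key]
label_idx = CLASS_DATA.index(label)
assert label_idx == 1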
class claf.data.reader.CoLABertReader(file_paths, tokenizers, sequence_max_length=None, cls_token='[CLS]', sep_token='[SEP]', input_type='bert', is_test=False)
Bases: claf.data.reader.bert.seq_cls.SeqClsBertReader

CoLA DataReader for BERT

- Args:
  file_paths: .tsv file paths (train and dev)
  tokenizers: defined tokenizers config

CLASS_DATA = [0, 1]
METRIC_KEY = 'matthews_corr'
class claf.data.reader.MRPCBertReader(file_paths, tokenizers, sequence_max_length=None, cls_token='[CLS]', sep_token='[SEP]', input_type='bert', is_test=False)
Bases: claf.data.reader.bert.seq_cls.SeqClsBertReader

MRPC DataReader for BERT

- Args:
  file_paths: .tsv file paths (train and dev)
  tokenizers: defined tokenizers config

CLASS_DATA = [0, 1]
METRIC_KEY = 'f1'
class claf.data.reader.MNLIBertReader(file_paths, tokenizers, sequence_max_length=None, cls_token='[CLS]', sep_token='[SEP]', input_type='bert', is_test=False)
Bases: claf.data.reader.bert.seq_cls.SeqClsBertReader

MNLI DataReader for BERT

- Args:
  file_paths: .tsv file paths (train and dev)
  tokenizers: defined tokenizers config

CLASS_DATA = ['contradiction', 'entailment', 'neutral']
METRIC_KEY = 'accuracy'
class claf.data.reader.QNLIBertReader(file_paths, tokenizers, sequence_max_length=None, cls_token='[CLS]', sep_token='[SEP]', input_type='bert', is_test=False)
Bases: claf.data.reader.bert.seq_cls.SeqClsBertReader

QNLI DataReader for BERT

- Args:
  file_paths: .tsv file paths (train and dev)
  tokenizers: defined tokenizers config

CLASS_DATA = ['entailment', 'not_entailment']
METRIC_KEY = 'accuracy'
class claf.data.reader.QQPBertReader(file_paths, tokenizers, sequence_max_length=None, cls_token='[CLS]', sep_token='[SEP]', input_type='bert', is_test=False)
Bases: claf.data.reader.bert.seq_cls.SeqClsBertReader

Quora Question Pairs DataReader for BERT

- Args:
  file_paths: .tsv file paths (train and dev)
  tokenizers: defined tokenizers config

CLASS_DATA = [0, 1]
METRIC_KEY = 'f1'
class claf.data.reader.RTEBertReader(file_paths, tokenizers, sequence_max_length=None, cls_token='[CLS]', sep_token='[SEP]', input_type='bert', is_test=False)
Bases: claf.data.reader.bert.seq_cls.SeqClsBertReader

RTE (Recognizing Textual Entailment) DataReader for BERT

- Args:
  file_paths: .tsv file paths (train and dev)
  tokenizers: defined tokenizers config

CLASS_DATA = ['entailment', 'not_entailment']
METRIC_KEY = 'accuracy'
class claf.data.reader.SSTBertReader(file_paths, tokenizers, sequence_max_length=None, cls_token='[CLS]', sep_token='[SEP]', input_type='bert', is_test=False)
Bases: claf.data.reader.bert.seq_cls.SeqClsBertReader

SST DataReader for BERT

- Args:
  file_paths: .tsv file paths (train and dev)
  tokenizers: defined tokenizers config

CLASS_DATA = [0, 1]
METRIC_KEY = 'accuracy'
class claf.data.reader.WNLIBertReader(file_paths, tokenizers, sequence_max_length=None, cls_token='[CLS]', sep_token='[SEP]', input_type='bert', is_test=False)
Bases: claf.data.reader.bert.seq_cls.SeqClsBertReader

WNLI (Winograd NLI) DataReader for BERT

- Args:
  file_paths: .tsv file paths (train and dev)
  tokenizers: defined tokenizers config

CLASS_DATA = [0, 1]
METRIC_KEY = 'accuracy'
class claf.data.reader.SQuADReader(file_paths, lang_code, tokenizers, context_max_length=None)
Bases: claf.data.reader.base.DataReader

SQuAD DataReader

- Args:
  file_paths: .json file paths (train and dev)
  tokenizers: defined tokenizers config (char/word)
class claf.data.reader.SQuADBertReader(file_paths, lang_code, tokenizers, max_seq_length=384, context_stride=128, max_question_length=64, cls_token='[CLS]', sep_token='[SEP]')
Bases: claf.data.reader.base.DataReader

SQuAD DataReader for BERT

- Args:
  file_paths: .json file paths (train and dev)
  tokenizers: defined tokenizers config (char/word)

METRIC_KEY = 'f1'
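The max_seq_length and context_stride parameters above suggest the standard BERT sliding-window treatment of long contexts: contexts longer than max_seq_length are split into overlapping spans so every token appears in at least one window. A minimal sketch of that idea, under the assumption that claf's actual windowing logic may differ in details:

```python
# Sliding-window split of a long token sequence, illustrating the idea
# behind max_seq_length and context_stride. Illustrative only; claf's
# actual windowing logic may differ.
def split_into_windows(tokens, max_seq_length=384, context_stride=128):
    windows = []
    start = 0
    while True:
        windows.append(tokens[start:start + max_seq_length])
        if start + max_seq_length >= len(tokens):
            break
        start += context_stride
    return windows

tokens = list(range(500))
windows = split_into_windows(tokens, max_seq_length=384, context_stride=128)
assert len(windows) == 2
assert windows[1][0] == 128  # consecutive windows overlap
```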
class claf.data.reader.TokClsBertReader(file_paths, tokenizers, lang_code=None, sequence_max_length=None, tag_key='tags', cls_token='[CLS]', sep_token='[SEP]', ignore_tag_idx=-1)
Bases: claf.data.reader.base.DataReader

DataReader for Token Classification using BERT

- Args:
  file_paths: .json file paths (train and dev)
  tokenizers: defined tokenizers config (subword)
- Kwargs:
  lang_code: language code; set to 'ko' if using a BERT model trained with mecab-tokenized data
  tag_key: name of the label in the .json file to use for classification
  ignore_tag_idx: prediction results whose ground-truth index equals this number are ignored
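The effect of ignore_tag_idx can be sketched as follows: positions whose ground-truth tag index equals ignore_tag_idx (for example, subword continuations or padding) are dropped before scoring. This is an illustrative sketch with hypothetical data, not claf's actual evaluation code.

```python
# Sketch of ignore_tag_idx: drop positions whose gold tag index equals
# the ignore value before scoring. Hypothetical data; not claf's code.
ignore_tag_idx = -1

gold = [2, -1, 0, 1, -1]
pred = [2, 0, 0, 1, 1]

kept = [(g, p) for g, p in zip(gold, pred) if g != ignore_tag_idx]
assert kept == [(2, 2), (0, 0), (1, 1)]
```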
class claf.data.reader.CoNLL2003BertReader(file_paths, tokenizers, sequence_max_length=None, cls_token='[CLS]', sep_token='[SEP]', ignore_tag_idx=-1)
Bases: claf.data.reader.bert.tok_cls.TokClsBertReader

CoNLL-2003 DataReader for BERT

- Args:
  file_paths: file paths (train and dev)
- Kwargs:
  ignore_tag_idx: prediction results whose ground-truth index equals this number are ignored
class claf.data.reader.WikiSQLReader(file_paths, tokenizers, context_max_length=None, is_test=None)
Bases: claf.data.reader.base.DataReader

WikiSQL DataReader (http://arxiv.org/abs/1709.00103)

- Args:
  file_paths: .json file paths (train and dev)
  tokenizers: defined tokenizers config (char/word)