claf.data.reader package

Subpackages

Submodules
class claf.data.reader.base.DataReader(file_paths, dataset_obj)
    Bases: object

    DataReader Base Class

    Args:
        file_paths: dictionary of 'train' and 'valid' file paths
        dataset_obj: Dataset object (claf.data.dataset.base)
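For illustration, the file_paths argument can be sketched as a plain dictionary keyed by split name; the concrete paths below are hypothetical placeholders, not files shipped with CLaF:

```python
# Hypothetical file_paths mapping, following the DataReader docstring:
# one entry per split, keyed 'train' and 'valid'.
file_paths = {
    "train": "data/train.json",   # placeholder path
    "valid": "data/valid.json",   # placeholder path
}
```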
class claf.data.reader.cola.CoLAReader(file_paths, tokenizers, sequence_max_length=None)
    Bases: claf.data.reader.seq_cls.SeqClsReader

    CoLA DataReader

    Args:
        file_paths: .json file paths (train and dev)
        tokenizers: defined tokenizers config (word)

    CLASS_DATA = [0, 1]
class claf.data.reader.seq_cls.SeqClsReader(file_paths, tokenizers, sequence_max_length=None, class_key='class')
    Bases: claf.data.reader.base.DataReader

    DataReader for Sequence Classification

    Args:
        file_paths: .json file paths (train and dev)
        tokenizers: defined tokenizers config (word)

    Kwargs:
        class_key: name of the label in the .json file to use for classification

    CLASS_DATA = None
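As a sketch of how class_key lines up with the data, here is a hypothetical record whose label lives under the default 'class' field; the other field name is invented for illustration and not taken from CLaF:

```python
import json

# Hypothetical sequence-classification record. SeqClsReader's class_key
# kwarg (default 'class') names the field holding the label.
record = json.loads('{"sequence": "the movie was great", "class": 1}')

class_key = "class"
label = record[class_key]
```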
class claf.data.reader.squad.SQuADReader(file_paths, lang_code, tokenizers, context_max_length=None)
    Bases: claf.data.reader.base.DataReader

    SQuAD DataReader

    Args:
        file_paths: .json file paths (train and dev)
        tokenizers: defined tokenizers config (char/word)
class claf.data.reader.wikisql.WikiSQLReader(file_paths, tokenizers, context_max_length=None, is_test=None)
    Bases: claf.data.reader.base.DataReader

    WikiSQL DataReader (http://arxiv.org/abs/1709.00103)

    Args:
        file_paths: .json file paths (train and dev)
        tokenizers: defined tokenizers config (char/word)
Module contents
class claf.data.reader.MultiTaskBertReader(file_paths, tokenizers, batch_sizes=[], readers=[])
    Bases: claf.data.reader.base.DataReader

    DataReader for Multi-Task using BERT

    Args:
        file_paths: .json file paths (train and dev)
        tokenizers: defined tokenizers config (subword)

    Kwargs:
        class_key: name of the label in the .json file to use for classification

    CLASS_DATA = None
class claf.data.reader.RegressionBertReader(file_paths, tokenizers, sequence_max_length=None, label_key='score', cls_token='[CLS]', sep_token='[SEP]', input_type='bert', is_test=False)
    Bases: claf.data.reader.base.DataReader

    Regression DataReader for BERT

    Args:
        file_paths: .tsv file paths (train and dev)
        tokenizers: defined tokenizers config

    METRIC_KEY = None
class claf.data.reader.STSBBertReader(file_paths, tokenizers, sequence_max_length=None, cls_token='[CLS]', sep_token='[SEP]', input_type='bert', is_test=False)
    Bases: claf.data.reader.bert.regression.RegressionBertReader

    STS-B (Semantic Textual Similarity Benchmark) DataReader for BERT

    Args:
        file_paths: .tsv file paths (train and dev)
        tokenizers: defined tokenizers config

    METRIC_KEY = 'pearson_spearman_corr'
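The metric name follows the usual GLUE treatment of STS-B, which combines Pearson and Spearman correlation between predicted and gold similarity scores. As a minimal, self-contained sketch of the Pearson half (not CLaF's implementation):

```python
import statistics

def pearson(xs, ys):
    # Pearson correlation: covariance normalized by the two standard
    # deviations (the 1/n factors cancel, so they are omitted here).
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Perfectly linear scores correlate (up to float rounding) with r = 1.
r = pearson([1.0, 2.0, 3.0], [1.5, 2.5, 3.5])
```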
class claf.data.reader.SeqClsReader(file_paths, tokenizers, sequence_max_length=None, class_key='class')
    Bases: claf.data.reader.base.DataReader

    DataReader for Sequence Classification

    Args:
        file_paths: .json file paths (train and dev)
        tokenizers: defined tokenizers config (word)

    Kwargs:
        class_key: name of the label in the .json file to use for classification

    CLASS_DATA = None
class claf.data.reader.CoLAReader(file_paths, tokenizers, sequence_max_length=None)
    Bases: claf.data.reader.seq_cls.SeqClsReader

    CoLA DataReader

    Args:
        file_paths: .json file paths (train and dev)
        tokenizers: defined tokenizers config (word)

    CLASS_DATA = [0, 1]
class claf.data.reader.SeqClsBertReader(file_paths, tokenizers, sequence_max_length=None, class_key='class', cls_token='[CLS]', sep_token='[SEP]', input_type='bert', is_test=False)
    Bases: claf.data.reader.base.DataReader

    DataReader for Sequence (Single and Pair) Classification using BERT

    Args:
        file_paths: .json file paths (train and dev)
        tokenizers: defined tokenizers config (subword)

    Kwargs:
        class_key: name of the label in the .json file to use for classification

    CLASS_DATA = None

    METRIC_KEY = None
class claf.data.reader.CoLABertReader(file_paths, tokenizers, sequence_max_length=None, cls_token='[CLS]', sep_token='[SEP]', input_type='bert', is_test=False)
    Bases: claf.data.reader.bert.seq_cls.SeqClsBertReader

    CoLA DataReader for BERT

    Args:
        file_paths: .tsv file paths (train and dev)
        tokenizers: defined tokenizers config

    CLASS_DATA = [0, 1]

    METRIC_KEY = 'matthews_corr'
class claf.data.reader.MRPCBertReader(file_paths, tokenizers, sequence_max_length=None, cls_token='[CLS]', sep_token='[SEP]', input_type='bert', is_test=False)
    Bases: claf.data.reader.bert.seq_cls.SeqClsBertReader

    MRPC DataReader for BERT

    Args:
        file_paths: .tsv file paths (train and dev)
        tokenizers: defined tokenizers config

    CLASS_DATA = [0, 1]

    METRIC_KEY = 'f1'
class claf.data.reader.MNLIBertReader(file_paths, tokenizers, sequence_max_length=None, cls_token='[CLS]', sep_token='[SEP]', input_type='bert', is_test=False)
    Bases: claf.data.reader.bert.seq_cls.SeqClsBertReader

    MNLI DataReader for BERT

    Args:
        file_paths: .tsv file paths (train and dev)
        tokenizers: defined tokenizers config

    CLASS_DATA = ['contradiction', 'entailment', 'neutral']

    METRIC_KEY = 'accuracy'
class claf.data.reader.QNLIBertReader(file_paths, tokenizers, sequence_max_length=None, cls_token='[CLS]', sep_token='[SEP]', input_type='bert', is_test=False)
    Bases: claf.data.reader.bert.seq_cls.SeqClsBertReader

    QNLI DataReader for BERT

    Args:
        file_paths: .tsv file paths (train and dev)
        tokenizers: defined tokenizers config

    CLASS_DATA = ['entailment', 'not_entailment']

    METRIC_KEY = 'accuracy'
class claf.data.reader.QQPBertReader(file_paths, tokenizers, sequence_max_length=None, cls_token='[CLS]', sep_token='[SEP]', input_type='bert', is_test=False)
    Bases: claf.data.reader.bert.seq_cls.SeqClsBertReader

    Quora Question Pairs DataReader for BERT

    Args:
        file_paths: .tsv file paths (train and dev)
        tokenizers: defined tokenizers config

    CLASS_DATA = [0, 1]

    METRIC_KEY = 'f1'
class claf.data.reader.RTEBertReader(file_paths, tokenizers, sequence_max_length=None, cls_token='[CLS]', sep_token='[SEP]', input_type='bert', is_test=False)
    Bases: claf.data.reader.bert.seq_cls.SeqClsBertReader

    RTE (Recognizing Textual Entailment) DataReader for BERT

    Args:
        file_paths: .tsv file paths (train and dev)
        tokenizers: defined tokenizers config

    CLASS_DATA = ['entailment', 'not_entailment']

    METRIC_KEY = 'accuracy'
class claf.data.reader.SSTBertReader(file_paths, tokenizers, sequence_max_length=None, cls_token='[CLS]', sep_token='[SEP]', input_type='bert', is_test=False)
    Bases: claf.data.reader.bert.seq_cls.SeqClsBertReader

    SST DataReader for BERT

    Args:
        file_paths: .tsv file paths (train and dev)
        tokenizers: defined tokenizers config

    CLASS_DATA = [0, 1]

    METRIC_KEY = 'accuracy'
class claf.data.reader.WNLIBertReader(file_paths, tokenizers, sequence_max_length=None, cls_token='[CLS]', sep_token='[SEP]', input_type='bert', is_test=False)
    Bases: claf.data.reader.bert.seq_cls.SeqClsBertReader

    WNLI (Winograd NLI) DataReader for BERT

    Args:
        file_paths: .tsv file paths (train and dev)
        tokenizers: defined tokenizers config

    CLASS_DATA = [0, 1]

    METRIC_KEY = 'accuracy'
class claf.data.reader.SQuADReader(file_paths, lang_code, tokenizers, context_max_length=None)
    Bases: claf.data.reader.base.DataReader

    SQuAD DataReader

    Args:
        file_paths: .json file paths (train and dev)
        tokenizers: defined tokenizers config (char/word)
class claf.data.reader.SQuADBertReader(file_paths, lang_code, tokenizers, max_seq_length=384, context_stride=128, max_question_length=64, cls_token='[CLS]', sep_token='[SEP]')
    Bases: claf.data.reader.base.DataReader

    SQuAD DataReader for BERT

    Args:
        file_paths: .json file paths (train and dev)
        tokenizers: defined tokenizers config (char/word)

    METRIC_KEY = 'f1'
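The max_seq_length and context_stride parameters point to the usual BERT sliding-window treatment of contexts longer than the sequence budget. A minimal sketch of that idea, not CLaF's actual implementation (the real reader must also fit the question and special tokens into the same budget):

```python
def chunk_context(tokens, max_seq_length=384, context_stride=128):
    # Split a long token sequence into overlapping windows. Each new
    # window starts context_stride tokens after the previous one, so
    # consecutive windows overlap by max_seq_length - context_stride.
    chunks = []
    start = 0
    while True:
        chunks.append(tokens[start:start + max_seq_length])
        if start + max_seq_length >= len(tokens):
            break
        start += context_stride
    return chunks

windows = chunk_context(list(range(1000)))
```

The overlap gives every token at least one window where it sits away from a window boundary, which is what makes answer spans near chunk edges recoverable.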
class claf.data.reader.TokClsBertReader(file_paths, tokenizers, lang_code=None, sequence_max_length=None, tag_key='tags', cls_token='[CLS]', sep_token='[SEP]', ignore_tag_idx=-1)
    Bases: claf.data.reader.base.DataReader

    DataReader for Token Classification using BERT

    Args:
        file_paths: .json file paths (train and dev)
        tokenizers: defined tokenizers config (subword)

    Kwargs:
        lang_code: language code; set to 'ko' if using a BERT model trained with mecab-tokenized data
        tag_key: name of the label in the .json file to use for classification
        ignore_tag_idx: prediction results whose ground-truth idx equals this number are ignored
class claf.data.reader.CoNLL2003BertReader(file_paths, tokenizers, sequence_max_length=None, cls_token='[CLS]', sep_token='[SEP]', ignore_tag_idx=-1)
    Bases: claf.data.reader.bert.tok_cls.TokClsBertReader

    CoNLL2003 DataReader for BERT

    Args:
        file_paths: file paths (train and dev)

    Kwargs:
        ignore_tag_idx: prediction results whose ground-truth idx equals this number are ignored
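ignore_tag_idx marks token positions that should be skipped when scoring, such as padding or subword continuations. One hedged sketch of how such a mask might be applied when computing token-level accuracy (this helper is illustrative, not CLaF's code):

```python
def masked_accuracy(preds, golds, ignore_tag_idx=-1):
    # Token-level accuracy that skips every position whose gold tag
    # equals ignore_tag_idx, so ignored positions neither help nor
    # hurt the score.
    pairs = [(p, g) for p, g in zip(preds, golds) if g != ignore_tag_idx]
    if not pairs:
        return 0.0
    return sum(p == g for p, g in pairs) / len(pairs)

# The third position has gold tag -1 and is excluded from scoring.
acc = masked_accuracy([1, 2, 3, 0], [1, 2, -1, 1], ignore_tag_idx=-1)
```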
class claf.data.reader.WikiSQLReader(file_paths, tokenizers, context_max_length=None, is_test=None)
    Bases: claf.data.reader.base.DataReader

    WikiSQL DataReader (http://arxiv.org/abs/1709.00103)

    Args:
        file_paths: .json file paths (train and dev)
        tokenizers: defined tokenizers config (char/word)