claf.data.reader.bert package

Submodules
class claf.data.reader.bert.conll2003.CoNLL2003BertReader(file_paths, tokenizers, sequence_max_length=None, cls_token='[CLS]', sep_token='[SEP]', ignore_tag_idx=-1)
    Bases: claf.data.reader.bert.tok_cls.TokClsBertReader

    CoNLL2003 DataReader for BERT.

    Args:
        file_paths: file paths (train and dev)

    Kwargs:
        ignore_tag_idx: predictions whose ground-truth tag index equals this value are ignored
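The CoNLL-2003 file layout itself is not described above. As a hedged illustration (the helper below is hypothetical and not part of the claf API), a minimal parser for the standard four-column CoNLL-2003 format could look like:

```python
# Minimal sketch of parsing the standard CoNLL-2003 four-column format
# (token, POS tag, chunk tag, NER tag; blank lines separate sentences).
# Illustrative only -- not the actual claf implementation.

def parse_conll2003(lines):
    sentences, tokens, tags = [], [], []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("-DOCSTART-"):
            if tokens:
                sentences.append((tokens, tags))
                tokens, tags = [], []
            continue
        cols = line.split()
        tokens.append(cols[0])
        tags.append(cols[-1])  # the NER tag is the last column
    if tokens:  # flush a trailing sentence with no final blank line
        sentences.append((tokens, tags))
    return sentences

sample = [
    "EU NNP B-NP B-ORG",
    "rejects VBZ B-VP O",
    "German JJ B-NP B-MISC",
    "call NN I-NP O",
    "",
]
print(parse_conll2003(sample))
```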
class claf.data.reader.bert.seq_cls.SeqClsBertReader(file_paths, tokenizers, sequence_max_length=None, class_key='class', cls_token='[CLS]', sep_token='[SEP]', input_type='bert', is_test=False)
    Bases: claf.data.reader.base.DataReader

    DataReader for sequence (single and pair) classification using BERT.

    Args:
        file_paths: .json file paths (train and dev)
        tokenizers: tokenizer configs (subword)

    Kwargs:
        class_key: name of the label key in the .json file used for classification

    CLASS_DATA = None

    METRIC_KEY = None
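The reader wraps single or paired sequences with the cls_token and sep_token before they reach BERT. A minimal sketch of that packing step, assuming already-subword-tokenized input (the helper name is hypothetical, not claf's implementation):

```python
# Sketch of BERT-style input packing for single- or pair-sequence
# classification: [CLS] seq_a [SEP] (seq_b [SEP]), with segment ids 0/1.
# Illustrative only -- not the actual claf implementation.

def pack_bert_input(tokens_a, tokens_b=None, cls_token="[CLS]", sep_token="[SEP]"):
    tokens = [cls_token] + tokens_a + [sep_token]
    segment_ids = [0] * len(tokens)          # first sequence -> segment 0
    if tokens_b is not None:
        tokens += tokens_b + [sep_token]
        segment_ids += [1] * (len(tokens_b) + 1)  # second sequence -> segment 1
    return tokens, segment_ids

tokens, segments = pack_bert_input(["a", "movie"], ["great", "fun"])
print(tokens)    # ['[CLS]', 'a', 'movie', '[SEP]', 'great', 'fun', '[SEP]']
print(segments)  # [0, 0, 0, 0, 1, 1, 1]
```

For single-sequence input, passing only tokens_a yields [CLS] seq_a [SEP] with all-zero segment ids.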
class claf.data.reader.bert.squad.SQuADBertReader(file_paths, lang_code, tokenizers, max_seq_length=384, context_stride=128, max_question_length=64, cls_token='[CLS]', sep_token='[SEP]')
    Bases: claf.data.reader.base.DataReader

    SQuAD DataReader for BERT.

    Args:
        file_paths: .json file paths (train and dev)
        tokenizers: tokenizer configs (char/word)

    METRIC_KEY = 'f1'
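Contexts longer than max_seq_length cannot fit into one BERT input, so SQuAD-style readers typically slide an overlapping window over the context, advancing by context_stride tokens each step. A hedged sketch of that windowing (hypothetical helper; small numbers for readability):

```python
# Sketch of sliding-window "context stride" chunking used when a SQuAD
# context exceeds the space left after [CLS] question [SEP] ... [SEP].
# Illustrative only -- not the actual claf implementation.

def stride_context(context_tokens, window_size, stride):
    spans = []
    start = 0
    while start < len(context_tokens):
        spans.append(context_tokens[start:start + window_size])
        if start + window_size >= len(context_tokens):
            break  # this window already reaches the end of the context
        start += stride
    return spans

tokens = [f"tok{i}" for i in range(10)]
spans = stride_context(tokens, window_size=4, stride=2)
print([len(s) for s in spans])  # [4, 4, 4, 4] -- overlapping windows
```

Because consecutive windows overlap by window_size - stride tokens, an answer span near a window boundary still appears whole in at least one window.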
class claf.data.reader.bert.tok_cls.TokClsBertReader(file_paths, tokenizers, lang_code=None, sequence_max_length=None, tag_key='tags', cls_token='[CLS]', sep_token='[SEP]', ignore_tag_idx=-1)
    Bases: claf.data.reader.base.DataReader

    DataReader for token classification using BERT.

    Args:
        file_paths: .json file paths (train and dev)
        tokenizers: tokenizer configs (subword)

    Kwargs:
        lang_code: language code; set to 'ko' when using a BERT model trained on mecab-tokenized data
        tag_key: name of the label key in the .json file used for classification
        ignore_tag_idx: predictions whose ground-truth tag index equals this value are ignored
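Subword tokenization splits one word into several pieces, so a token-classification reader must align word-level tags to subwords; continuation pieces commonly receive the ignored index so they do not count toward the loss or metrics. A minimal sketch of that alignment, assuming '##'-style WordPiece continuation markers (hypothetical helper, not claf's implementation):

```python
# Sketch of aligning word-level tag indices to WordPiece subwords: the first
# piece of each word keeps the word's tag index, continuation pieces ("##...")
# get ignore_tag_idx so they are skipped in loss/metric computation.
# Illustrative only -- not the actual claf implementation.

def align_tags(subwords, word_tag_idxs, ignore_tag_idx=-1):
    aligned = []
    word_i = -1
    for piece in subwords:
        if piece.startswith("##"):
            aligned.append(ignore_tag_idx)  # continuation of the previous word
        else:
            word_i += 1                     # first piece of the next word
            aligned.append(word_tag_idxs[word_i])
    return aligned

subwords = ["Wash", "##ington", "visited", "Paris"]
print(align_tags(subwords, [1, 0, 2]))  # [1, -1, 0, 2]
```

This is why ignore_tag_idx defaults to -1: positions carrying that index can be filtered out uniformly when computing per-token accuracy or F1.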