claf.data.reader.bert package

Submodules

class claf.data.reader.bert.conll2003.CoNLL2003BertReader(file_paths, tokenizers, sequence_max_length=None, cls_token='[CLS]', sep_token='[SEP]', ignore_tag_idx=-1)

Bases: claf.data.reader.bert.tok_cls.TokClsBertReader

CoNLL2003 DataReader for BERT

  • Args:

    file_paths: file paths (train and dev)

  • Kwargs:

    ignore_tag_idx: predictions at positions whose ground-truth tag index equals this value are ignored
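The effect of ignore_tag_idx can be illustrated with a minimal sketch. It is not taken from the claf source: the helper name `filter_ignored` and the example tag indices are hypothetical, and the only behaviour assumed is the one stated above, namely that positions whose ground-truth index equals ignore_tag_idx are dropped before predictions are compared.

```python
def filter_ignored(gold_tag_idxs, pred_tag_idxs, ignore_tag_idx=-1):
    """Drop positions whose ground-truth index equals ignore_tag_idx.

    Hypothetical helper for illustration; not part of claf.
    """
    if len(gold_tag_idxs) != len(pred_tag_idxs):
        raise ValueError("gold and predicted sequences must have the same length")
    kept = [
        (gold, pred)
        for gold, pred in zip(gold_tag_idxs, pred_tag_idxs)
        if gold != ignore_tag_idx
    ]
    return [gold for gold, _ in kept], [pred for _, pred in kept]


# Made-up tag indices: -1 marks positions that should not be scored,
# for example special tokens such as [CLS] and [SEP].
gold = [-1, 3, -1, 0, 0, -1]
pred = [0, 3, 1, 0, 2, 0]

kept_gold, kept_pred = filter_ignored(gold, pred)
correct = sum(g == p for g, p in zip(kept_gold, kept_pred))
print(kept_gold, kept_pred, f"{correct}/{len(kept_gold)} correct")
```

Only the three positions with a real ground-truth tag contribute to the comparison; whatever the model predicted at the ignored positions has no effect.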

class claf.data.reader.bert.seq_cls.SeqClsBertReader(file_paths, tokenizers, sequence_max_length=None, class_key='class', cls_token='[CLS]', sep_token='[SEP]', input_type='bert', is_test=False)

Bases: claf.data.reader.base.DataReader

DataReader for Sequence (Single and Pair) Classification using BERT

  • Args:

    file_paths: .json file paths (train and dev)

    tokenizers: defined tokenizers config (subword)

  • Kwargs:

    class_key: name of the label in .json file to use for classification

CLASS_DATA = None
METRIC_KEY = None
read_one_example(inputs)

inputs keys: sequence_a and sequence_b
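As a rough illustration of how a single sequence or a sequence pair is usually laid out for BERT with the cls_token and sep_token arguments, here is a minimal sketch. It is not the claf implementation: `build_bert_input` is a hypothetical helper, whitespace splitting stands in for the configured subword tokenizer, and the segment-id convention (0 for the first sequence, 1 for the second) follows the standard BERT input format.

```python
def build_bert_input(sequence_a, sequence_b=None, cls_token="[CLS]", sep_token="[SEP]"):
    """Return (tokens, segment_ids) for a single sequence or a pair.

    Hypothetical helper; whitespace splitting replaces a real subword tokenizer.
    """
    tokens_a = sequence_a.split()
    tokens = [cls_token] + tokens_a + [sep_token]
    segment_ids = [0] * len(tokens)

    if sequence_b is not None:
        tokens_b = sequence_b.split()
        tokens += tokens_b + [sep_token]
        # The second sequence and its closing separator belong to segment 1.
        segment_ids += [1] * (len(tokens_b) + 1)

    return tokens, segment_ids


single_tokens, single_segments = build_bert_input("a fine film")
pair_tokens, pair_segments = build_bert_input("it rains", "the ground is wet")

print(single_tokens, single_segments)
print(pair_tokens, pair_segments)
```

With only sequence_a the input is `[CLS] a [SEP]`; with both keys present it becomes `[CLS] a [SEP] b [SEP]`.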

class claf.data.reader.bert.squad.SQuADBertReader(file_paths, lang_code, tokenizers, max_seq_length=384, context_stride=128, max_question_length=64, cls_token='[CLS]', sep_token='[SEP]')

Bases: claf.data.reader.base.DataReader

SQuAD DataReader for BERT

  • Args:

    file_paths: .json file paths (train and dev)

    tokenizers: defined tokenizers config (char/word)

METRIC_KEY = 'f1'
read_one_example(inputs)

inputs keys: question, context
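The max_seq_length, context_stride and max_question_length arguments suggest the sliding-window scheme commonly used for BERT on SQuAD, where a long context is split into overlapping windows. The sketch below shows that scheme under stated assumptions and is not taken from the claf source: `context_windows` is a hypothetical helper, it works on token counts only, and it reserves three positions for one cls_token and two sep_tokens.

```python
def context_windows(num_context_tokens, num_question_tokens,
                    max_seq_length=384, context_stride=128, max_question_length=64):
    """Return (start, length) spans covering the context with overlapping windows.

    Hypothetical helper for illustration; not part of claf.
    """
    # The question is truncated to max_question_length tokens.
    question_len = min(num_question_tokens, max_question_length)
    # Reserve room for [CLS], the question, and two [SEP] tokens.
    max_context_len = max_seq_length - question_len - 3
    if max_context_len <= 0:
        raise ValueError("max_seq_length leaves no room for context tokens")

    windows = []
    start = 0
    while start < num_context_tokens:
        length = min(num_context_tokens - start, max_context_len)
        windows.append((start, length))
        if start + length == num_context_tokens:
            break  # the last window reaches the end of the context
        start += min(length, context_stride)
    return windows


windows = context_windows(num_context_tokens=500, num_question_tokens=10)
print(windows)
```

In this example each window can hold 371 context tokens, and consecutive windows start 128 tokens apart, so an answer span near a window boundary still appears whole in a neighbouring window.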

class claf.data.reader.bert.squad.Token(text, text_span=None)

Bases: object

class claf.data.reader.bert.tok_cls.TokClsBertReader(file_paths, tokenizers, lang_code=None, sequence_max_length=None, tag_key='tags', cls_token='[CLS]', sep_token='[SEP]', ignore_tag_idx=-1)

Bases: claf.data.reader.base.DataReader

DataReader for Token Classification using BERT

  • Args:

    file_paths: .json file paths (train and dev)

    tokenizers: defined tokenizers config (subword)

  • Kwargs:

    lang_code: language code; set to 'ko' when using a BERT model trained on mecab-tokenized data

    tag_key: name of the label in .json file to use for classification

    ignore_tag_idx: predictions at positions whose ground-truth tag index equals this value are ignored

read_one_example(inputs)

inputs keys: sequence
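One common way to combine word-level tags with a subword tokenizer is to keep the tag on the first piece of each word and assign ignore_tag_idx to the remaining pieces and to the special tokens. The sketch below illustrates that convention only; whether claf aligns tags exactly this way is an assumption. `align_tags` and `toy_subword_tokenize` are hypothetical helpers, and the toy tokenizer simply cuts words into four-character pieces.

```python
def toy_subword_tokenize(word, piece_len=4):
    """Stand-in for a real subword tokenizer: fixed-size pieces with '##' continuations."""
    pieces = [word[i:i + piece_len] for i in range(0, len(word), piece_len)]
    return [p if i == 0 else "##" + p for i, p in enumerate(pieces)]


def align_tags(words, tag_idxs, subword_tokenize,
               cls_token="[CLS]", sep_token="[SEP]", ignore_tag_idx=-1):
    """Return (tokens, aligned_tag_idxs) with one tag index per subword token.

    Hypothetical helper for illustration; not part of claf.
    """
    tokens = [cls_token]
    aligned = [ignore_tag_idx]  # special tokens carry no real tag
    for word, tag in zip(words, tag_idxs):
        pieces = subword_tokenize(word)
        if not pieces:
            continue  # nothing to tag for an empty word
        tokens.extend(pieces)
        # First piece keeps the word's tag; continuation pieces are ignored.
        aligned.extend([tag] + [ignore_tag_idx] * (len(pieces) - 1))
    tokens.append(sep_token)
    aligned.append(ignore_tag_idx)
    return tokens, aligned


# Made-up example: tag indices 1 and 2 mark entity words, 0 marks everything else.
words = ["Johanson", "lives", "in", "Oslo"]
tag_idxs = [1, 0, 0, 2]
tokens, aligned = align_tags(words, tag_idxs, toy_subword_tokenize)
print(list(zip(tokens, aligned)))
```

Under this convention every word is scored exactly once, regardless of how many subword pieces it is split into.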

Module contents