claf.tokens.indexer package¶
Submodules¶
- class claf.tokens.indexer.base.TokenIndexer(tokenizer)[source]¶
  Bases: object

  Token Indexer: maps tokens to integer indices (e.g. 'hi' -> 4).
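The 'hi' -> 4 mapping above can be sketched in plain Python. The names below (SimpleVocab, get_index, index) are hypothetical stand-ins, not the claf API; they only illustrate the contract a TokenIndexer fulfills against its Vocab.

```python
class SimpleVocab:
    """Minimal stand-in for claf.tokens.vocabulary.Vocab (illustrative only)."""

    def __init__(self, tokens):
        # Reserve 0 for padding and 1 for unknown tokens.
        self.token_to_index = {"[PAD]": 0, "[UNK]": 1}
        for token in tokens:
            self.token_to_index.setdefault(token, len(self.token_to_index))

    def get_index(self, token):
        return self.token_to_index.get(token, self.token_to_index["[UNK]"])


class SimpleTokenIndexer:
    """Sketch of the TokenIndexer contract: tokens in, integer ids out."""

    def __init__(self, vocab):
        self.vocab = vocab

    def index(self, tokens):
        return [self.vocab.get_index(t) for t in tokens]


vocab = SimpleVocab(["hello", "hi", "world"])
indexer = SimpleTokenIndexer(vocab)
print(indexer.index(["hi", "world", "unseen"]))  # [3, 4, 1]
```

Unknown tokens fall back to the [UNK] index rather than raising, which is the usual behavior for vocabulary lookup at inference time.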
- class claf.tokens.indexer.bert_indexer.BertIndexer(tokenizer, do_tokenize=True)[source]¶
  Bases: claf.tokens.indexer.base.TokenIndexer

  Bert Token Indexer

  - Property
    vocab: Vocab (claf.tokens.vocabulary)
  - Args:
    tokenizer: SubwordTokenizer
  - Kwargs:
    lowercase: convert word tokens to lowercase
    insert_start: insert start_token at the beginning of the sequence
    insert_end: append end_token at the end of the sequence
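As a hedged illustration of the insert_start / insert_end kwargs (the helper name and id-level interface below are hypothetical, not the claf API; 101 and 102 are the conventional [CLS] and [SEP] ids from the bert-base vocabulary, used purely as an example):

```python
def index_with_markers(token_ids, start_id=None, end_id=None):
    """Hypothetical helper showing insert_start / insert_end semantics:
    optionally wrap an already-indexed sequence with marker ids."""
    out = list(token_ids)
    if start_id is not None:
        out.insert(0, start_id)  # insert start_token first
    if end_id is not None:
        out.append(end_id)       # append end_token
    return out


# BERT-style wrapping with [CLS]=101 and [SEP]=102:
print(index_with_markers([7592, 2088], start_id=101, end_id=102))
# [101, 7592, 2088, 102]
```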
- class claf.tokens.indexer.char_indexer.CharIndexer(tokenizer, insert_char_start=None, insert_char_end=None)[source]¶
  Bases: claf.tokens.indexer.base.TokenIndexer

  Character Token Indexer

  - Property
    vocab: Vocab (claf.tokens.vocabulary)
  - Args:
    tokenizer: CharTokenizer
  - Kwargs:
    insert_char_start: insert start index (e.g. ['h', 'i'] -> ['<s>', 'h', 'i']); default is None
    insert_char_end: insert end index (e.g. ['h', 'i'] -> ['h', 'i', '</s>']); default is None
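The insert_char_start / insert_char_end examples above can be sketched as follows; the function and the char_vocab mapping are illustrative stand-ins, not the claf API:

```python
def index_chars(word, char_vocab, insert_char_start=None, insert_char_end=None):
    """Illustrative per-word character indexing: split a word into
    characters, optionally add boundary symbols, then index each one."""
    chars = list(word)
    if insert_char_start is not None:
        chars = [insert_char_start] + chars  # ['h', 'i'] -> ['<s>', 'h', 'i']
    if insert_char_end is not None:
        chars = chars + [insert_char_end]    # ['h', 'i'] -> ['h', 'i', '</s>']
    return [char_vocab[c] for c in chars]


char_vocab = {"<s>": 2, "</s>": 3, "h": 4, "i": 5}
print(index_chars("hi", char_vocab, insert_char_start="<s>", insert_char_end="</s>"))
# [2, 4, 5, 3]
```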
- class claf.tokens.indexer.elmo_indexer.ELMoIndexer(tokenizer)[source]¶
  Bases: claf.tokens.indexer.base.TokenIndexer

  Maps individual tokens to sequences of character ids, compatible with ELMo. To be consistent with previously trained models, we include it here as a special case of existing character indexers.

  This code is from allenai/allennlp (https://github.com/allenai/allennlp/blob/master/allennlp/data/token_indexers/elmo_indexer.py).
  - BOS_TOKEN = '<S>'¶
  - EOS_TOKEN = '</S>'¶
  - beginning_of_sentence_character = 256¶
  - beginning_of_sentence_characters = [258, 256, 259, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260]¶
  - beginning_of_word_character = 258¶
  - end_of_sentence_character = 257¶
  - end_of_sentence_characters = [258, 257, 259, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260, 260]¶
  - end_of_word_character = 259¶
  - max_word_length = 50¶
  - padding_character = 260¶
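The two boundary-word lists are fully determined by the scalar attributes: each is a max_word_length-sized character-id sequence built from the begin-of-word id, the boundary id, the end-of-word id, and padding. A quick reconstruction in plain Python (not the claf API):

```python
# Scalar attributes copied from the class above.
max_word_length = 50
beginning_of_sentence_character = 256
end_of_sentence_character = 257
beginning_of_word_character = 258
end_of_word_character = 259
padding_character = 260


def make_boundary_word(boundary_char):
    # begin-of-word, boundary char, end-of-word, then pad out to 50 ids.
    ids = [beginning_of_word_character, boundary_char, end_of_word_character]
    ids += [padding_character] * (max_word_length - len(ids))
    return ids


bos_word = make_boundary_word(beginning_of_sentence_character)
eos_word = make_boundary_word(end_of_sentence_character)
print(bos_word[:4], len(bos_word))  # [258, 256, 259, 260] 50
```

These match the beginning_of_sentence_characters and end_of_sentence_characters lists documented above.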
- class claf.tokens.indexer.exact_match_indexer.ExactMatchIndexer(tokenizer, lower=True, lemma=True)[source]¶
  Bases: claf.tokens.indexer.base.TokenIndexer

  Exact Match Token Indexer

  - Property
    vocab: Vocab (claf.tokens.vocabulary)
  - Args:
    tokenizer: WordTokenizer
  - Kwargs:
    lower: add lowercase-match feature (0 or 1); default is True
    lemma: add lemma-match feature (0 or 1); default is True
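A sketch of exact-match features in this style (illustrative only, not the claf API; the lemma feature is omitted here since it requires a lemmatizer): each context token gets a binary feature for verbatim membership in the query, plus an optional lowercase-match feature.

```python
def exact_match_features(context_tokens, query_tokens, lower=True):
    """Illustrative exact-match features: for each context token, emit
    1/0 for verbatim membership in the query and, if lower=True, an
    additional 1/0 for lowercase membership."""
    query = set(query_tokens)
    query_lower = {q.lower() for q in query_tokens}
    features = []
    for token in context_tokens:
        row = [1 if token in query else 0]
        if lower:
            row.append(1 if token.lower() in query_lower else 0)
        features.append(row)
    return features


print(exact_match_features(["The", "cat", "sat"], ["the", "cat"]))
# [[0, 1], [1, 1], [0, 0]]
```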
- class claf.tokens.indexer.linguistic_indexer.LinguisticIndexer(tokenizer, pos_tag=None, ner=None, dep=None)[source]¶
  Bases: claf.tokens.indexer.base.TokenIndexer

  Linguistic Token Indexer

  - Property
    vocab: Vocab (claf.tokens.vocabulary)
  - Args:
    tokenizer: WordTokenizer
  - Kwargs:
    pos_tag: POS tagging
    ner: Named Entity Recognition
    dep: dependency parsing
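Illustratively, linguistic features are indexed the same way as word tokens, but against tag vocabularies. The tag vocabulary and tagger output below are hypothetical, not produced by claf:

```python
# Hypothetical POS tag vocabulary and tagger output, for illustration only.
pos_vocab = {"[PAD]": 0, "[UNK]": 1, "DET": 2, "NOUN": 3, "VERB": 4}
pos_tags = ["DET", "NOUN", "VERB"]  # e.g. tags for "the cat sat"

# Unknown tags fall back to [UNK], as with word-level vocabulary lookup.
indexed_tags = [pos_vocab.get(tag, pos_vocab["[UNK]"]) for tag in pos_tags]
print(indexed_tags)  # [2, 3, 4]
```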
- class claf.tokens.indexer.word_indexer.WordIndexer(tokenizer, do_tokenize=True, lowercase=False, insert_start=None, insert_end=None)[source]¶
  Bases: claf.tokens.indexer.base.TokenIndexer

  Word Token Indexer

  - Property
    vocab: Vocab (claf.tokens.vocabulary)
  - Args:
    tokenizer: WordTokenizer
  - Kwargs:
    lowercase: convert word tokens to lowercase
    insert_start: insert start_token at the beginning of the sequence
    insert_end: append end_token at the end of the sequence
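A sketch of the lowercase kwarg's effect (hypothetical names, not the claf API): tokens are normalized before vocabulary lookup, so 'The' and 'the' resolve to the same id.

```python
# Hypothetical word vocabulary, for illustration only.
word_vocab = {"[UNK]": 1, "the": 2, "cat": 3}


def index_words(tokens, lowercase=False):
    if lowercase:
        tokens = [t.lower() for t in tokens]  # normalize before lookup
    return [word_vocab.get(t, word_vocab["[UNK]"]) for t in tokens]


print(index_words(["The", "cat"]))                  # [1, 3]  ("The" is unknown)
print(index_words(["The", "cat"], lowercase=True))  # [2, 3]
```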
Module contents¶

Each indexer class is also exported at the package level: claf.tokens.indexer.BertIndexer, claf.tokens.indexer.CharIndexer, claf.tokens.indexer.ELMoIndexer, claf.tokens.indexer.ExactMatchIndexer, claf.tokens.indexer.LinguisticIndexer, and claf.tokens.indexer.WordIndexer. Their signatures, attributes, and documentation are identical to the submodule entries above.