claf.tokens.embedding package
Submodules
class claf.tokens.embedding.base.TokenEmbedding(vocab)
Bases: torch.nn.modules.module.Module

Token Embedding
It can be an embedding matrix, a language model (ELMo), a neural machine translation model (CoVe), or sparse features.

Args:
    vocab: Vocab (claf.tokens.vocab)
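Concrete embeddings subclass this base; a minimal sketch of the pattern in plain PyTorch (the get_output_dim accessor is an assumption about the interface, and vocab handling is elided):

    import torch
    import torch.nn as nn

    class TinyTokenEmbedding(nn.Module):  # stands in for TokenEmbedding(vocab)
        def __init__(self, vocab_size, embed_dim=8):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)

        def forward(self, inputs):  # (batch, seq_len) token indices
            return self.embed(inputs)  # (batch, seq_len, embed_dim)

        def get_output_dim(self):  # assumed accessor used by downstream layers
            return self.embed.embedding_dim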
class claf.tokens.embedding.bert_embedding.BertEmbedding(vocab, pretrained_model_name=None, trainable=False, unit='subword')
Bases: claf.tokens.embedding.base.TokenEmbedding

BERT Embedding (Encoder)
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (https://arxiv.org/abs/1810.04805)

Args:
    vocab: Vocab (claf.tokens.vocab)
Kwargs:
    pretrained_model_name: …
    use_as_embedding: …
    trainable: finetune or fixed
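A hedged construction sketch (vocab construction is elided, and the checkpoint name is a hypothetical value, not one verified against CLAF):

    from claf.tokens.embedding.bert_embedding import BertEmbedding

    # `vocab` must be a claf.tokens.vocab.Vocab instance (construction not shown).
    # "bert-base-uncased" is a hypothetical checkpoint name.
    # bert_embed = BertEmbedding(vocab, pretrained_model_name="bert-base-uncased",
    #                            trainable=False, unit="subword")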
class claf.tokens.embedding.char_embedding.CharEmbedding(vocab, dropout=0.2, embed_dim=16, kernel_sizes=[5], num_filter=100, activation='relu')
Bases: claf.tokens.embedding.base.TokenEmbedding

Character Embedding (CharCNN) (https://arxiv.org/abs/1509.01626)

Args:
    vocab: Vocab (claf.tokens.vocab)
Kwargs:
    dropout: dropout probability
    embed_dim: embedding dimension
    kernel_sizes: list of kernel sizes (n-grams)
    num_filter: number of CNN filters
    activation: activation function (e.g. ReLU)
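The CharCNN recipe behind these kwargs: embed characters, run a 1-D convolution per kernel size, max-pool over the character axis, and concatenate. A minimal sketch in plain PyTorch (the character vocabulary size is arbitrary here; this is not CLAF's internal code):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CharCNNSketch(nn.Module):
        def __init__(self, num_chars=100, embed_dim=16, kernel_sizes=(5,), num_filter=100):
            super().__init__()
            self.embed = nn.Embedding(num_chars, embed_dim)
            self.convs = nn.ModuleList(
                nn.Conv1d(embed_dim, num_filter, k) for k in kernel_sizes
            )

        def forward(self, char_ids):  # (batch, num_words, num_chars)
            b, w, c = char_ids.size()
            x = self.embed(char_ids.view(b * w, c)).transpose(1, 2)  # (b*w, embed_dim, c)
            # Convolve, then max-pool over the character positions.
            pooled = [F.relu(conv(x)).max(dim=-1).values for conv in self.convs]
            return torch.cat(pooled, dim=-1).view(b, w, -1)

    out = CharCNNSketch()(torch.randint(0, 100, (2, 7, 10)))  # (2, 7, 100)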
class claf.tokens.embedding.cove_embedding.CoveEmbedding(vocab, glove_pretrained_path=None, model_pretrained_path=None, dropout=0.2, trainable=False, project_dim=None)
Bases: claf.tokens.embedding.base.TokenEmbedding

CoVe Embedding
Learned in Translation: Contextualized Word Vectors (http://papers.nips.cc/paper/7209-learned-in-translation-contextualized-word-vectors.pdf)

Args:
    vocab: Vocab (claf.tokens.vocab)
Kwargs:
    dropout: dropout probability
    glove_pretrained_path: pretrained word vector path (e.g. GloVe)
    model_pretrained_path: pretrained CoVe (NMT encoder) weight path
    trainable: finetune or fixed
    project_dim: projection (linear) dimension
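Construction follows the signature above; a sketch with hypothetical file paths (CoVe needs both the GloVe vectors and the pretrained NMT encoder weights):

    from claf.tokens.embedding.cove_embedding import CoveEmbedding

    # Paths are placeholders, and `vocab` (a claf.tokens.vocab.Vocab) is not shown.
    # cove = CoveEmbedding(vocab,
    #                      glove_pretrained_path="glove.840B.300d.txt",  # hypothetical path
    #                      model_pretrained_path="cove_mtlstm.pth",      # hypothetical path
    #                      dropout=0.2, trainable=False)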
class claf.tokens.embedding.elmo_embedding.ELMoEmbedding(vocab, options_file='elmo_2x4096_512_2048cnn_2xhighway_options.json', weight_file='elmo_2x4096_512_2048cnn_2xhighway_weights.hdf5', do_layer_norm=False, dropout=0.5, trainable=False, project_dim=None)
Bases: claf.tokens.embedding.base.TokenEmbedding

ELMo Embedding (Embeddings from Language Models)
Deep contextualized word representations (https://arxiv.org/abs/1802.05365)

Args:
    vocab: Vocab (claf.tokens.vocab)
Kwargs:
    options_file: ELMo model config file path
    weight_file: ELMo model weight file path
    do_layer_norm: whether to apply layer normalization (passed to ScalarMix; see the sketch below); default is False
    dropout: dropout probability
    trainable: finetune or fixed
    project_dim: projection (linear) dimension
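ScalarMix combines the language model's layer outputs with softmax-normalized learned scalars and a global scale, following the ELMo paper; a minimal sketch of that idea (not CLAF's or AllenNLP's exact implementation, and without the optional layer norm):

    import torch
    import torch.nn as nn

    class ScalarMixSketch(nn.Module):
        """gamma * sum_k softmax(s)_k * h_k over layer outputs h_k."""
        def __init__(self, num_layers):
            super().__init__()
            self.scalars = nn.Parameter(torch.zeros(num_layers))
            self.gamma = nn.Parameter(torch.ones(()))

        def forward(self, layers):  # list of tensors, each (batch, seq_len, dim)
            weights = torch.softmax(self.scalars, dim=0)
            return self.gamma * sum(w * h for w, h in zip(weights, layers))

    mix = ScalarMixSketch(num_layers=3)
    out = mix([torch.randn(2, 5, 8) for _ in range(3)])  # (2, 5, 8)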
class claf.tokens.embedding.frequent_word_embedding.FrequentTuningWordEmbedding(vocab, dropout=0.2, embed_dim=100, padding_idx=None, max_norm=None, norm_type=2, scale_grad_by_freq=False, sparse=False, pretrained_path=None)
Bases: claf.tokens.embedding.base.TokenEmbedding

Frequent Word Fine-tuning Embedding
Fine-tunes the embedding matrix according to 'threshold_index' (a sketch of the idea follows the parameter list).

Args:
    vocab: Vocab (claf.tokens.vocab)
Kwargs:
    dropout: dropout probability
    embed_dim: embedding dimension
    padding_idx: if given, pads the output with the embedding vector at padding_idx (initialized to zeros) whenever it encounters the index
    max_norm: if given, renormalizes the embedding vectors to have a norm less than this before extracting. Note: this modifies weight in-place.
    norm_type: the p of the p-norm to compute for the max_norm option; default 2
    scale_grad_by_freq: if given, scales gradients by the inverse frequency of the words in the mini-batch; default False
    sparse: if True, the gradient w.r.t. weight will be a sparse tensor. See the Notes under torch.nn.Embedding for details on sparse gradients.
    pretrained_path: pretrained word vector path (e.g. GloVe)
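A sketch of the threshold idea, under the assumption that rows below 'threshold_index' are the frequent words to fine-tune while the remaining rows stay frozen (CLAF's actual split direction may differ):

    import torch
    import torch.nn as nn

    class FrequentTuningSketch(nn.Module):
        def __init__(self, weight, threshold_index):
            super().__init__()
            self.threshold = threshold_index
            # Frequent rows receive gradients; the long tail stays fixed.
            self.tuned = nn.Embedding.from_pretrained(weight[:threshold_index], freeze=False)
            self.fixed = nn.Embedding.from_pretrained(weight[threshold_index:], freeze=True)

        def forward(self, ids):  # (batch, seq_len) word indices
            frequent = (ids < self.threshold).unsqueeze(-1)
            tuned_out = self.tuned(ids.clamp(max=self.threshold - 1))
            fixed_out = self.fixed((ids - self.threshold).clamp(min=0))
            return torch.where(frequent, tuned_out, fixed_out)

    emb = FrequentTuningSketch(torch.randn(100, 16), threshold_index=20)
    out = emb(torch.tensor([[1, 5, 42]]))  # (1, 3, 16)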
class claf.tokens.embedding.sparse_feature.OneHotEncoding(index, token_name, classes)
Bases: torch.nn.modules.module.Module

Sparse to one-hot encoding

Args:
    vocab: Vocab (claf.tokens.vocab)
forward(inputs)
Defines the computation performed at every call.
Should be overridden by all subclasses.

Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
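The mapping itself is just an index-to-indicator-vector conversion; for illustration (plain PyTorch, not CLAF's internal code):

    import torch
    import torch.nn.functional as F

    ids = torch.tensor([0, 2, 1, 3])                 # sparse class indices
    one_hot = F.one_hot(ids, num_classes=4).float()  # (4, 4) dense indicator vectors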
class claf.tokens.embedding.sparse_feature.SparseFeature(vocab, embed_type, feature_count, params={})
Bases: claf.tokens.embedding.base.TokenEmbedding

Sparse Feature: maps sparse features either to an embedding (Sparse to Embedding) or to a one-hot encoding.

Args:
    vocab: Vocab (claf.tokens.vocab)
    embed_type: the type of embedding [one_hot|embedding]
    feature_count: the number of features
Kwargs:
    params: additional parameters for the embedding module
class claf.tokens.embedding.sparse_feature.SparseToEmbedding(index, token_name, classes, dropout=0, embed_dim=15, trainable=True, padding_idx=None, max_norm=None, norm_type=2, scale_grad_by_freq=False, sparse=False)
Bases: torch.nn.modules.module.Module

Sparse to Embedding

Args:
    token_name: token_name
Kwargs:
    dropout: dropout probability
    embed_dim: embedding dimension
    trainable: finetune or fixed
    padding_idx: if given, pads the output with the embedding vector at padding_idx (initialized to zeros) whenever it encounters the index
    max_norm: if given, renormalizes the embedding vectors to have a norm less than this before extracting. Note: this modifies weight in-place.
    norm_type: the p of the p-norm to compute for the max_norm option; default 2
    scale_grad_by_freq: if given, scales gradients by the inverse frequency of the words in the mini-batch; default False
    sparse: if True, the gradient w.r.t. weight will be a sparse tensor. See the Notes under torch.nn.Embedding for details on sparse gradients.
forward(inputs)
Defines the computation performed at every call.
Should be overridden by all subclasses.

Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
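The embedding variant replaces indicator vectors with learned dense vectors, one per class; a minimal illustration (again not CLAF's internal code; the class names are hypothetical):

    import torch
    import torch.nn as nn

    classes = ["low", "mid", "high"]            # hypothetical feature classes
    embed = nn.Embedding(len(classes), 15)      # embed_dim=15 matches the default above
    vecs = embed(torch.tensor([0, 2, 2, 1]))    # (4, 15) learned dense features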
class claf.tokens.embedding.word_embedding.WordEmbedding(vocab, dropout=0.2, embed_dim=100, padding_idx=None, max_norm=None, norm_type=2, scale_grad_by_freq=False, sparse=False, pretrained_path=None, trainable=True)
Bases: claf.tokens.embedding.base.TokenEmbedding

Word Embedding (the default token embedding)

Args:
    vocab: Vocab (claf.tokens.vocab)
Kwargs:
    dropout: dropout probability
    embed_dim: embedding dimension
    padding_idx: if given, pads the output with the embedding vector at padding_idx (initialized to zeros) whenever it encounters the index
    max_norm: if given, renormalizes the embedding vectors to have a norm less than this before extracting. Note: this modifies weight in-place.
    norm_type: the p of the p-norm to compute for the max_norm option; default 2
    scale_grad_by_freq: if given, scales gradients by the inverse frequency of the words in the mini-batch; default False
    sparse: if True, the gradient w.r.t. weight will be a sparse tensor. See the Notes under torch.nn.Embedding for details on sparse gradients.
    pretrained_path: pretrained word vector path (e.g. GloVe)
    trainable: finetune or fixed
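A sketch of how pretrained vectors typically back such an embedding (the file path and the plain-text GloVe format are assumptions; CLAF's own loading code may differ):

    import torch
    import torch.nn as nn

    def load_glove(path, tokens, embed_dim=100):
        # Each GloVe line: a token followed by embed_dim floats.
        weight = torch.randn(len(tokens), embed_dim) * 0.1  # fallback for OOV tokens
        index = {tok: i for i, tok in enumerate(tokens)}
        with open(path, encoding="utf-8") as f:
            for line in f:
                parts = line.rstrip().split(" ")
                if parts[0] in index:
                    weight[index[parts[0]]] = torch.tensor([float(v) for v in parts[1:]])
        return weight

    weight = load_glove("glove.6B.100d.txt", ["the", "cat", "sat"])  # hypothetical path
    embed = nn.Embedding.from_pretrained(weight, freeze=False)       # trainable=True -> freeze=False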
Module contents
class claf.tokens.embedding.BertEmbedding(vocab, pretrained_model_name=None, trainable=False, unit='subword')
Bases: claf.tokens.embedding.base.TokenEmbedding

BERT Embedding (Encoder)
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (https://arxiv.org/abs/1810.04805)

Args:
    vocab: Vocab (claf.tokens.vocab)
Kwargs:
    pretrained_model_name: …
    use_as_embedding: …
    trainable: finetune or fixed
class claf.tokens.embedding.CharEmbedding(vocab, dropout=0.2, embed_dim=16, kernel_sizes=[5], num_filter=100, activation='relu')
Bases: claf.tokens.embedding.base.TokenEmbedding

Character Embedding (CharCNN) (https://arxiv.org/abs/1509.01626)

Args:
    vocab: Vocab (claf.tokens.vocab)
Kwargs:
    dropout: dropout probability
    embed_dim: embedding dimension
    kernel_sizes: list of kernel sizes (n-grams)
    num_filter: number of CNN filters
    activation: activation function (e.g. ReLU)
class claf.tokens.embedding.CoveEmbedding(vocab, glove_pretrained_path=None, model_pretrained_path=None, dropout=0.2, trainable=False, project_dim=None)
Bases: claf.tokens.embedding.base.TokenEmbedding

CoVe Embedding
Learned in Translation: Contextualized Word Vectors (http://papers.nips.cc/paper/7209-learned-in-translation-contextualized-word-vectors.pdf)

Args:
    vocab: Vocab (claf.tokens.vocab)
Kwargs:
    dropout: dropout probability
    glove_pretrained_path: pretrained word vector path (e.g. GloVe)
    model_pretrained_path: pretrained CoVe (NMT encoder) weight path
    trainable: finetune or fixed
    project_dim: projection (linear) dimension
class claf.tokens.embedding.ELMoEmbedding(vocab, options_file='elmo_2x4096_512_2048cnn_2xhighway_options.json', weight_file='elmo_2x4096_512_2048cnn_2xhighway_weights.hdf5', do_layer_norm=False, dropout=0.5, trainable=False, project_dim=None)
Bases: claf.tokens.embedding.base.TokenEmbedding

ELMo Embedding (Embeddings from Language Models)
Deep contextualized word representations (https://arxiv.org/abs/1802.05365)

Args:
    vocab: Vocab (claf.tokens.vocab)
Kwargs:
    options_file: ELMo model config file path
    weight_file: ELMo model weight file path
    do_layer_norm: whether to apply layer normalization (passed to ScalarMix); default is False
    dropout: dropout probability
    trainable: finetune or fixed
    project_dim: projection (linear) dimension
class claf.tokens.embedding.FrequentTuningWordEmbedding(vocab, dropout=0.2, embed_dim=100, padding_idx=None, max_norm=None, norm_type=2, scale_grad_by_freq=False, sparse=False, pretrained_path=None)
Bases: claf.tokens.embedding.base.TokenEmbedding

Frequent Word Fine-tuning Embedding
Fine-tunes the embedding matrix according to 'threshold_index'.

Args:
    vocab: Vocab (claf.tokens.vocab)
Kwargs:
    dropout: dropout probability
    embed_dim: embedding dimension
    padding_idx: if given, pads the output with the embedding vector at padding_idx (initialized to zeros) whenever it encounters the index
    max_norm: if given, renormalizes the embedding vectors to have a norm less than this before extracting. Note: this modifies weight in-place.
    norm_type: the p of the p-norm to compute for the max_norm option; default 2
    scale_grad_by_freq: if given, scales gradients by the inverse frequency of the words in the mini-batch; default False
    sparse: if True, the gradient w.r.t. weight will be a sparse tensor. See the Notes under torch.nn.Embedding for details on sparse gradients.
    pretrained_path: pretrained word vector path (e.g. GloVe)
class claf.tokens.embedding.SparseFeature(vocab, embed_type, feature_count, params={})
Bases: claf.tokens.embedding.base.TokenEmbedding

Sparse Feature: maps sparse features either to an embedding (Sparse to Embedding) or to a one-hot encoding.

Args:
    vocab: Vocab (claf.tokens.vocab)
    embed_type: the type of embedding [one_hot|embedding]
    feature_count: the number of features
Kwargs:
    params: additional parameters for the embedding module
class claf.tokens.embedding.WordEmbedding(vocab, dropout=0.2, embed_dim=100, padding_idx=None, max_norm=None, norm_type=2, scale_grad_by_freq=False, sparse=False, pretrained_path=None, trainable=True)
Bases: claf.tokens.embedding.base.TokenEmbedding

Word Embedding (the default token embedding)

Args:
    vocab: Vocab (claf.tokens.vocab)
Kwargs:
    dropout: dropout probability
    embed_dim: embedding dimension
    padding_idx: if given, pads the output with the embedding vector at padding_idx (initialized to zeros) whenever it encounters the index
    max_norm: if given, renormalizes the embedding vectors to have a norm less than this before extracting. Note: this modifies weight in-place.
    norm_type: the p of the p-norm to compute for the max_norm option; default 2
    scale_grad_by_freq: if given, scales gradients by the inverse frequency of the words in the mini-batch; default False
    sparse: if True, the gradient w.r.t. weight will be a sparse tensor. See the Notes under torch.nn.Embedding for details on sparse gradients.
    pretrained_path: pretrained word vector path (e.g. GloVe)
    trainable: finetune or fixed
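These classes are re-exported at the package level, so the shorter import path works; a hedged usage sketch (vocab construction via claf.tokens.vocab is not shown):

    from claf.tokens.embedding import CharEmbedding, WordEmbedding

    # Assuming `vocab` is a claf.tokens.vocab.Vocab instance:
    # word_embed = WordEmbedding(vocab, embed_dim=100, trainable=True)
    # char_embed = CharEmbedding(vocab, embed_dim=16, kernel_sizes=[5], num_filter=100)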