claf.tokens.embedding package

Submodules

class claf.tokens.embedding.base.TokenEmbedding(vocab)[source]

Bases: torch.nn.modules.module.Module

Token Embedding

It can be an embedding matrix, a language model (ELMo), a neural machine translation model (CoVe), or sparse features.

  • Args:

    vocab: Vocab (claf.tokens.vocab)

forward(tokens)[source]

embedding look-up

get_output_dim()[source]

get embedding dimension

get_vocab_size()[source]
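
A minimal sketch of how a custom token embedding might subclass TokenEmbedding. The subclass name, the embed_dim default, and the assumption that get_vocab_size() returns the vocabulary size are illustrative, not part of the documented API.

    import torch.nn as nn

    from claf.tokens.embedding.base import TokenEmbedding


    class RandomLookupEmbedding(TokenEmbedding):
        """Hypothetical subclass: a plain trainable lookup table."""

        def __init__(self, vocab, embed_dim=50):
            super().__init__(vocab)
            self.embed_dim = embed_dim
            # get_vocab_size() is provided by the base class
            # (assumed here to return the vocabulary size).
            self.lookup = nn.Embedding(self.get_vocab_size(), embed_dim)

        def forward(self, tokens):
            # embedding look-up: (batch, seq_len) -> (batch, seq_len, embed_dim)
            return self.lookup(tokens)

        def get_output_dim(self):
            return self.embed_dim
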
class claf.tokens.embedding.bert_embedding.BertEmbedding(vocab, pretrained_model_name=None, trainable=False, unit='subword')[source]

Bases: claf.tokens.embedding.base.TokenEmbedding

BERT Embedding (Encoder)

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (https://arxiv.org/abs/1810.04805)

  • Args:

    vocab: Vocab (claf.tokens.vocab)

  • Kwargs:

    pretrained_model_name: …
    use_as_embedding: …
    trainable: Finetune or fixed

forward(inputs)[source]

embedding look-up

get_output_dim()[source]

get embedding dimension

remove_cls_sep_token(inputs, outputs)[source]
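
A hedged usage sketch for BertEmbedding. The vocab placeholder, the pretrained model name string, and the shape of the subword-id tensor passed to forward() are assumptions; only the constructor signature and methods above come from the API.

    import torch

    from claf.tokens.embedding.bert_embedding import BertEmbedding

    vocab = ...  # a pre-built claf.tokens.vocab.Vocab (construction not shown here)
    bert = BertEmbedding(vocab, pretrained_model_name="bert-base-uncased", trainable=False)

    # Assumption: `inputs` is a LongTensor of subword ids, shape (batch, seq_len).
    inputs = torch.zeros(2, 16, dtype=torch.long)
    contextual = bert(inputs)      # embedding look-up through the BERT encoder
    print(bert.get_output_dim())   # hidden size of the chosen pretrained model
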
class claf.tokens.embedding.char_embedding.CharEmbedding(vocab, dropout=0.2, embed_dim=16, kernel_sizes=[5], num_filter=100, activation='relu')[source]

Bases: claf.tokens.embedding.base.TokenEmbedding

Character Embedding (CharCNN) (https://arxiv.org/abs/1509.01626)

  • Args:

    vocab: Vocab (claf.tokens.vocab)

  • Kwargs:

    dropout: The dropout probability
    embed_dim: The embedding dimension
    kernel_sizes: The list of kernel sizes (n-grams)
    num_filter: The number of CNN filters
    activation: Activation function (e.g. ReLU)

forward(chars)[source]

embedding look-up

get_output_dim()[source]

get embedding dimension

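A hedged usage sketch for CharEmbedding. The vocab placeholder and the (batch, seq_len, word_len) layout of the character-id tensor are assumptions; the keyword arguments mirror the signature above.

    import torch

    from claf.tokens.embedding.char_embedding import CharEmbedding

    vocab = ...  # a character-level claf.tokens.vocab.Vocab built elsewhere
    char_embed = CharEmbedding(vocab, embed_dim=16, kernel_sizes=[5], num_filter=100)

    # Assumption: `chars` holds character ids with shape (batch, seq_len, word_len).
    chars = torch.zeros(2, 10, 12, dtype=torch.long)
    token_features = char_embed(chars)     # CharCNN features per token
    print(char_embed.get_output_dim())     # presumably num_filter * len(kernel_sizes)
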
class claf.tokens.embedding.cove_embedding.CoveEmbedding(vocab, glove_pretrained_path=None, model_pretrained_path=None, dropout=0.2, trainable=False, project_dim=None)[source]

Bases: claf.tokens.embedding.base.TokenEmbedding

CoVe Embedding

Learned in Translation: Contextualized Word Vectors (http://papers.nips.cc/paper/7209-learned-in-translation-contextualized-word-vectors.pdf)

  • Args:

    vocab: Vocab (claf.tokens.vocab)

  • Kwargs:

    dropout: The dropout probability
    pretrained_path: Pretrained vector path (e.g. GloVe)
    trainable: finetune or fixed
    project_dim: The projection (linear) dimension

forward(words)[source]

embedding look-up

get_output_dim()[source]

get embedding dimension

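A hedged construction sketch for CoveEmbedding. Both file paths and the vocab placeholder are illustrative; CoVe requires GloVe vectors plus the pretrained NMT encoder weights referenced by the signature above.

    from claf.tokens.embedding.cove_embedding import CoveEmbedding

    vocab = ...  # a word-level claf.tokens.vocab.Vocab built elsewhere
    cove = CoveEmbedding(
        vocab,
        glove_pretrained_path="path/to/glove.txt",         # placeholder path
        model_pretrained_path="path/to/cove_weights.pth",  # placeholder path
        trainable=False,
        project_dim=None,
    )
    # forward(words) performs the look-up; `words` is assumed to be a LongTensor
    # of word ids with shape (batch, seq_len).
    print(cove.get_output_dim())
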
class claf.tokens.embedding.elmo_embedding.ELMoEmbedding(vocab, options_file='elmo_2x4096_512_2048cnn_2xhighway_options.json', weight_file='elmo_2x4096_512_2048cnn_2xhighway_weights.hdf5', do_layer_norm=False, dropout=0.5, trainable=False, project_dim=None)[source]

Bases: claf.tokens.embedding.base.TokenEmbedding

ELMo Embedding (Embeddings from Language Models)

Deep contextualized word representations (https://arxiv.org/abs/1802.05365)

  • Args:

    vocab: Vocab (claf.tokens.vocab)

  • Kwargs:

    options_file: ELMo model options (config) file path
    weight_file: ELMo model weight file path
    do_layer_norm: Whether to apply layer normalization (passed to ScalarMix). Default is False.
    dropout: The dropout probability
    trainable: finetune or fixed
    project_dim: The projection (linear) dimension

forward(chars)[source]

embedding look-up

get_output_dim()[source]

get embedding dimension

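A hedged construction sketch for ELMoEmbedding. The default options/weight file names from the signature are kept as-is and must exist locally (or be replaced with valid paths); the vocab placeholder is an assumption.

    from claf.tokens.embedding.elmo_embedding import ELMoEmbedding

    vocab = ...  # a claf.tokens.vocab.Vocab built elsewhere
    elmo = ELMoEmbedding(
        vocab,
        options_file="elmo_2x4096_512_2048cnn_2xhighway_options.json",
        weight_file="elmo_2x4096_512_2048cnn_2xhighway_weights.hdf5",
        dropout=0.5,
        trainable=False,
    )
    # forward(chars) takes character-id inputs (as produced by an ELMo-style
    # character indexer) and returns contextualized word representations.
    print(elmo.get_output_dim())
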
class claf.tokens.embedding.frequent_word_embedding.FrequentTuningWordEmbedding(vocab, dropout=0.2, embed_dim=100, padding_idx=None, max_norm=None, norm_type=2, scale_grad_by_freq=False, sparse=False, pretrained_path=None)[source]

Bases: claf.tokens.embedding.base.TokenEmbedding

Frequent Word Fine-tuning Embedding: fine-tunes the embedding matrix according to ‘threshold_index’

  • Args:

    vocab: Vocab (claf.tokens.vocab)

  • Kwargs:

    dropout: The dropout probability
    embed_dim: The embedding dimension
    padding_idx: If given, pads the output with the embedding vector at padding_idx (initialized to zeros) whenever it encounters the index
    max_norm: If given, renormalizes the embedding vectors to have a norm less than this value before extracting. Note: this modifies weight in-place.
    norm_type: The p of the p-norm to compute for the max_norm option. Default 2.
    scale_grad_by_freq: If given, scales gradients by the inverse of the frequency of the words in the mini-batch. Default False.
    sparse: If True, the gradient w.r.t. weight will be a sparse tensor. See the Notes under torch.nn.Embedding for details on sparse gradients.
    pretrained_path: Pretrained vector path (e.g. GloVe)
    trainable: finetune or fixed

forward(words, frequent_tuning=False)[source]

embedding look-up

get_output_dim()[source]

get embedding dimension

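A hedged usage sketch for FrequentTuningWordEmbedding. The vocab placeholder, the pretrained path, and the interpretation of frequent_tuning are assumptions; the flag presumably switches on the threshold_index-based fine-tuning described above.

    import torch

    from claf.tokens.embedding.frequent_word_embedding import FrequentTuningWordEmbedding

    vocab = ...  # a claf.tokens.vocab.Vocab (with a threshold_index) built elsewhere
    embed = FrequentTuningWordEmbedding(
        vocab,
        embed_dim=100,
        pretrained_path="path/to/glove.txt",  # placeholder path
    )

    words = torch.zeros(2, 7, dtype=torch.long)  # placeholder word ids (batch, seq_len)
    out = embed(words, frequent_tuning=True)     # look-up with frequent-word fine-tuning enabled
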
class claf.tokens.embedding.sparse_feature.OneHotEncoding(index, token_name, classes)[source]

Bases: torch.nn.modules.module.Module

Sparse to one-hot encoding

  • Args:

    vocab: Vocab (claf.tokens.vocab)

forward(inputs)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

get_output_dim()[source]
class claf.tokens.embedding.sparse_feature.SparseFeature(vocab, embed_type, feature_count, params={})[source]

Bases: claf.tokens.embedding.base.TokenEmbedding

Sparse Feature

  1. Sparse to Embedding

  2. One Hot Encoding

  • Args:

    vocab: Vocab (claf.tokens.vocab)
    embed_type: The type of embedding [one_hot|embedding]
    feature_count: The number of features

  • Kwargs:

    params: additional parameters for embedding module

forward(inputs)[source]

embedding look-up

get_output_dim()[source]

get embedding dimension

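A hedged usage sketch for SparseFeature, which dispatches to either one-hot encoding or a small embedding table depending on embed_type. The vocab placeholder, the feature_count value, and the params contents are assumptions.

    from claf.tokens.embedding.sparse_feature import SparseFeature

    vocab = ...  # a claf.tokens.vocab.Vocab built elsewhere

    # One-hot mode: each sparse feature becomes a fixed one-hot vector.
    one_hot_feature = SparseFeature(vocab, embed_type="one_hot", feature_count=3)

    # Embedding mode: each sparse feature gets a trainable dense vector;
    # `params` is forwarded to the underlying embedding module (key name assumed).
    dense_feature = SparseFeature(
        vocab, embed_type="embedding", feature_count=3, params={"embed_dim": 15}
    )
    print(one_hot_feature.get_output_dim(), dense_feature.get_output_dim())
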
class claf.tokens.embedding.sparse_feature.SparseToEmbedding(index, token_name, classes, dropout=0, embed_dim=15, trainable=True, padding_idx=None, max_norm=None, norm_type=2, scale_grad_by_freq=False, sparse=False)[source]

Bases: torch.nn.modules.module.Module

Sparse to Embedding

  • Args:

    token_name: token_name

  • Kwargs:

    dropout: The dropout probability
    embed_dim: The embedding dimension
    padding_idx: If given, pads the output with the embedding vector at padding_idx (initialized to zeros) whenever it encounters the index
    max_norm: If given, renormalizes the embedding vectors to have a norm less than this value before extracting. Note: this modifies weight in-place.
    norm_type: The p of the p-norm to compute for the max_norm option. Default 2.
    scale_grad_by_freq: If given, scales gradients by the inverse of the frequency of the words in the mini-batch. Default False.
    sparse: If True, the gradient w.r.t. weight will be a sparse tensor. See the Notes under torch.nn.Embedding for details on sparse gradients.
    pretrained_path: Pretrained vector path (e.g. GloVe)
    trainable: finetune or fixed

forward(inputs)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

get_output_dim()[source]
class claf.tokens.embedding.word_embedding.WordEmbedding(vocab, dropout=0.2, embed_dim=100, padding_idx=None, max_norm=None, norm_type=2, scale_grad_by_freq=False, sparse=False, pretrained_path=None, trainable=True)[source]

Bases: claf.tokens.embedding.base.TokenEmbedding

Word Embedding (default token embedding)

  • Args:

    vocab: Vocab (claf.tokens.vocab)

  • Kwargs:

    dropout: The dropout probability
    embed_dim: The embedding dimension
    padding_idx: If given, pads the output with the embedding vector at padding_idx (initialized to zeros) whenever it encounters the index
    max_norm: If given, renormalizes the embedding vectors to have a norm less than this value before extracting. Note: this modifies weight in-place.
    norm_type: The p of the p-norm to compute for the max_norm option. Default 2.
    scale_grad_by_freq: If given, scales gradients by the inverse of the frequency of the words in the mini-batch. Default False.
    sparse: If True, the gradient w.r.t. weight will be a sparse tensor. See the Notes under torch.nn.Embedding for details on sparse gradients.
    pretrained_path: Pretrained vector path (e.g. GloVe)
    trainable: finetune or fixed

forward(words)[source]

embedding look-up

get_output_dim()[source]

get embedding dimension

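A usage sketch for WordEmbedding, the default token embedding. Only the vocab placeholder and the example tensor shapes are assumptions; the keyword arguments mirror the signature above.

    import torch

    from claf.tokens.embedding.word_embedding import WordEmbedding

    vocab = ...  # a claf.tokens.vocab.Vocab built elsewhere
    word_embed = WordEmbedding(
        vocab,
        embed_dim=100,
        dropout=0.2,
        pretrained_path=None,  # or a path to pretrained vectors such as GloVe
        trainable=True,
    )

    words = torch.zeros(2, 7, dtype=torch.long)   # word ids, shape (batch, seq_len)
    embedded = word_embed(words)                  # embedding look-up
    print(word_embed.get_output_dim())            # 100
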
Module contents

class claf.tokens.embedding.BertEmbedding(vocab, pretrained_model_name=None, trainable=False, unit='subword')[source]

Bases: claf.tokens.embedding.base.TokenEmbedding

BERT Embedding (Encoder)

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (https://arxiv.org/abs/1810.04805)

  • Args:

    vocab: Vocab (claf.tokens.vocab)

  • Kwargs:

    pretrained_model_name: …
    use_as_embedding: …
    trainable: Finetune or fixed

forward(inputs)[source]

embedding look-up

get_output_dim()[source]

get embedding dimension

remove_cls_sep_token(inputs, outputs)[source]
class claf.tokens.embedding.CharEmbedding(vocab, dropout=0.2, embed_dim=16, kernel_sizes=[5], num_filter=100, activation='relu')[source]

Bases: claf.tokens.embedding.base.TokenEmbedding

Character Embedding (CharCNN) (https://arxiv.org/abs/1509.01626)

  • Args:

    vocab: Vocab (claf.tokens.vocab)

  • Kwargs:

    dropout: The dropout probability
    embed_dim: The embedding dimension
    kernel_sizes: The list of kernel sizes (n-grams)
    num_filter: The number of CNN filters
    activation: Activation function (e.g. ReLU)

forward(chars)[source]

embedding look-up

get_output_dim()[source]

get embedding dimension

class claf.tokens.embedding.CoveEmbedding(vocab, glove_pretrained_path=None, model_pretrained_path=None, dropout=0.2, trainable=False, project_dim=None)[source]

Bases: claf.tokens.embedding.base.TokenEmbedding

CoVe Embedding

Learned in Translation: Contextualized Word Vectors (http://papers.nips.cc/paper/7209-learned-in-translation-contextualized-word-vectors.pdf)

  • Args:

    vocab: Vocab (claf.tokens.vocab)

  • Kwargs:

    dropout: The dropout probability
    pretrained_path: Pretrained vector path (e.g. GloVe)
    trainable: finetune or fixed
    project_dim: The projection (linear) dimension

forward(words)[source]

embedding look-up

get_output_dim()[source]

get embedding dimension

class claf.tokens.embedding.ELMoEmbedding(vocab, options_file='elmo_2x4096_512_2048cnn_2xhighway_options.json', weight_file='elmo_2x4096_512_2048cnn_2xhighway_weights.hdf5', do_layer_norm=False, dropout=0.5, trainable=False, project_dim=None)[source]

Bases: claf.tokens.embedding.base.TokenEmbedding

ELMo Embedding (Embeddings from Language Models)

Deep contextualized word representations (https://arxiv.org/abs/1802.05365)

  • Args:

    vocab: Vocab (claf.tokens.vocab)

  • Kwargs:

    options_file: ELMo model options (config) file path
    weight_file: ELMo model weight file path
    do_layer_norm: Whether to apply layer normalization (passed to ScalarMix). Default is False.
    dropout: The dropout probability
    trainable: finetune or fixed
    project_dim: The projection (linear) dimension

forward(chars)[source]

embedding look-up

get_output_dim()[source]

get embedding dimension

class claf.tokens.embedding.FrequentTuningWordEmbedding(vocab, dropout=0.2, embed_dim=100, padding_idx=None, max_norm=None, norm_type=2, scale_grad_by_freq=False, sparse=False, pretrained_path=None)[source]

Bases: claf.tokens.embedding.base.TokenEmbedding

Frequent Word Fine-tuning Embedding: fine-tunes the embedding matrix according to ‘threshold_index’

  • Args:

    vocab: Vocab (claf.tokens.vocab)

  • Kwargs:

    dropout: The dropout probability
    embed_dim: The embedding dimension
    padding_idx: If given, pads the output with the embedding vector at padding_idx (initialized to zeros) whenever it encounters the index
    max_norm: If given, renormalizes the embedding vectors to have a norm less than this value before extracting. Note: this modifies weight in-place.
    norm_type: The p of the p-norm to compute for the max_norm option. Default 2.
    scale_grad_by_freq: If given, scales gradients by the inverse of the frequency of the words in the mini-batch. Default False.
    sparse: If True, the gradient w.r.t. weight will be a sparse tensor. See the Notes under torch.nn.Embedding for details on sparse gradients.
    pretrained_path: Pretrained vector path (e.g. GloVe)
    trainable: finetune or fixed

forward(words, frequent_tuning=False)[source]

embedding look-up

get_output_dim()[source]

get embedding dimension

class claf.tokens.embedding.SparseFeature(vocab, embed_type, feature_count, params={})[source]

Bases: claf.tokens.embedding.base.TokenEmbedding

Sparse Feature

  1. Sparse to Embedding

  2. One Hot Encoding

  • Args:

    vocab: Vocab (claf.tokens.vocab)
    embed_type: The type of embedding [one_hot|embedding]
    feature_count: The number of features

  • Kwargs:

    params: additional parameters for embedding module

forward(inputs)[source]

embedding look-up

get_output_dim()[source]

get embedding dimension

class claf.tokens.embedding.WordEmbedding(vocab, dropout=0.2, embed_dim=100, padding_idx=None, max_norm=None, norm_type=2, scale_grad_by_freq=False, sparse=False, pretrained_path=None, trainable=True)[source]

Bases: claf.tokens.embedding.base.TokenEmbedding

Word Embedding (default token embedding)

  • Args:

    vocab: Vocab (claf.tokens.vocab)

  • Kwargs:

    dropout: The dropout probability
    embed_dim: The embedding dimension
    padding_idx: If given, pads the output with the embedding vector at padding_idx (initialized to zeros) whenever it encounters the index
    max_norm: If given, renormalizes the embedding vectors to have a norm less than this value before extracting. Note: this modifies weight in-place.
    norm_type: The p of the p-norm to compute for the max_norm option. Default 2.
    scale_grad_by_freq: If given, scales gradients by the inverse of the frequency of the words in the mini-batch. Default False.
    sparse: If True, the gradient w.r.t. weight will be a sparse tensor. See the Notes under torch.nn.Embedding for details on sparse gradients.
    pretrained_path: Pretrained vector path (e.g. GloVe)
    trainable: finetune or fixed

forward(words)[source]

embedding look-up

get_output_dim()[source]

get embedding dimension
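
As the package-level listing above shows, the same classes are re-exported from claf.tokens.embedding, so they can be imported without spelling out the submodule path:

    # Equivalent package-level imports of the classes documented above.
    from claf.tokens.embedding import (
        BertEmbedding,
        CharEmbedding,
        CoveEmbedding,
        ELMoEmbedding,
        FrequentTuningWordEmbedding,
        SparseFeature,
        WordEmbedding,
    )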