claf.modules.encoder package

Submodules

This code is from allenai/allennlp (https://github.com/allenai/allennlp/blob/master/allennlp/modules/lstm_cell_with_projection.py)

class claf.modules.encoder.lstm_cell_with_projection.LstmCellWithProjection(input_size: int, hidden_size: int, cell_size: int, go_forward: bool = True, recurrent_dropout_probability: float = 0.0, memory_cell_clip_value: Optional[float] = None, state_projection_clip_value: Optional[float] = None)[source]

Bases: torch.nn.modules.module.Module

An LSTM with Recurrent Dropout and a projected and clipped hidden state and memory. Note: this implementation is slower than the native Pytorch LSTM because it cannot make use of CUDNN optimizations for stacked RNNs due to the variational dropout and the custom nature of the cell state.

Parameters

input_size: int, required.

The dimension of the inputs to the LSTM.

hidden_size: int, required.

The dimension of the outputs of the LSTM.

cell_size: int, required.

The dimension of the memory cell used for the LSTM.

go_forward: bool, optional (default = True)

The direction in which the LSTM is applied to the sequence. Forwards by default, or backwards if False.

recurrent_dropout_probability: float, optional (default = 0.0)

The dropout probability to be used in a dropout scheme as stated in A Theoretically Grounded Application of Dropout in Recurrent Neural Networks. Implementation-wise, this simply applies a fixed dropout mask per sequence to the recurrent connection of the LSTM.

state_projection_clip_value: float, optional (default = None)

The magnitude with which to clip the hidden_state after projecting it.

memory_cell_clip_value: float, optional (default = None)

The magnitude with which to clip the memory cell.

Returns

output_accumulator: torch.FloatTensor

The outputs of the LSTM for each timestep. A tensor of shape (batch_size, max_timesteps, hidden_size) where for a given batch element, all outputs past the sequence length for that batch are zero tensors.

final_state: Tuple[torch.FloatTensor, torch.FloatTensor]

The final (state, memory) states of the LSTM, with shape (1, batch_size, hidden_size) and (1, batch_size, cell_size) respectively. The first dimension is 1 in order to match the Pytorch API for returning stacked LSTM states.

forward(inputs: torch.FloatTensor, batch_lengths: List[int], initial_state: Optional[Tuple[torch.Tensor, torch.Tensor]] = None)[source]

Parameters

inputs: torch.FloatTensor, required.

A tensor of shape (batch_size, num_timesteps, input_size) to apply the LSTM over.

batch_lengths: List[int], required.

A list of length batch_size containing the lengths of the sequences in the batch.

initial_state: Tuple[torch.Tensor, torch.Tensor], optional (default = None)

A tuple (state, memory) representing the initial hidden state and memory of the LSTM. The state has shape (1, batch_size, hidden_size) and the memory has shape (1, batch_size, cell_size).

Returns

output_accumulator: torch.FloatTensor

The outputs of the LSTM for each timestep. A tensor of shape (batch_size, max_timesteps, hidden_size) where for a given batch element, all outputs past the sequence length for that batch are zero tensors.

final_state: Tuple[torch.FloatTensor, torch.FloatTensor]

A tuple (state, memory) representing the final hidden state and memory of the LSTM. The state has shape (1, batch_size, hidden_size) and the memory has shape (1, batch_size, cell_size).

reset_parameters()[source]
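
A minimal usage sketch based on the constructor and forward signatures documented above; the specific dimensions are illustrative only:

    import torch
    from claf.modules.encoder.lstm_cell_with_projection import LstmCellWithProjection

    # hidden_size is the projected output size, cell_size the internal memory cell size.
    encoder = LstmCellWithProjection(
        input_size=50,
        hidden_size=30,
        cell_size=80,
        go_forward=True,
        recurrent_dropout_probability=0.2,
        memory_cell_clip_value=3.0,
        state_projection_clip_value=3.0,
    )

    inputs = torch.randn(4, 7, 50)   # (batch_size, num_timesteps, input_size)
    batch_lengths = [7, 6, 4, 2]     # one sequence length per batch element

    outputs, (state, memory) = encoder(inputs, batch_lengths)
    print(outputs.shape)  # (4, 7, 30); outputs past each sequence length are zeros
    print(state.shape)    # (1, 4, 30)
    print(memory.shape)   # (1, 4, 80)
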
claf.modules.encoder.lstm_cell_with_projection.block_orthogonal(tensor: torch.Tensor, split_sizes: List[int], gain: float = 1.0) → None[source]

An initializer which allows initializing model parameters in “blocks”. This is helpful in the case of recurrent models which use multiple gates applied to linear projections, which can be computed efficiently if they are concatenated together. However, they are separate parameters which should be initialized independently.

Parameters

tensor: torch.Tensor, required.

A tensor to initialize.

split_sizes: List[int], required.

A list of length tensor.ndim() specifying the size of the blocks along that particular dimension. E.g. [10, 20] would result in the tensor being split into chunks of size 10 along the first dimension and 20 along the second.

gain: float, optional (default = 1.0)

The gain (scaling) applied to the orthogonal initialization.
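
For illustration, a sketch that initializes a stacked gate weight matrix in independent orthogonal blocks; the shape and split sizes are arbitrary examples:

    import torch
    from claf.modules.encoder.lstm_cell_with_projection import block_orthogonal

    # A (400, 25) weight whose first dimension stacks four 100-row gate blocks;
    # each (100, 25) block receives an independent orthogonal initialization.
    weight = torch.empty(400, 25)
    block_orthogonal(weight, split_sizes=[100, 25], gain=1.0)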

claf.modules.encoder.lstm_cell_with_projection.get_dropout_mask(dropout_probability: float, tensor_for_masking: torch.Tensor)[source]

Computes and returns an element-wise dropout mask for a given tensor, where each element in the mask is dropped out with probability dropout_probability. Note that the mask is NOT applied to the tensor; the tensor is passed in only to retain the correct CUDA tensor type for the mask.

Parameters

dropout_probability: float, required.

Probability of dropping a dimension of the input.

tensor_for_masking: torch.Tensor, required.

Returns

A torch.FloatTensor consisting of the binary mask scaled by 1 / (1 - dropout_probability). This scaling ensures that the expected values and variances of the masked output match those of the original tensor.
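
A short sketch of the intended usage, assuming the mask is applied manually to implement variational (per-sequence) dropout on a recurrent connection:

    import torch
    from claf.modules.encoder.lstm_cell_with_projection import get_dropout_mask

    hidden_state = torch.randn(4, 30)

    # The mask is created once per sequence and reused at every timestep;
    # it is scaled by 1 / (1 - dropout_probability) and must be applied manually.
    mask = get_dropout_mask(0.2, hidden_state)
    dropped_hidden_state = hidden_state * mask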

claf.modules.encoder.lstm_cell_with_projection.sort_batch_by_length(tensor: torch.Tensor, sequence_lengths: torch.Tensor)[source]

Sort a batch first tensor by some specified lengths.

Parameters

tensor: torch.FloatTensor, required.

A batch first Pytorch tensor.

sequence_lengths: torch.LongTensor, required.

A tensor representing the lengths of some dimension of the tensor which we want to sort by.

Returns

sorted_tensor: torch.FloatTensor

The original tensor sorted along the batch dimension with respect to sequence_lengths.

sorted_sequence_lengths: torch.LongTensor

The original sequence_lengths sorted by decreasing size.

restoration_indices: torch.LongTensor

Indices into the sorted_tensor such that sorted_tensor.index_select(0, restoration_indices) == original_tensor

permutation_index: torch.LongTensor

The indices used to sort the tensor. This is useful if you want to sort many tensors using the same ordering.
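
An illustrative sketch of sorting a padded batch by decreasing length and then restoring the original order with restoration_indices:

    import torch
    from claf.modules.encoder.lstm_cell_with_projection import sort_batch_by_length

    tensor = torch.randn(3, 5, 8)              # (batch_size, max_timesteps, features)
    sequence_lengths = torch.tensor([2, 5, 3])

    sorted_tensor, sorted_lengths, restoration_indices, permutation_index = \
        sort_batch_by_length(tensor, sequence_lengths)

    # Undo the sort to recover the original batch order.
    restored = sorted_tensor.index_select(0, restoration_indices)
    assert torch.equal(restored, tensor)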

class claf.modules.encoder.positional.PositionalEncoding(embed_dim, max_length=2000)[source]

Bases: torch.nn.modules.module.Module

Positional Encoding

in “Attention is All You Need” (https://arxiv.org/abs/1706.03762)

The use of relative position is possible because sin(x+y) and cos(x+y) can be expressed in terms of y, sin(x) and cos(x).

(cf. https://github.com/tensorflow/tensor2tensor/blob/42c3f377f441e5a0f431127d63e71414ead291c4/tensor2tensor/layers/common_attention.py#L388)

  • Args:

embed_dim: the number of embedding dimensions

  • Kwargs:

max_length: the maximum sequence length

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
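
A minimal usage sketch, assuming forward(x) adds the sinusoidal encoding to a (batch_size, sequence_length, embed_dim) input and preserves its shape:

    import torch
    from claf.modules.encoder.positional import PositionalEncoding

    pos_encoding = PositionalEncoding(embed_dim=128, max_length=2000)

    x = torch.randn(2, 50, 128)   # (batch_size, sequence_length, embed_dim)
    out = pos_encoding(x)         # same shape as x, with positions encoded
    print(out.shape)              # expected: torch.Size([2, 50, 128])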

Module contents

class claf.modules.encoder.PositionalEncoding(embed_dim, max_length=2000)[source]

Bases: torch.nn.modules.module.Module

Re-export of claf.modules.encoder.positional.PositionalEncoding; see the full documentation above.

class claf.modules.encoder.LstmCellWithProjection(input_size: int, hidden_size: int, cell_size: int, go_forward: bool = True, recurrent_dropout_probability: float = 0.0, memory_cell_clip_value: Optional[float] = None, state_projection_clip_value: Optional[float] = None)[source]

Bases: torch.nn.modules.module.Module

Re-export of claf.modules.encoder.lstm_cell_with_projection.LstmCellWithProjection; see the full documentation above.