claf.modules.encoder package
Submodules
This code is from allenai/allennlp (https://github.com/allenai/allennlp/blob/master/allennlp/modules/lstm_cell_with_projection.py)

class claf.modules.encoder.lstm_cell_with_projection.LstmCellWithProjection(input_size: int, hidden_size: int, cell_size: int, go_forward: bool = True, recurrent_dropout_probability: float = 0.0, memory_cell_clip_value: Optional[float] = None, state_projection_clip_value: Optional[float] = None)
Bases: torch.nn.modules.module.Module
An LSTM with recurrent dropout and a projected and clipped hidden state and memory. Note: this implementation is slower than the native PyTorch LSTM because it cannot make use of cuDNN optimizations for stacked RNNs, due to the variational dropout and the custom nature of the cell state.

Parameters
----------
input_size : int, required.
    The dimension of the inputs to the LSTM.
hidden_size : int, required.
    The dimension of the outputs of the LSTM.
cell_size : int, required.
    The dimension of the memory cell used for the LSTM.
go_forward : bool, optional (default = True)
    The direction in which the LSTM is applied to the sequence. Forwards by default, or backwards if False.
recurrent_dropout_probability : float, optional (default = 0.0)
    The dropout probability to be used in a dropout scheme as stated in "A Theoretically Grounded Application of Dropout in Recurrent Neural Networks". Implementation-wise, this simply applies a fixed dropout mask per sequence to the recurrent connection of the LSTM.
state_projection_clip_value : float, optional (default = None)
    The magnitude with which to clip the hidden_state after projecting it.
memory_cell_clip_value : float, optional (default = None)
    The magnitude with which to clip the memory cell.

Returns
-------
output_accumulator : torch.FloatTensor
    The outputs of the LSTM for each timestep. A tensor of shape (batch_size, max_timesteps, hidden_size) where, for a given batch element, all outputs past that element's sequence length are zero tensors.
final_state : Tuple[torch.FloatTensor, torch.FloatTensor]
    The final (state, memory) states of the LSTM, with shape (1, batch_size, hidden_size) and (1, batch_size, cell_size) respectively. The first dimension is 1 in order to match the PyTorch API for returning stacked LSTM states.
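
The projection and clipping described above can be illustrated for a single timestep with a plain-Python sketch. It assumes the output-gate activations and the new memory are already computed, and it follows the usual LSTM-with-projection recipe (clip the memory, gate it, project down to hidden_size, clip again). The helper names `clip`, `project`, and `project_and_clip_step` are hypothetical and are not part of claf.

```python
import math
from typing import List, Optional, Tuple


def clip(values: List[float], limit: Optional[float]) -> List[float]:
    """Clamp every value to [-limit, limit]; a limit of None disables clipping."""
    if limit is None:
        return list(values)
    return [max(-limit, min(limit, v)) for v in values]


def project(vector: List[float], weight: List[List[float]]) -> List[float]:
    """Multiply a (hidden_size x cell_size) matrix by a cell_size vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weight]


def project_and_clip_step(
    output_gate: List[float],
    memory: List[float],
    projection_weight: List[List[float]],
    memory_cell_clip_value: Optional[float] = None,
    state_projection_clip_value: Optional[float] = None,
) -> Tuple[List[float], List[float]]:
    # Assumption: this mirrors the order of operations only, not claf's code.
    memory = clip(memory, memory_cell_clip_value)
    pre_projection = [o * math.tanh(c) for o, c in zip(output_gate, memory)]
    hidden = clip(project(pre_projection, projection_weight),
                  state_projection_clip_value)
    return hidden, memory


# cell_size = 3 is projected down to hidden_size = 2.
hidden, memory = project_and_clip_step(
    output_gate=[1.0, 1.0, 1.0],
    memory=[5.0, -5.0, 0.5],
    projection_weight=[[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]],
    memory_cell_clip_value=3.0,
    state_projection_clip_value=0.5,
)
```

Here the memory keeps its cell_size of 3 while the hidden state has the smaller hidden_size of 2, which is why the two entries of final_state have different last dimensions.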

forward(inputs: torch.FloatTensor, batch_lengths: List[int], initial_state: Optional[Tuple[torch.Tensor, torch.Tensor]] = None)

Parameters
----------
inputs : torch.FloatTensor, required.
    A tensor of shape (batch_size, num_timesteps, input_size) to apply the LSTM over.
batch_lengths : List[int], required.
    A list of length batch_size containing the lengths of the sequences in the batch.
initial_state : Tuple[torch.Tensor, torch.Tensor], optional (default = None)
    A tuple (state, memory) representing the initial hidden state and memory of the LSTM. The state has shape (1, batch_size, hidden_size) and the memory has shape (1, batch_size, cell_size).

Returns
-------
output_accumulator : torch.FloatTensor
    The outputs of the LSTM for each timestep. A tensor of shape (batch_size, max_timesteps, hidden_size) where, for a given batch element, all outputs past that element's sequence length are zero tensors.
final_state : Tuple[torch.FloatTensor, torch.FloatTensor]
    A tuple (state, memory) representing the final hidden state and memory of the LSTM. The state has shape (1, batch_size, hidden_size) and the memory has shape (1, batch_size, cell_size).
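
The roles of batch_lengths and go_forward can be sketched without torch. The example below is only an illustration of the padding and direction semantics: it replaces the LSTM step with a running sum, and the function name `run_recurrence` is hypothetical.

```python
from typing import List, Tuple


def run_recurrence(
    inputs: List[List[float]],
    batch_lengths: List[int],
    go_forward: bool = True,
) -> Tuple[List[List[float]], List[float]]:
    """Apply a toy recurrence to each padded sequence, skipping the padding."""
    total_timesteps = len(inputs[0])
    # Positions past each sequence's length stay at zero.
    outputs = [[0.0] * total_timesteps for _ in inputs]
    final_states = []
    for batch_index, (sequence, length) in enumerate(zip(inputs, batch_lengths)):
        state = 0.0
        steps = range(length) if go_forward else range(length - 1, -1, -1)
        for t in steps:
            state = state + sequence[t]  # stand-in for one LSTM step
            outputs[batch_index][t] = state
        final_states.append(state)
    return outputs, final_states


padded = [[1.0, 2.0, 3.0], [4.0, 5.0, 0.0]]  # second sequence has length 2
forward_outputs, forward_final = run_recurrence(padded, [3, 2])
backward_outputs, backward_final = run_recurrence(padded, [3, 2], go_forward=False)
```

In both directions the padded position of the second sequence remains zero, and the final state is taken from the last real timestep that was processed.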

claf.modules.encoder.lstm_cell_with_projection.block_orthogonal(tensor: torch.Tensor, split_sizes: List[int], gain: float = 1.0) → None
An initializer which allows initializing model parameters in “blocks”. This is helpful for recurrent models which use multiple gates applied to linear projections: the projections can be computed efficiently when they are concatenated together, but they are separate parameters which should be initialized independently.

Parameters
----------
tensor : torch.Tensor, required.
    A tensor to initialize.
split_sizes : List[int], required.
    A list of length tensor.dim() specifying the size of the blocks along each dimension. E.g. [10, 20] would result in the tensor being split into chunks of size 10 along the first dimension and 20 along the second.
gain : float, optional (default = 1.0)
    The gain (scaling) applied to the orthogonal initialization.
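
How split_sizes partitions a tensor into blocks can be sketched with the standard library alone. The helper `block_slices` below is hypothetical: it only enumerates the index ranges that a block-wise initializer would visit, and it assumes each dimension must be evenly divisible by its split size.

```python
import itertools
from typing import List, Tuple


def block_slices(sizes: List[int], split_sizes: List[int]) -> List[Tuple[slice, ...]]:
    """Return one tuple of slices per block, covering the whole tensor shape."""
    if len(sizes) != len(split_sizes):
        raise ValueError("split_sizes needs one entry per tensor dimension")
    if any(size % split != 0 for size, split in zip(sizes, split_sizes)):
        raise ValueError("every dimension must be divisible by its split size")
    # Start offsets of the blocks along each dimension.
    starts = [range(0, size, split) for size, split in zip(sizes, split_sizes)]
    return [
        tuple(slice(start, start + split) for start, split in zip(block, split_sizes))
        for block in itertools.product(*starts)
    ]


# A (20, 40) weight split into blocks of shape (10, 20) gives four blocks.
blocks = block_slices([20, 40], [10, 20])
```

Each tuple of slices selects one block, which could then be given its own orthogonal initialization, for example with torch.nn.init.orthogonal_.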

claf.modules.encoder.lstm_cell_with_projection.get_dropout_mask(dropout_probability: float, tensor_for_masking: torch.Tensor)
Computes and returns an element-wise dropout mask for a given tensor, where each element in the mask is dropped out with probability dropout_probability. Note that the mask is NOT applied to the tensor; the tensor is passed only to retain the correct CUDA tensor type for the mask.

Parameters
----------
dropout_probability : float, required.
    Probability of dropping a dimension of the input.
tensor_for_masking : torch.Tensor, required.

Returns
-------
A torch.FloatTensor consisting of the binary mask scaled by 1 / (1 - dropout_probability). This scaling ensures that the expected values and variances of the output of applying this mask and of the original tensor are the same.
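
The scaling by 1 / (1 - dropout_probability) can be checked numerically. The sketch below uses plain Python lists instead of tensors, and the name `dropout_mask` and its `rng` argument are assumptions made for illustration.

```python
import random
from typing import List


def dropout_mask(dropout_probability: float, size: int, rng: random.Random) -> List[float]:
    """Build a mask of zeros and 1 / (1 - p) entries; requires p < 1."""
    scale = 1.0 / (1.0 - dropout_probability)
    # An element is kept when its uniform draw exceeds the dropout probability.
    return [scale if rng.random() > dropout_probability else 0.0 for _ in range(size)]


rng = random.Random(0)
mask = dropout_mask(0.25, 10000, rng)
values = [2.0] * len(mask)
# The mask is applied by the caller, not by the mask function itself.
dropped = [m * v for m, v in zip(mask, values)]
mean_after_dropout = sum(dropped) / len(dropped)
```

With enough elements, mean_after_dropout stays close to the original mean of 2.0 because the kept entries are scaled up to compensate for the dropped ones.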

claf.modules.encoder.lstm_cell_with_projection.sort_batch_by_length(tensor: torch.Tensor, sequence_lengths: torch.Tensor)
Sort a batch-first tensor by some specified lengths.

Parameters
----------
tensor : torch.FloatTensor, required.
    A batch-first PyTorch tensor.
sequence_lengths : torch.LongTensor, required.
    A tensor representing the lengths of some dimension of the tensor which we want to sort by.

Returns
-------
sorted_tensor : torch.FloatTensor
    The original tensor sorted along the batch dimension with respect to sequence_lengths.
sorted_sequence_lengths : torch.LongTensor
    The original sequence_lengths sorted by decreasing size.
restoration_indices : torch.LongTensor
    Indices into the sorted_tensor such that sorted_tensor.index_select(0, restoration_indices) == original_tensor
permutation_index : torch.LongTensor
    The indices used to sort the tensor. This is useful if you want to sort many tensors using the same ordering.
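
The relationship between the four return values is easiest to see on a small example. The following sketch works on Python lists rather than tensors; the function name `sort_by_length` is hypothetical, and list indexing stands in for index_select.

```python
from typing import List, Sequence, Tuple


def sort_by_length(
    items: Sequence[str], lengths: Sequence[int]
) -> Tuple[List[str], List[int], List[int], List[int]]:
    """Sort items by decreasing length and return the indices needed to undo it."""
    # permutation[j] is the original position of the j-th sorted item.
    permutation = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    sorted_items = [items[i] for i in permutation]
    sorted_lengths = [lengths[i] for i in permutation]
    # restoration[k] is the sorted position of the k-th original item.
    restoration = sorted(range(len(permutation)), key=lambda j: permutation[j])
    return sorted_items, sorted_lengths, restoration, permutation


batch = ["a", "bbb", "cc"]
sorted_batch, sorted_lengths, restoration, permutation = sort_by_length(batch, [1, 3, 2])
restored = [sorted_batch[i] for i in restoration]
```

Indexing the sorted batch with the restoration indices returns the original order, and the permutation can be reused to sort other per-example data in the same way.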

class claf.modules.encoder.positional.PositionalEncoding(embed_dim, max_length=2000)
Bases: torch.nn.modules.module.Module
Positional Encoding in “Attention is All You Need” (https://arxiv.org/abs/1706.03762)

The use of relative position is possible because sin(x+y) and cos(x+y) can be expressed in terms of y, sin(x) and cos(x).
(cf. https://github.com/tensorflow/tensor2tensor/blob/42c3f377f441e5a0f431127d63e71414ead291c4/tensor2tensor/layers/common_attention.py#L388)

Args:
    embed_dim: the embedding dimension
Kwargs:
    max_length: the maximum sequence length
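
As a rough illustration of the encoding and of the relative-position remark, here is a plain-Python sketch that follows the interleaved sine/cosine formula from the paper. The function name `sinusoidal_encoding` is hypothetical, and the layout used by claf may differ (for example, sines and cosines concatenated rather than interleaved).

```python
import math
from typing import List


def sinusoidal_encoding(max_length: int, embed_dim: int) -> List[List[float]]:
    """Build a (max_length x embed_dim) table of sinusoidal position encodings."""
    table = []
    for pos in range(max_length):
        row = []
        for i in range(embed_dim):
            # Dimensions 2k and 2k + 1 share the frequency 1 / 10000^(2k / embed_dim).
            angle = pos / (10000 ** ((i - i % 2) / embed_dim))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


table = sinusoidal_encoding(max_length=50, embed_dim=4)

# Relative position: sin(pos + k) is a linear combination of sin(pos) and
# cos(pos) whose coefficients depend only on the offset k (frequency 1 here).
pos, k = 7, 5
shifted_sin = table[pos][0] * math.cos(k) + table[pos][1] * math.sin(k)
shifted_cos = table[pos][1] * math.cos(k) - table[pos][0] * math.sin(k)
```

The values shifted_sin and shifted_cos match the first two entries of table[pos + k], which is the angle-addition identity the description refers to.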

forward(x)
Defines the computation performed at every call. Should be overridden by all subclasses.

Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
Module contents

class claf.modules.encoder.PositionalEncoding(embed_dim, max_length=2000)
Bases: torch.nn.modules.module.Module

class claf.modules.encoder.LstmCellWithProjection(input_size: int, hidden_size: int, cell_size: int, go_forward: bool = True, recurrent_dropout_probability: float = 0.0, memory_cell_clip_value: Optional[float] = None, state_projection_clip_value: Optional[float] = None)
Bases: torch.nn.modules.module.Module