claf.modules.attention package¶
Submodules¶
class claf.modules.attention.bi_attention.BiAttention(model_dim)[source]¶
Bases: torch.nn.modules.module.Module
Attention Flow Layer in BiDAF (https://arxiv.org/pdf/1611.01603.pdf).
Computes the similarity matrix, Context-to-Query Attention (C2Q), and Query-to-Context Attention (Q2C).
Args:
    model_dim: the model dimension
forward(context, context_mask, query, query_mask)[source]¶
Defines the computation performed at every call. Should be overridden by all subclasses.
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
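As a usage sketch, the snippet below instantiates the layer and calls it as the note above recommends; the tensor shapes and the 1-for-valid mask convention are assumptions, since only the constructor and forward signatures are documented here.

    import torch
    from claf.modules.attention.bi_attention import BiAttention

    batch_size, c_len, q_len, model_dim = 2, 30, 10, 100

    attention = BiAttention(model_dim)

    # Assumed shapes: (batch, seq_len, model_dim) for the encoded sequences and
    # (batch, seq_len) for the masks, with 1 marking valid tokens and 0 padding.
    context = torch.randn(batch_size, c_len, model_dim)
    query = torch.randn(batch_size, q_len, model_dim)
    context_mask = torch.ones(batch_size, c_len)
    query_mask = torch.ones(batch_size, q_len)

    # Call the module instance (not .forward directly) so registered hooks run.
    output = attention(context, context_mask, query, query_mask)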
class claf.modules.attention.co_attention.CoAttention(embed_dim)[source]¶
Bases: torch.nn.modules.module.Module
CoAttention encoder in Dynamic Coattention Networks For Question Answering (https://arxiv.org/abs/1611.01604); see Figure 2 in the paper.
Args:
    embed_dim: the input embedding dimension
forward(context_embed, question_embed, context_mask=None, question_mask=None)[source]¶
Defines the computation performed at every call. Should be overridden by all subclasses.
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
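A minimal usage sketch with illustrative (assumed) shapes; the masks are optional per the forward signature above.

    import torch
    from claf.modules.attention.co_attention import CoAttention

    batch_size, c_len, q_len, embed_dim = 2, 30, 10, 100

    coattention = CoAttention(embed_dim)

    # Assumed shape: (batch, seq_len, embed_dim); masks default to None.
    context_embed = torch.randn(batch_size, c_len, embed_dim)
    question_embed = torch.randn(batch_size, q_len, embed_dim)

    encoded = coattention(context_embed, question_embed)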
class claf.modules.attention.docqa_attention.DocQAAttention(rnn_dim, linear_dim, self_attn=False, weight_init=True)[source]¶
Bases: torch.nn.modules.module.Module
Bi-Attention Layer + (Self-Attention) in DocumentQA (https://arxiv.org/abs/1710.10723).
Args:
    rnn_dim: the GRU cell hidden size
    linear_dim: the linear layer hidden size
Kwargs:
    self_attn: (bool) whether to use self-attention
    weight_init: (bool) whether to apply weight initialization
forward(x, x_mask, key, key_mask)[source]¶
Defines the computation performed at every call. Should be overridden by all subclasses.
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
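A hedged usage sketch: the feature size that forward expects for x and key is not stated above, so 2 * rnn_dim (the output size of a bidirectional GRU encoder) is an assumption and may need adjusting.

    import torch
    from claf.modules.attention.docqa_attention import DocQAAttention

    batch_size, x_len, key_len, rnn_dim, linear_dim = 2, 30, 10, 100, 200

    attention = DocQAAttention(rnn_dim, linear_dim, self_attn=False)

    # ASSUMPTION: features sized 2 * rnn_dim, as produced by a bidirectional
    # GRU encoder; adjust if the module expects a different input size.
    x = torch.randn(batch_size, x_len, 2 * rnn_dim)
    key = torch.randn(batch_size, key_len, 2 * rnn_dim)
    x_mask = torch.ones(batch_size, x_len)
    key_mask = torch.ones(batch_size, key_len)

    output = attention(x, x_mask, key, key_mask)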
class claf.modules.attention.multi_head_attention.MultiHeadAttention(num_head=8, model_dim=100, dropout=0.1, linear_key_dim=None, linear_value_dim=None)[source]¶
Bases: torch.nn.modules.module.Module
Transformer's Multi-Head Attention in "Attention Is All You Need" (https://arxiv.org/abs/1706.03762).
Kwargs:
    num_head: the number of attention heads
    model_dim: the model dimension
    dropout: the dropout probability
    linear_key_dim: the linear key dimension
    linear_value_dim: the linear value dimension
forward(q, k, v, mask=None)[source]¶
Defines the computation performed at every call. Should be overridden by all subclasses.
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
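A usage sketch for the self-attention case (q = k = v); the (batch, seq_len, model_dim) input shape is an assumption, and num_head is chosen to divide model_dim evenly in case the implementation requires it.

    import torch
    from claf.modules.attention.multi_head_attention import MultiHeadAttention

    batch_size, seq_len, model_dim = 2, 20, 100

    # num_head=4 divides model_dim=100 evenly (an assumption about what the
    # implementation requires; the class default is num_head=8).
    attention = MultiHeadAttention(num_head=4, model_dim=model_dim, dropout=0.1)

    x = torch.randn(batch_size, seq_len, model_dim)

    # Self-attention: query, key, and value are the same sequence; mask is optional.
    output = attention(x, x, x)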
The seq_attention classes below are adapted from DrQA; original code: https://github.com/facebookresearch/DrQA/blob/master/drqa/reader/layers.py
class claf.modules.attention.seq_attention.BilinearSeqAttn(x_size, y_size, identity=False, normalize=True)[source]¶
Bases: torch.nn.modules.module.Module
A bilinear attention layer over a sequence X with respect to y: o_i = softmax(x_i'Wy) for each x_i in X. Optionally, do not normalize the output weights.
forward(x, y, x_mask)[source]¶
Defines the computation performed at every call. Should be overridden by all subclasses.
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
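A usage sketch following the formula above: x is a sequence of vectors of size x_size and y a single vector of size y_size per example; the shapes and the DrQA-style mask convention (True marks padding) are assumptions.

    import torch
    from claf.modules.attention.seq_attention import BilinearSeqAttn

    batch_size, seq_len, x_size, y_size = 2, 20, 100, 100

    attention = BilinearSeqAttn(x_size, y_size)

    x = torch.randn(batch_size, seq_len, x_size)                  # sequence X
    y = torch.randn(batch_size, y_size)                           # vector y (assumed one per example)
    x_mask = torch.zeros(batch_size, seq_len, dtype=torch.bool)   # assumed: True = padding

    # Attention weights over the sequence (normalized when normalize=True).
    alpha = attention(x, y, x_mask)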
class claf.modules.attention.seq_attention.LinearSeqAttn(input_size)[source]¶
Bases: torch.nn.modules.module.Module
Self-attention over a sequence: o_i = softmax(Wx_i) for each x_i in X.
forward(x, x_mask)[source]¶
Defines the computation performed at every call. Should be overridden by all subclasses.
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
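A usage sketch for the self-attention scoring above; the shapes and the mask convention (True marks padding) are assumptions.

    import torch
    from claf.modules.attention.seq_attention import LinearSeqAttn

    batch_size, seq_len, input_size = 2, 20, 100

    attention = LinearSeqAttn(input_size)

    x = torch.randn(batch_size, seq_len, input_size)
    x_mask = torch.zeros(batch_size, seq_len, dtype=torch.bool)   # assumed: True = padding

    # Normalized weights over the sequence, one score per position.
    alpha = attention(x, x_mask)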
class claf.modules.attention.seq_attention.SeqAttnMatch(embed_dim, identity=False)[source]¶
Bases: torch.nn.modules.module.Module
Given sequences X and Y, match sequence Y to each element in X:
    o_i = sum(alpha_j * y_j) for each x_i in X
    alpha_j = softmax(y_j * x_i)
forward(x, y, y_mask)[source]¶
Defines the computation performed at every call. Should be overridden by all subclasses.
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
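A usage sketch matching Y against each element of X as described above; x and y share embed_dim, and the mask convention (True marks padding in y) is an assumption.

    import torch
    from claf.modules.attention.seq_attention import SeqAttnMatch

    batch_size, x_len, y_len, embed_dim = 2, 30, 10, 100

    matcher = SeqAttnMatch(embed_dim)

    x = torch.randn(batch_size, x_len, embed_dim)
    y = torch.randn(batch_size, y_len, embed_dim)
    y_mask = torch.zeros(batch_size, y_len, dtype=torch.bool)     # assumed: True = padding

    # For each x_i, a weighted sum of the y_j (attended representation of Y).
    matched = matcher(x, y, y_mask)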
Module contents¶
class claf.modules.attention.BiAttention(model_dim)[source]¶
Bases: torch.nn.modules.module.Module
Attention Flow Layer in BiDAF (https://arxiv.org/pdf/1611.01603.pdf).
Computes the similarity matrix, Context-to-Query Attention (C2Q), and Query-to-Context Attention (Q2C).
Args:
    model_dim: the model dimension
forward(context, context_mask, query, query_mask)[source]¶
Defines the computation performed at every call. Should be overridden by all subclasses.
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
class claf.modules.attention.CoAttention(embed_dim)[source]¶
Bases: torch.nn.modules.module.Module
CoAttention encoder in Dynamic Coattention Networks For Question Answering (https://arxiv.org/abs/1611.01604); see Figure 2 in the paper.
Args:
    embed_dim: the input embedding dimension
forward(context_embed, question_embed, context_mask=None, question_mask=None)[source]¶
Defines the computation performed at every call. Should be overridden by all subclasses.
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
class claf.modules.attention.MultiHeadAttention(num_head=8, model_dim=100, dropout=0.1, linear_key_dim=None, linear_value_dim=None)[source]¶
Bases: torch.nn.modules.module.Module
Transformer's Multi-Head Attention in "Attention Is All You Need" (https://arxiv.org/abs/1706.03762).
Kwargs:
    num_head: the number of attention heads
    model_dim: the model dimension
    dropout: the dropout probability
    linear_key_dim: the linear key dimension
    linear_value_dim: the linear value dimension
forward(q, k, v, mask=None)[source]¶
Defines the computation performed at every call. Should be overridden by all subclasses.
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
class claf.modules.attention.DocQAAttention(rnn_dim, linear_dim, self_attn=False, weight_init=True)[source]¶
Bases: torch.nn.modules.module.Module
Bi-Attention Layer + (Self-Attention) in DocumentQA (https://arxiv.org/abs/1710.10723).
Args:
    rnn_dim: the GRU cell hidden size
    linear_dim: the linear layer hidden size
Kwargs:
    self_attn: (bool) whether to use self-attention
    weight_init: (bool) whether to apply weight initialization
forward(x, x_mask, key, key_mask)[source]¶
Defines the computation performed at every call. Should be overridden by all subclasses.
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
class claf.modules.attention.SeqAttnMatch(embed_dim, identity=False)[source]¶
Bases: torch.nn.modules.module.Module
Given sequences X and Y, match sequence Y to each element in X:
    o_i = sum(alpha_j * y_j) for each x_i in X
    alpha_j = softmax(y_j * x_i)
forward(x, y, y_mask)[source]¶
Defines the computation performed at every call. Should be overridden by all subclasses.
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
class claf.modules.attention.LinearSeqAttn(input_size)[source]¶
Bases: torch.nn.modules.module.Module
Self-attention over a sequence: o_i = softmax(Wx_i) for each x_i in X.
forward(x, x_mask)[source]¶
Defines the computation performed at every call. Should be overridden by all subclasses.
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
class claf.modules.attention.BilinearSeqAttn(x_size, y_size, identity=False, normalize=True)[source]¶
Bases: torch.nn.modules.module.Module
A bilinear attention layer over a sequence X with respect to y: o_i = softmax(x_i'Wy) for each x_i in X. Optionally, do not normalize the output weights.
forward(x, y, x_mask)[source]¶
Defines the computation performed at every call. Should be overridden by all subclasses.
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.