claf.modules.attention package

Submodules

class claf.modules.attention.bi_attention.BiAttention(model_dim)[source]

Bases: torch.nn.modules.module.Module

Attention Flow Layer

in BiDAF (https://arxiv.org/pdf/1611.01603.pdf)

  • The similarity matrix
  • Context-to-query Attention (C2Q)
  • Query-to-context Attention (Q2C)

  • Args:

    model_dim: the model dimension

forward(context, context_mask, query, query_mask)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
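
Below is a minimal usage sketch for BiAttention. The tensor shapes (batch-first, last dimension equal to model_dim) and the mask convention (1 marks a valid token) are assumptions for illustration, not guaranteed by this page:

    import torch

    from claf.modules.attention.bi_attention import BiAttention

    batch_size, c_len, q_len, model_dim = 2, 30, 10, 100

    attention = BiAttention(model_dim)

    context = torch.randn(batch_size, c_len, model_dim)   # encoded context (H in BiDAF)
    query = torch.randn(batch_size, q_len, model_dim)     # encoded query (U in BiDAF)
    context_mask = torch.ones(batch_size, c_len)          # assumed: 1 marks a valid token
    query_mask = torch.ones(batch_size, q_len)

    # Query-aware context representation built from the similarity matrix, C2Q and Q2C attention
    output = attention(context, context_mask, query, query_mask)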

class claf.modules.attention.co_attention.CoAttention(embed_dim)[source]

Bases: torch.nn.modules.module.Module

CoAttention encoder

in Dynamic Coattention Networks For Question Answering (https://arxiv.org/abs/1611.01604)

see Figure 2 in the paper

  • Args:

    embed_dim: the input embedding dimension

forward(context_embed, question_embed, context_mask=None, question_mask=None)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
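
A short usage sketch for CoAttention; the shapes below and the decision to pass the optional masks are illustrative assumptions:

    import torch

    from claf.modules.attention.co_attention import CoAttention

    batch_size, c_len, q_len, embed_dim = 2, 30, 10, 100

    co_attention = CoAttention(embed_dim)

    context_embed = torch.randn(batch_size, c_len, embed_dim)
    question_embed = torch.randn(batch_size, q_len, embed_dim)
    context_mask = torch.ones(batch_size, c_len)     # masks are optional (default: None)
    question_mask = torch.ones(batch_size, q_len)

    # Coattention-encoded context, following Figure 2 of the DCN paper
    encoded = co_attention(context_embed, question_embed, context_mask, question_mask)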

class claf.modules.attention.docqa_attention.DocQAAttention(rnn_dim, linear_dim, self_attn=False, weight_init=True)[source]

Bases: torch.nn.modules.module.Module

Bi-Attention Layer + (Self-Attention)

in DocumentQA (https://arxiv.org/abs/1710.10723)

  • Args:

    rnn_dim: the GRU cell hidden size
    linear_dim: the linear layer hidden size

  • Kwargs:

    self_attn: (bool) whether to apply an additional self-attention layer
    weight_init: (bool) whether to apply weight initialization

forward(x, x_mask, key, key_mask)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
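
A usage sketch for DocQAAttention. The assumption that x and key have feature dimension rnn_dim is made only for illustration; verify the expected input size against the implementation:

    import torch

    from claf.modules.attention.docqa_attention import DocQAAttention

    batch_size, x_len, key_len, rnn_dim, linear_dim = 2, 30, 10, 100, 200

    attention = DocQAAttention(rnn_dim, linear_dim, self_attn=False, weight_init=True)

    x = torch.randn(batch_size, x_len, rnn_dim)       # assumed feature size: rnn_dim
    key = torch.randn(batch_size, key_len, rnn_dim)
    x_mask = torch.ones(batch_size, x_len)            # assumed: 1 marks a valid token
    key_mask = torch.ones(batch_size, key_len)

    # Bi-attention between x and key (plus optional self-attention when self_attn=True)
    output = attention(x, x_mask, key, key_mask)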

class claf.modules.attention.multi_head_attention.MultiHeadAttention(num_head=8, model_dim=100, dropout=0.1, linear_key_dim=None, linear_value_dim=None)[source]

Bases: torch.nn.modules.module.Module

Transformer’s Multi-Head Attention

in “Attention is All You Need” (https://arxiv.org/abs/1706.03762)

  • Kwargs:

    num_head: the number of attention heads
    model_dim: the model dimension
    dropout: the dropout probability
    linear_key_dim: the linear key dimension
    linear_value_dim: the linear value dimension

forward(q, k, v, mask=None)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
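
A self-attention usage sketch for MultiHeadAttention. Choosing model_dim=128 so that it divides evenly by num_head is an assumption carried over from the standard Transformer formulation:

    import torch

    from claf.modules.attention.multi_head_attention import MultiHeadAttention

    batch_size, seq_len, model_dim = 2, 20, 128

    attention = MultiHeadAttention(num_head=8, model_dim=model_dim, dropout=0.1)

    x = torch.randn(batch_size, seq_len, model_dim)

    # Self-attention: query, key and value are the same sequence
    output = attention(q=x, k=x, v=x, mask=None)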

The following sequence-attention layers are adapted from DrQA: https://github.com/facebookresearch/DrQA/blob/master/drqa/reader/layers.py

class claf.modules.attention.seq_attention.BilinearSeqAttn(x_size, y_size, identity=False, normalize=True)[source]

Bases: torch.nn.modules.module.Module

A bilinear attention layer over a sequence X with respect to a vector y:

  • o_i = softmax(x_i' W y) for x_i in X

Optionally (normalize=False), the output weights are not normalized.

forward(x, y, x_mask)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
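
A usage sketch for BilinearSeqAttn. The shapes (x: batch x len x x_size, y: batch x y_size) and the mask convention (True marks padding) follow the DrQA layers this module is based on, and are assumptions here:

    import torch

    from claf.modules.attention.seq_attention import BilinearSeqAttn

    batch_size, seq_len, x_size, y_size = 2, 20, 128, 64

    attn = BilinearSeqAttn(x_size, y_size, identity=False, normalize=True)

    x = torch.randn(batch_size, seq_len, x_size)                  # sequence X
    y = torch.randn(batch_size, y_size)                           # one vector y per example
    x_mask = torch.zeros(batch_size, seq_len, dtype=torch.bool)   # assumed: True marks padding

    # Attention weights over the sequence, one score per position
    alpha = attn(x, y, x_mask)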

class claf.modules.attention.seq_attention.LinearSeqAttn(input_size)[source]

Bases: torch.nn.modules.module.Module

Self-attention over a sequence:

  • o_i = softmax(W x_i) for x_i in X

forward(x, x_mask)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
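
A usage sketch for LinearSeqAttn under the same assumed shapes and mask convention as above:

    import torch

    from claf.modules.attention.seq_attention import LinearSeqAttn

    batch_size, seq_len, input_size = 2, 20, 128

    attn = LinearSeqAttn(input_size)

    x = torch.randn(batch_size, seq_len, input_size)
    x_mask = torch.zeros(batch_size, seq_len, dtype=torch.bool)   # assumed: True marks padding

    # Normalized self-attention weights over the sequence
    alpha = attn(x, x_mask)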

class claf.modules.attention.seq_attention.SeqAttnMatch(embed_dim, identity=False)[source]

Bases: torch.nn.modules.module.Module

Given sequences X and Y, match sequence Y to each element in X:

  • o_i = sum(alpha_j * y_j) for each x_i in X
  • alpha_j = softmax(y_j * x_i)

forward(x, y, y_mask)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
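
A usage sketch for SeqAttnMatch, again under the assumed DrQA-style shapes and mask convention:

    import torch

    from claf.modules.attention.seq_attention import SeqAttnMatch

    batch_size, x_len, y_len, embed_dim = 2, 20, 10, 128

    attn = SeqAttnMatch(embed_dim, identity=False)

    x = torch.randn(batch_size, x_len, embed_dim)
    y = torch.randn(batch_size, y_len, embed_dim)
    y_mask = torch.zeros(batch_size, y_len, dtype=torch.bool)     # assumed: True marks padding

    # Each position of x is replaced by an attention-weighted sum over y
    matched = attn(x, y, y_mask)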

Module contents

The package re-exports the submodule classes at the top level: claf.modules.attention.BiAttention, claf.modules.attention.CoAttention, claf.modules.attention.MultiHeadAttention, claf.modules.attention.DocQAAttention, claf.modules.attention.SeqAttnMatch, claf.modules.attention.LinearSeqAttn, and claf.modules.attention.BilinearSeqAttn. Their documentation is identical to the submodule entries above.