claf.model.reading_comprehension package¶
Submodules¶
class claf.model.reading_comprehension.bidaf.BiDAF(token_embedder, lang_code='en', aligned_query_embedding=False, answer_maxlen=None, model_dim=100, contextual_rnn_num_layer=1, modeling_rnn_num_layer=2, predict_rnn_num_layer=1, dropout=0.2)

    Bases: claf.model.reading_comprehension.mixin.SQuADv1, claf.model.base.ModelWithTokenEmbedder

    Document Reader Model. Span Detector

    Implementation of the model presented in BiDAF: Bidirectional Attention Flow for Machine Comprehension (https://arxiv.org/abs/1611.01603)

    Embedding (Word + Char -> Contextual)
    Attention Flow
    Modeling (RNN)
    Output

    Args:
        token_embedder: 'QATokenEmbedder', used to embed the 'context' and 'question'.

    Kwargs:
        lang_code: dataset language code [en|ko]
        aligned_query_embedding: f_align(p_i) = sum_j(a_ij * E(q_j)), where the attention score a_ij captures the similarity between p_i and each question word q_j. These features add soft alignments between similar but non-identical words (e.g., car and vehicle). Applied only to 'context_embed'.
        answer_maxlen: consider only answer spans of length less than or equal to answer_maxlen
        model_dim: the model dimension
        contextual_rnn_num_layer: the number of recurrent layers (contextual)
        modeling_rnn_num_layer: the number of recurrent layers (modeling)
        predict_rnn_num_layer: the number of recurrent layers (predict)
        dropout: the dropout probability
    forward(features, labels=None)

        Args:
            features: feature dictionary like below.
                {"feature_name1": {
                    "token_name1": tensor, "token_name2": tensor},
                 "feature_name2": ...}

        Kwargs:
            labels: label dictionary like below.
                {"label_name1": tensor,
                 "label_name2": tensor}
                Loss is not calculated when no labels are given (inference/predict mode).

        Returns: output_dict (dict) consisting of
            start_logits: unnormalized log probabilities of the span start position.
            end_logits: unnormalized log probabilities of the span end position.
            best_span: the string from the original passage that the model predicts as the best answer to the question.
            data_idx: the question id, mapped to the corresponding answer.
            loss: a scalar loss to be optimised.
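    A minimal usage sketch of this forward interface, assuming model is an already-constructed BiDAF instance. The feature and label key names below ("context", "question", "word", "answer_start", "answer_end") are hypothetical placeholders; the real keys depend on the configured token embedder and dataset reader.

        import torch

        # Hypothetical nested feature dictionary (batch of 2, context length 120, question length 20).
        features = {
            "context": {"word": torch.randint(0, 1000, (2, 120))},
            "question": {"word": torch.randint(0, 1000, (2, 20))},
        }
        labels = {"answer_start": torch.tensor([3, 17]),
                  "answer_end": torch.tensor([5, 21])}   # assumed label names

        output_dict = model(features, labels=labels)   # training: output_dict includes "loss"
        output_dict = model(features)                  # inference: no loss is computed
        start_logits = output_dict["start_logits"]     # (batch, context_len)
        best_span = output_dict["best_span"]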
class claf.model.reading_comprehension.bidaf_no_answer.BiDAF_No_Answer(token_embedder, lang_code='en', aligned_query_embedding=False, answer_maxlen=None, model_dim=100, contextual_rnn_num_layer=1, modeling_rnn_num_layer=2, predict_rnn_num_layer=1, dropout=0.2)

    Bases: claf.model.reading_comprehension.mixin.SQuADv2, claf.model.base.ModelWithTokenEmbedder

    Question Answering Model. Span Detector, No Answer

    Bidirectional Attention Flow for Machine Comprehension + Bias (No_Answer)

    Embedding (Word + Char -> Contextual)
    Attention Flow
    Modeling (RNN)
    Output

    Args:
        token_embedder: 'QATokenEmbedder', used to embed the 'context' and 'question'.

    Kwargs:
        lang_code: dataset language code [en|ko]
        aligned_query_embedding: f_align(p_i) = sum_j(a_ij * E(q_j)), where the attention score a_ij captures the similarity between p_i and each question word q_j. These features add soft alignments between similar but non-identical words (e.g., car and vehicle). Applied only to 'context_embed'.
        answer_maxlen: consider only answer spans of length less than or equal to answer_maxlen
        model_dim: the model dimension
        dropout: the dropout probability
    forward(features, labels=None)

        Args:
            features: feature dictionary like below.
                {"feature_name1": {
                    "token_name1": tensor, "token_name2": tensor},
                 "feature_name2": ...}

        Kwargs:
            labels: label dictionary like below.
                {"label_name1": tensor,
                 "label_name2": tensor}
                Loss is not calculated when no labels are given (inference/predict mode).

        Returns: output_dict (dict) consisting of
            start_logits: unnormalized log probabilities of the span start position.
            end_logits: unnormalized log probabilities of the span end position.
            best_span: the string from the original passage that the model predicts as the best answer to the question.
            data_idx: the question id, mapped to the corresponding answer.
            loss: a scalar loss to be optimised.
class claf.model.reading_comprehension.docqa.DocQA(token_embedder, lang_code='en', aligned_query_embedding=False, answer_maxlen=17, rnn_dim=100, linear_dim=200, preprocess_rnn_num_layer=1, modeling_rnn_num_layer=2, predict_rnn_num_layer=1, dropout=0.2, weight_init=True)

    Bases: claf.model.reading_comprehension.mixin.SQuADv1, claf.model.base.ModelWithTokenEmbedder

    Document Reader Model. Span Detector

    Implementation of the model presented in Simple and Effective Multi-Paragraph Reading Comprehension (https://arxiv.org/abs/1710.10723)

    Embedding (Word + Char -> Contextual)
    Attention
    Residual self-attention
    Output

    Args:
        token_embedder: 'QATokenEmbedder', used to embed the 'context' and 'question'.

    Kwargs:
        lang_code: dataset language code [en|ko]
        aligned_query_embedding: f_align(p_i) = sum_j(a_ij * E(q_j)), where the attention score a_ij captures the similarity between p_i and each question word q_j. These features add soft alignments between similar but non-identical words (e.g., car and vehicle). Applied only to 'context_embed'.
        answer_maxlen: consider only answer spans of length less than or equal to answer_maxlen
        rnn_dim: the RNN cell dimension
        linear_dim: the attention linear layer dimension
        preprocess_rnn_num_layer: the number of recurrent layers (preprocess)
        modeling_rnn_num_layer: the number of recurrent layers (modeling)
        predict_rnn_num_layer: the number of recurrent layers (predict)
        dropout: the dropout probability
    forward(features, labels=None)

        Args:
            features: feature dictionary like below.
                {"feature_name1": {
                    "token_name1": tensor, "token_name2": tensor},
                 "feature_name2": ...}

        Kwargs:
            labels: label dictionary like below.
                {"label_name1": tensor,
                 "label_name2": tensor}
                Loss is not calculated when no labels are given (inference/predict mode).

        Returns: output_dict (dict) consisting of
            start_logits: unnormalized log probabilities of the span start position.
            end_logits: unnormalized log probabilities of the span end position.
            best_span: the string from the original passage that the model predicts as the best answer to the question.
            data_idx: the question id, mapped to the corresponding answer.
            loss: a scalar loss to be optimised.
class claf.model.reading_comprehension.docqa.SelfAttention(rnn_dim, linear_dim, dropout=0.2, weight_init=True)

    Bases: torch.nn.modules.module.Module

    The same bi-attention mechanism, now applied between the passage and itself.
    forward(context, context_mask)

        Defines the computation performed at every call.

        Should be overridden by all subclasses.

        Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
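    A minimal sketch of passage-to-passage attention in the spirit of this module, assuming a plain dot-product similarity; the actual module uses a learned (trilinear-style) similarity, dropout, and residual connections.

        import torch
        import torch.nn.functional as F

        def self_attend(context, context_mask):
            # context: (batch, c_len, dim), context_mask: (batch, c_len), 1 for real tokens
            scores = torch.bmm(context, context.transpose(1, 2))          # (batch, c_len, c_len)
            c_len = context.size(1)
            diag = torch.eye(c_len, dtype=torch.bool, device=context.device).unsqueeze(0)
            scores = scores.masked_fill(diag, float("-inf"))              # a token never attends to itself
            key_mask = (context_mask == 0).unsqueeze(1)                   # (batch, 1, c_len)
            scores = scores.masked_fill(key_mask, float("-inf"))          # ignore padding keys
            attn = F.softmax(scores, dim=-1)
            return torch.bmm(attn, context)                               # (batch, c_len, dim)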
class claf.model.reading_comprehension.docqa_no_answer.DocQA_No_Answer(token_embedder, lang_code='en', aligned_query_embedding=False, answer_maxlen=17, rnn_dim=100, linear_dim=200, preprocess_rnn_num_layer=1, modeling_rnn_num_layer=2, predict_rnn_num_layer=1, dropout=0.2, weight_init=True)

    Bases: claf.model.reading_comprehension.mixin.SQuADv2, claf.model.base.ModelWithTokenEmbedder

    Question Answering Model. Span Detector, No Answer

    Implementation of the model presented in Simple and Effective Multi-Paragraph Reading Comprehension + No_Answer (https://arxiv.org/abs/1710.10723)

    Embedding (Word + Char -> Contextual)
    Attention
    Residual self-attention
    Output

    Args:
        token_embedder: 'QATokenEmbedder', used to embed the 'context' and 'question'.

    Kwargs:
        lang_code: dataset language code [en|ko]
        aligned_query_embedding: f_align(p_i) = sum_j(a_ij * E(q_j)), where the attention score a_ij captures the similarity between p_i and each question word q_j. These features add soft alignments between similar but non-identical words (e.g., car and vehicle). Applied only to 'context_embed'.
        answer_maxlen: consider only answer spans of length less than or equal to answer_maxlen
        rnn_dim: the RNN cell dimension
        linear_dim: the attention linear layer dimension
        preprocess_rnn_num_layer: the number of recurrent layers (preprocess)
        modeling_rnn_num_layer: the number of recurrent layers (modeling)
        predict_rnn_num_layer: the number of recurrent layers (predict)
        dropout: the dropout probability
    forward(features, labels=None)

        Args:
            features: feature dictionary like below.
                {"feature_name1": {
                    "token_name1": tensor, "token_name2": tensor},
                 "feature_name2": ...}

        Kwargs:
            labels: label dictionary like below.
                {"label_name1": tensor,
                 "label_name2": tensor}
                Loss is not calculated when no labels are given (inference/predict mode).

        Returns: output_dict (dict) consisting of
            start_logits: unnormalized log probabilities of the span start position.
            end_logits: unnormalized log probabilities of the span end position.
            best_span: the string from the original passage that the model predicts as the best answer to the question.
            data_idx: the question id, mapped to the corresponding answer.
            loss: a scalar loss to be optimised.
class claf.model.reading_comprehension.docqa_no_answer.NoAnswer(embed_dim, bias_hidden_dim)

    Bases: torch.nn.modules.module.Module

    No-Answer Option

    Args:
        embed_dim: the passage embedding dimension
        bias_hidden_dim: the hidden size of the two-layer MLP used to compute the no-answer bias (see the sketch after the forward entry below)
    forward(context_embed, span_start_logits, span_end_logits)

        Defines the computation performed at every call.

        Should be overridden by all subclasses.

        Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
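    A minimal sketch of the no-answer bias idea: a pooled passage representation goes through a two-layer MLP to produce a scalar bias, which is appended to the span logits as an extra "no answer" position. The pooling choice and the way the bias is combined with the logits are assumptions; CLAF's implementation may differ.

        import torch
        import torch.nn as nn

        class NoAnswerBias(nn.Module):
            """Hypothetical two-layer MLP producing one no-answer logit per example."""
            def __init__(self, embed_dim, bias_hidden_dim):
                super().__init__()
                self.mlp = nn.Sequential(
                    nn.Linear(embed_dim, bias_hidden_dim),
                    nn.ReLU(),
                    nn.Linear(bias_hidden_dim, 1),
                )

            def forward(self, context_embed, span_start_logits, span_end_logits):
                # context_embed: (batch, c_len, embed_dim)
                pooled = context_embed.max(dim=1).values         # simple max-pooling over time (assumption)
                bias = self.mlp(pooled)                          # (batch, 1)
                # append the bias as an extra "no answer" position to each logit vector
                start_logits = torch.cat([span_start_logits, bias], dim=-1)
                end_logits = torch.cat([span_end_logits, bias], dim=-1)
                return start_logits, end_logits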
class claf.model.reading_comprehension.docqa_no_answer.SelfAttention(rnn_dim, linear_dim, dropout=0.2, weight_init=True)

    Bases: torch.nn.modules.module.Module

    The same bi-attention mechanism, now applied between the passage and itself.
    forward(context, context_mask)

        Defines the computation performed at every call.

        Should be overridden by all subclasses.

        Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
class claf.model.reading_comprehension.drqa.DrQA(token_embedder, lang_code='en', aligned_query_embedding=False, answer_maxlen=None, model_dim=128, dropout=0.3)

    Bases: claf.model.reading_comprehension.mixin.SQuADv1, claf.model.base.ModelWithTokenEmbedder

    Document Reader Model. Span Detector

    Implementation of the model presented in Reading Wikipedia to Answer Open-Domain Questions (https://arxiv.org/abs/1704.00051)

    Embedding + features
    Align question embedding

    Args:
        token_embedder: 'QATokenEmbedder', used to embed the 'context' and 'question'.

    Kwargs:
        lang_code: dataset language code [en|ko]
        aligned_query_embedding: f_align(p_i) = sum_j(a_ij * E(q_j)), where the attention score a_ij captures the similarity between p_i and each question word q_j. These features add soft alignments between similar but non-identical words (e.g., car and vehicle). Applied only to 'context_embed'. (A minimal sketch follows this entry.)
        answer_maxlen: consider only answer spans of length less than or equal to answer_maxlen
        model_dim: the model dimension
        dropout: the dropout probability
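    A minimal sketch of the aligned question embedding f_align(p_i) = sum_j(a_ij * E(q_j)) mentioned above, assuming the attention scores come from a dot product of ReLU-projected word embeddings as in the DrQA paper; the particular projection used by CLAF is an assumption here.

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        def aligned_question_embedding(context_embed, question_embed, proj):
            # context_embed:  (batch, c_len, embed_dim)  -> E(p_i)
            # question_embed: (batch, q_len, embed_dim)  -> E(q_j)
            c = F.relu(proj(context_embed))                      # alpha(E(p_i))
            q = F.relu(proj(question_embed))                     # alpha(E(q_j))
            scores = torch.bmm(c, q.transpose(1, 2))             # similarity between p_i and q_j
            a = F.softmax(scores, dim=-1)                        # a_ij, normalized over question words
            return torch.bmm(a, question_embed)                  # f_align(p_i) = sum_j(a_ij * E(q_j))

        proj = nn.Linear(300, 300)   # hypothetical embedding/projection size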
    forward(features, labels=None)

        Args:
            features: feature dictionary like below.
                {"feature_name1": {
                    "token_name1": tensor, "token_name2": tensor},
                 "feature_name2": ...}

        Kwargs:
            labels: label dictionary like below.
                {"label_name1": tensor,
                 "label_name2": tensor}
                Loss is not calculated when no labels are given (inference/predict mode).

        Returns: output_dict (dict) consisting of
            start_logits: unnormalized log probabilities of the span start position.
            end_logits: unnormalized log probabilities of the span end position.
            best_span: the string from the original passage that the model predicts as the best answer to the question.
            data_idx: the question id, mapped to the corresponding answer.
            loss: a scalar loss to be optimised.
class claf.model.reading_comprehension.mixin.ReadingComprehension

    Bases: object

    Reading Comprehension Mixin Class

    Args:
        token_embedder: 'RCTokenEmbedder', used to embed the 'context' and 'question'.
    get_best_span(span_start_logits, span_end_logits, answer_maxlen=None)

        Take the argmax of the constrained score_s * score_e.

        Args:
            span_start_logits: independent start logits
            span_end_logits: independent end logits

        Kwargs:
            answer_maxlen: the maximum span length to consider (default is None -> all lengths)
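    A minimal single-example sketch of the constrained argmax this method describes; CLAF's batched implementation may differ in detail.

        import torch

        def best_span(span_start_probs, span_end_probs, answer_maxlen=None):
            # span_start_probs, span_end_probs: (seq_len,) scores for one example
            scores = torch.outer(span_start_probs, span_end_probs)   # score_s * score_e for every (start, end)
            scores = torch.triu(scores)                              # keep only spans with start <= end
            if answer_maxlen is not None:
                # keep only spans of length <= answer_maxlen (length = end - start + 1)
                scores = torch.tril(scores, diagonal=answer_maxlen - 1)
            flat_index = torch.argmax(scores).item()
            start, end = divmod(flat_index, scores.size(1))
            return start, end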
    make_predictions(output_dict)

        Make predictions from the model's output_dict.

        Args:
            output_dict: the model's output dictionary consisting of
                data_idx: question id
                best_span: the best span computed from span_start_logits and span_end_logits
                start_logits: span start logits
                end_logits: span end logits

        Returns:
            predictions: prediction dictionary consisting of
                key: 'id' (question id)
                value: a dictionary consisting of
                    predict_text, pred_span_start, pred_span_end, span_start_prob, span_end_prob
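    Illustrative shape of the returned prediction dictionary (the question id and values below are made up):

        predictions = {
            "56be4db0acb8001400a502ec": {        # question id
                "predict_text": "Denver Broncos",
                "pred_span_start": 34,
                "pred_span_end": 35,
                "span_start_prob": 0.91,
                "span_end_prob": 0.87,
            },
            # ...
        }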
    predict(**kwargs)
    print_examples(index, inputs, predictions)

        Print evaluation examples.

        Args:
            index: data index
            inputs: mini-batch inputs
            predictions: prediction dictionary consisting of
                key: 'id' (question id)
                value: a dictionary consisting of
                    predict_text, pred_span_start, pred_span_end, span_start_prob, span_end_prob

        Returns:
            print(Context, Question, Answers and Predict)
class claf.model.reading_comprehension.mixin.SQuADv1

    Bases: claf.model.reading_comprehension.mixin.ReadingComprehension

    Reading Comprehension Mixin Class with SQuAD v1.1 evaluation

    Args:
        token_embedder: 'QATokenEmbedder', used to embed the 'context' and 'question'.
    make_metrics(predictions)

        Make metrics from the prediction dictionary.

        Args:
            predictions: prediction dictionary consisting of
                key: 'id' (question id)
                value: (predict_text, pred_span_start, pred_span_end)

        Returns:
            metrics: metric dictionary consisting of
                'em': exact_match (SQuAD v1.1 official evaluation)
                'f1': f1 (SQuAD v1.1 official evaluation)
                'start_acc': span_start accuracy
                'end_acc': span_end accuracy
                'span_acc': span accuracy (start and end)
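    For reference, a minimal sketch of SQuAD v1.1-style exact match and token-overlap F1 for a single prediction/gold pair. The official evaluation script additionally strips articles and punctuation and takes the maximum over all gold answers.

        from collections import Counter

        def exact_match(prediction, ground_truth):
            return float(prediction.strip().lower() == ground_truth.strip().lower())

        def f1(prediction, ground_truth):
            pred_tokens = prediction.lower().split()
            gold_tokens = ground_truth.lower().split()
            common = Counter(pred_tokens) & Counter(gold_tokens)
            num_same = sum(common.values())
            if num_same == 0:
                return 0.0
            precision = num_same / len(pred_tokens)
            recall = num_same / len(gold_tokens)
            return 2 * precision * recall / (precision + recall)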
class claf.model.reading_comprehension.mixin.SQuADv1ForBert

    Bases: claf.model.reading_comprehension.mixin.SQuADv1

    Reading Comprehension Mixin Class with SQuAD v1.1 evaluation

    Args:
        token_embedder: 'QATokenEmbedder', used to embed the 'context' and 'question'.
    predict(output_dict, arguments, helper)

        Inference from raw features.

        Args:
            output_dict: the model's output dictionary consisting of
                data_idx: question id
                best_span: the best span computed from span_start_logits and span_end_logits
            arguments: arguments dictionary consisting of user_input
            helper: dictionary used to help recover the answer text

        Returns:
            span: the predicted best_span
class claf.model.reading_comprehension.mixin.SQuADv2

    Bases: claf.model.reading_comprehension.mixin.ReadingComprehension

    Reading Comprehension Mixin Class with SQuAD v2.0 evaluation

    Args:
        token_embedder: 'RCTokenEmbedder', used to embed the 'context' and 'question'.
    make_metrics(predictions)

        Make metrics from the prediction dictionary.

        Args:
            predictions: prediction dictionary consisting of
                key: 'id' (question id)
                value: a dictionary consisting of
                    predict_text, pred_span_start, pred_span_end, span_start_prob, span_end_prob

        Returns:
            metrics: metric dictionary consisting of
                'start_acc': span_start accuracy
                'end_acc': span_end accuracy
                'span_acc': span accuracy (start and end)
                'em': exact_match (SQuAD v2.0 official evaluation)
                'f1': f1 (SQuAD v2.0 official evaluation)
                'HasAns_exact': exact_match on answerable questions
                'HasAns_f1': f1 on answerable questions
                'NoAns_exact': exact_match on unanswerable questions
                'NoAns_f1': f1 on unanswerable questions
                'best_exact': best exact_match score with best_exact_thresh
                'best_exact_thresh': the answerable threshold that yields best_exact
                'best_f1': best f1 score with best_f1_thresh
                'best_f1_thresh': the answerable threshold that yields best_f1
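    The best_* entries come from sweeping an answerability threshold: a question is predicted as unanswerable when its no-answer score exceeds the threshold, and the empty prediction counts as correct only for unanswerable questions. A minimal sketch of such a sweep (the score convention is an assumption; the official SQuAD v2.0 script performs the same kind of search):

        def best_exact_and_threshold(examples):
            # examples: list of (no_answer_score, em_if_answered, has_answer) per question
            best_em, best_thresh = -1.0, 0.0
            for thresh in sorted({score for score, _, _ in examples}):
                correct = 0
                for score, em_if_answered, has_answer in examples:
                    if score > thresh:                      # predicted "no answer"
                        correct += 0 if has_answer else 1
                    else:                                   # predicted a span
                        correct += em_if_answered if has_answer else 0
                em = correct / len(examples)
                if em > best_em:
                    best_em, best_thresh = em, thresh
            return best_em, best_thresh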
class claf.model.reading_comprehension.qanet.EncoderBlock(model_dim=128, num_head=8, kernel_size=5, num_conv_block=4, dropout=0.1, layer_dropout=0.9)

    Bases: torch.nn.modules.module.Module

    Encoder Block

    Structure ([...] denotes a residual sub-layer):
        position_encoding -> [convolution-layer] x # -> [self-attention-layer] -> [feed-forward-layer]

    convolution-layer: depthwise separable convolutions
    self-attention-layer: multi-head attention
    feed-forward-layer: pointwise convolution

    Args:
        model_dim: the model dimension
        num_head: the number of heads in multi-head attention
        kernel_size: convolution kernel size
        num_conv_block: the number of convolution blocks
        dropout: the dropout probability
        layer_dropout: the layer dropout probability (cf. Deep Networks with Stochastic Depth, https://arxiv.org/abs/1603.09382)
    forward(x, mask=None)

        Defines the computation performed at every call.

        Should be overridden by all subclasses.

        Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
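    A minimal sketch of the block structure described above (repeated convolution sub-blocks, self-attention, and a feed-forward layer, each wrapped in layernorm plus a residual connection). Positional encoding, depthwise separable convolutions, and the stochastic layer dropout from the referenced paper are simplified or omitted; this is not CLAF's exact implementation.

        import torch.nn as nn

        class EncoderBlockSketch(nn.Module):
            def __init__(self, model_dim=128, num_head=8, kernel_size=5, num_conv_block=4):
                super().__init__()
                self.conv_norms = nn.ModuleList(nn.LayerNorm(model_dim) for _ in range(num_conv_block))
                self.convs = nn.ModuleList(
                    nn.Conv1d(model_dim, model_dim, kernel_size, padding=kernel_size // 2)
                    for _ in range(num_conv_block))                 # stand-in for depthwise separable conv
                self.attn_norm = nn.LayerNorm(model_dim)
                self.attn = nn.MultiheadAttention(model_dim, num_head, batch_first=True)
                self.ff_norm = nn.LayerNorm(model_dim)
                self.ff = nn.Linear(model_dim, model_dim)           # stand-in for pointwise convolution

            def forward(self, x, mask=None):
                # x: (batch, seq_len, model_dim); position encoding omitted here
                for norm, conv in zip(self.conv_norms, self.convs):
                    y = conv(norm(x).transpose(1, 2)).transpose(1, 2)
                    x = x + y                                       # residual
                h = self.attn_norm(x)
                y, _ = self.attn(h, h, h,
                                 key_padding_mask=(mask == 0) if mask is not None else None)
                x = x + y                                           # residual
                return x + self.ff(self.ff_norm(x))                 # residual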
class claf.model.reading_comprehension.qanet.QANet(token_embedder, lang_code='en', aligned_query_embedding=False, answer_maxlen=None, model_dim=128, kernel_size_in_embedding=7, num_head_in_embedding=8, num_conv_block_in_embedding=4, num_embedding_encoder_block=1, kernel_size_in_modeling=5, num_head_in_modeling=8, num_conv_block_in_modeling=2, num_modeling_encoder_block=7, dropout=0.1, layer_dropout=0.9)

    Bases: claf.model.reading_comprehension.mixin.SQuADv1, claf.model.base.ModelWithTokenEmbedder

    Document Reader Model. Span Detector

    Implementation of the model presented in QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension (https://arxiv.org/abs/1804.09541)

    Input Embedding Layer
    Embedding Encoder Layer
    Context-Query Attention Layer
    Model Encoder Layer
    Output Layer

    Args:
        token_embedder: 'QATokenEmbedder', used to embed the 'context' and 'question'.

    Kwargs:
        lang_code: dataset language code [en|ko]
        aligned_query_embedding: f_align(p_i) = sum_j(a_ij * E(q_j)), where the attention score a_ij captures the similarity between p_i and each question word q_j. These features add soft alignments between similar but non-identical words (e.g., car and vehicle). Applied only to 'context_embed'.
        answer_maxlen: consider only answer spans of length less than or equal to answer_maxlen
        model_dim: the model dimension

        Encoder Block Parameters (separately for the embedding and modeling encoders):
            kernel_size: convolution kernel size in the encoder block
            num_head: the number of heads in multi-head attention
            num_conv_block: the number of convolution blocks in the encoder block
                [Layernorm -> Conv (residual)]
            num_encoder_block: the number of encoder blocks
                [position_encoding -> [n repeated conv blocks] -> Layernorm -> Self-attention (residual) -> Layernorm -> Feedforward (residual)]

        dropout: the dropout probability
        layer_dropout: the layer dropout probability (cf. Deep Networks with Stochastic Depth, https://arxiv.org/abs/1603.09382)
    forward(features, labels=None)

        Args:
            features: feature dictionary like below.
                {"feature_name1": {
                    "token_name1": tensor, "token_name2": tensor},
                 "feature_name2": ...}

        Kwargs:
            labels: label dictionary like below.
                {"label_name1": tensor,
                 "label_name2": tensor}
                Loss is not calculated when no labels are given (inference/predict mode).

        Returns: output_dict (dict) consisting of
            start_logits: unnormalized log probabilities of the span start position.
            end_logits: unnormalized log probabilities of the span end position.
            best_span: the string from the original passage that the model predicts as the best answer to the question.
            data_idx: the question id, mapped to the corresponding answer.
            loss: a scalar loss to be optimised.
Module contents¶
class claf.model.reading_comprehension.BertForQA(token_makers, lang_code='en', pretrained_model_name=None, answer_maxlen=30)

    Bases: claf.model.reading_comprehension.mixin.SQuADv1ForBert, claf.model.base.ModelWithoutTokenEmbedder

    Document Reader Model. Span Detector

    Implementation of the model presented in BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (https://arxiv.org/abs/1810.04805)

    Args:
        token_makers: token makers used to embed the 'context' and 'question'.

    Kwargs:
        lang_code: dataset language code [en|ko]
        pretrained_model_name: the name of a pre-trained model
        answer_maxlen: consider only answer spans of length less than or equal to answer_maxlen
    forward(features, labels=None)

        Args:
            features: feature dictionary like below.
                {"feature_name1": {
                    "token_name1": tensor, "token_name2": tensor},
                 "feature_name2": ...}

        Kwargs:
            labels: label dictionary like below.
                {"label_name1": tensor,
                 "label_name2": tensor}
                Loss is not calculated when no labels are given (inference/predict mode).

        Returns: output_dict (dict) consisting of
            start_logits: unnormalized log probabilities of the span start position.
            end_logits: unnormalized log probabilities of the span end position.
            best_span: the string from the original passage that the model predicts as the best answer to the question.
            data_idx: the question id, mapped to the corresponding answer.
            loss: a scalar loss to be optimised.
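    A minimal sketch of the span-prediction head used by BERT-style readers: each token representation from the transformer encoder is projected to two logits (start and end). The encoder itself and the label names in the trailing comment are assumptions; CLAF wires this up internally from the configured token_makers and pretrained_model_name.

        import torch.nn as nn

        class SpanHead(nn.Module):
            def __init__(self, hidden_dim):
                super().__init__()
                self.qa_outputs = nn.Linear(hidden_dim, 2)   # one logit each for start and end

            def forward(self, sequence_output):
                # sequence_output: (batch, seq_len, hidden_dim) from the transformer encoder
                logits = self.qa_outputs(sequence_output)    # (batch, seq_len, 2)
                start_logits, end_logits = logits.split(1, dim=-1)
                return start_logits.squeeze(-1), end_logits.squeeze(-1)

        # Training loss is typically the average of two cross-entropies, e.g.:
        # loss = (F.cross_entropy(start_logits, answer_start) + F.cross_entropy(end_logits, answer_end)) / 2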
class claf.model.reading_comprehension.BiDAF(token_embedder, lang_code='en', aligned_query_embedding=False, answer_maxlen=None, model_dim=100, contextual_rnn_num_layer=1, modeling_rnn_num_layer=2, predict_rnn_num_layer=1, dropout=0.2)

    Bases: claf.model.reading_comprehension.mixin.SQuADv1, claf.model.base.ModelWithTokenEmbedder

    Document Reader Model. Span Detector

    Implementation of the model presented in BiDAF: Bidirectional Attention Flow for Machine Comprehension (https://arxiv.org/abs/1611.01603)

    Embedding (Word + Char -> Contextual)
    Attention Flow
    Modeling (RNN)
    Output

    Args:
        token_embedder: 'QATokenEmbedder', used to embed the 'context' and 'question'.

    Kwargs:
        lang_code: dataset language code [en|ko]
        aligned_query_embedding: f_align(p_i) = sum_j(a_ij * E(q_j)), where the attention score a_ij captures the similarity between p_i and each question word q_j. These features add soft alignments between similar but non-identical words (e.g., car and vehicle). Applied only to 'context_embed'.
        answer_maxlen: consider only answer spans of length less than or equal to answer_maxlen
        model_dim: the model dimension
        contextual_rnn_num_layer: the number of recurrent layers (contextual)
        modeling_rnn_num_layer: the number of recurrent layers (modeling)
        predict_rnn_num_layer: the number of recurrent layers (predict)
        dropout: the dropout probability
    forward(features, labels=None)

        Args:
            features: feature dictionary like below.
                {"feature_name1": {
                    "token_name1": tensor, "token_name2": tensor},
                 "feature_name2": ...}

        Kwargs:
            labels: label dictionary like below.
                {"label_name1": tensor,
                 "label_name2": tensor}
                Loss is not calculated when no labels are given (inference/predict mode).

        Returns: output_dict (dict) consisting of
            start_logits: unnormalized log probabilities of the span start position.
            end_logits: unnormalized log probabilities of the span end position.
            best_span: the string from the original passage that the model predicts as the best answer to the question.
            data_idx: the question id, mapped to the corresponding answer.
            loss: a scalar loss to be optimised.
class claf.model.reading_comprehension.QANet(token_embedder, lang_code='en', aligned_query_embedding=False, answer_maxlen=None, model_dim=128, kernel_size_in_embedding=7, num_head_in_embedding=8, num_conv_block_in_embedding=4, num_embedding_encoder_block=1, kernel_size_in_modeling=5, num_head_in_modeling=8, num_conv_block_in_modeling=2, num_modeling_encoder_block=7, dropout=0.1, layer_dropout=0.9)

    Bases: claf.model.reading_comprehension.mixin.SQuADv1, claf.model.base.ModelWithTokenEmbedder

    Document Reader Model. Span Detector

    Implementation of the model presented in QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension (https://arxiv.org/abs/1804.09541)

    Input Embedding Layer
    Embedding Encoder Layer
    Context-Query Attention Layer
    Model Encoder Layer
    Output Layer

    Args:
        token_embedder: 'QATokenEmbedder', used to embed the 'context' and 'question'.

    Kwargs:
        lang_code: dataset language code [en|ko]
        aligned_query_embedding: f_align(p_i) = sum_j(a_ij * E(q_j)), where the attention score a_ij captures the similarity between p_i and each question word q_j. These features add soft alignments between similar but non-identical words (e.g., car and vehicle). Applied only to 'context_embed'.
        answer_maxlen: consider only answer spans of length less than or equal to answer_maxlen
        model_dim: the model dimension

        Encoder Block Parameters (separately for the embedding and modeling encoders):
            kernel_size: convolution kernel size in the encoder block
            num_head: the number of heads in multi-head attention
            num_conv_block: the number of convolution blocks in the encoder block
                [Layernorm -> Conv (residual)]
            num_encoder_block: the number of encoder blocks
                [position_encoding -> [n repeated conv blocks] -> Layernorm -> Self-attention (residual) -> Layernorm -> Feedforward (residual)]

        dropout: the dropout probability
        layer_dropout: the layer dropout probability (cf. Deep Networks with Stochastic Depth, https://arxiv.org/abs/1603.09382)
    forward(features, labels=None)

        Args:
            features: feature dictionary like below.
                {"feature_name1": {
                    "token_name1": tensor, "token_name2": tensor},
                 "feature_name2": ...}

        Kwargs:
            labels: label dictionary like below.
                {"label_name1": tensor,
                 "label_name2": tensor}
                Loss is not calculated when no labels are given (inference/predict mode).

        Returns: output_dict (dict) consisting of
            start_logits: unnormalized log probabilities of the span start position.
            end_logits: unnormalized log probabilities of the span end position.
            best_span: the string from the original passage that the model predicts as the best answer to the question.
            data_idx: the question id, mapped to the corresponding answer.
            loss: a scalar loss to be optimised.
class claf.model.reading_comprehension.DocQA(token_embedder, lang_code='en', aligned_query_embedding=False, answer_maxlen=17, rnn_dim=100, linear_dim=200, preprocess_rnn_num_layer=1, modeling_rnn_num_layer=2, predict_rnn_num_layer=1, dropout=0.2, weight_init=True)

    Bases: claf.model.reading_comprehension.mixin.SQuADv1, claf.model.base.ModelWithTokenEmbedder

    Document Reader Model. Span Detector

    Implementation of the model presented in Simple and Effective Multi-Paragraph Reading Comprehension (https://arxiv.org/abs/1710.10723)

    Embedding (Word + Char -> Contextual)
    Attention
    Residual self-attention
    Output

    Args:
        token_embedder: 'QATokenEmbedder', used to embed the 'context' and 'question'.

    Kwargs:
        lang_code: dataset language code [en|ko]
        aligned_query_embedding: f_align(p_i) = sum_j(a_ij * E(q_j)), where the attention score a_ij captures the similarity between p_i and each question word q_j. These features add soft alignments between similar but non-identical words (e.g., car and vehicle). Applied only to 'context_embed'.
        answer_maxlen: consider only answer spans of length less than or equal to answer_maxlen
        rnn_dim: the RNN cell dimension
        linear_dim: the attention linear layer dimension
        preprocess_rnn_num_layer: the number of recurrent layers (preprocess)
        modeling_rnn_num_layer: the number of recurrent layers (modeling)
        predict_rnn_num_layer: the number of recurrent layers (predict)
        dropout: the dropout probability
    forward(features, labels=None)

        Args:
            features: feature dictionary like below.
                {"feature_name1": {
                    "token_name1": tensor, "token_name2": tensor},
                 "feature_name2": ...}

        Kwargs:
            labels: label dictionary like below.
                {"label_name1": tensor,
                 "label_name2": tensor}
                Loss is not calculated when no labels are given (inference/predict mode).

        Returns: output_dict (dict) consisting of
            start_logits: unnormalized log probabilities of the span start position.
            end_logits: unnormalized log probabilities of the span end position.
            best_span: the string from the original passage that the model predicts as the best answer to the question.
            data_idx: the question id, mapped to the corresponding answer.
            loss: a scalar loss to be optimised.
class claf.model.reading_comprehension.DrQA(token_embedder, lang_code='en', aligned_query_embedding=False, answer_maxlen=None, model_dim=128, dropout=0.3)

    Bases: claf.model.reading_comprehension.mixin.SQuADv1, claf.model.base.ModelWithTokenEmbedder

    Document Reader Model. Span Detector

    Implementation of the model presented in Reading Wikipedia to Answer Open-Domain Questions (https://arxiv.org/abs/1704.00051)

    Embedding + features
    Align question embedding

    Args:
        token_embedder: 'QATokenEmbedder', used to embed the 'context' and 'question'.

    Kwargs:
        lang_code: dataset language code [en|ko]
        aligned_query_embedding: f_align(p_i) = sum_j(a_ij * E(q_j)), where the attention score a_ij captures the similarity between p_i and each question word q_j. These features add soft alignments between similar but non-identical words (e.g., car and vehicle). Applied only to 'context_embed'.
        answer_maxlen: consider only answer spans of length less than or equal to answer_maxlen
        model_dim: the model dimension
        dropout: the dropout probability
    forward(features, labels=None)

        Args:
            features: feature dictionary like below.
                {"feature_name1": {
                    "token_name1": tensor, "token_name2": tensor},
                 "feature_name2": ...}

        Kwargs:
            labels: label dictionary like below.
                {"label_name1": tensor,
                 "label_name2": tensor}
                Loss is not calculated when no labels are given (inference/predict mode).

        Returns: output_dict (dict) consisting of
            start_logits: unnormalized log probabilities of the span start position.
            end_logits: unnormalized log probabilities of the span end position.
            best_span: the string from the original passage that the model predicts as the best answer to the question.
            data_idx: the question id, mapped to the corresponding answer.
            loss: a scalar loss to be optimised.
class claf.model.reading_comprehension.RoBertaForQA(token_makers, lang_code='en', pretrained_model_name=None, answer_maxlen=30)

    Bases: claf.model.reading_comprehension.mixin.SQuADv1ForBert, claf.model.base.ModelWithoutTokenEmbedder

    Document Reader Model. Span Detector

    Implementation of a question answering model based on RoBERTa: A Robustly Optimized BERT Pretraining Approach (https://arxiv.org/abs/1907.11692)

    Args:
        token_makers: token makers used to embed the 'context' and 'question'.

    Kwargs:
        lang_code: dataset language code [en|ko]
        pretrained_model_name: the name of a pre-trained model
        answer_maxlen: consider only answer spans of length less than or equal to answer_maxlen
    forward(features, labels=None)

        Args:
            features: feature dictionary like below.
                {"feature_name1": {
                    "token_name1": tensor, "token_name2": tensor},
                 "feature_name2": ...}

        Kwargs:
            labels: label dictionary like below.
                {"label_name1": tensor,
                 "label_name2": tensor}
                Loss is not calculated when no labels are given (inference/predict mode).

        Returns: output_dict (dict) consisting of
            start_logits: unnormalized log probabilities of the span start position.
            end_logits: unnormalized log probabilities of the span end position.
            best_span: the string from the original passage that the model predicts as the best answer to the question.
            data_idx: the question id, mapped to the corresponding answer.
            loss: a scalar loss to be optimised.
class claf.model.reading_comprehension.BiDAF_No_Answer(token_embedder, lang_code='en', aligned_query_embedding=False, answer_maxlen=None, model_dim=100, contextual_rnn_num_layer=1, modeling_rnn_num_layer=2, predict_rnn_num_layer=1, dropout=0.2)

    Bases: claf.model.reading_comprehension.mixin.SQuADv2, claf.model.base.ModelWithTokenEmbedder

    Question Answering Model. Span Detector, No Answer

    Bidirectional Attention Flow for Machine Comprehension + Bias (No_Answer)

    Embedding (Word + Char -> Contextual)
    Attention Flow
    Modeling (RNN)
    Output

    Args:
        token_embedder: 'QATokenEmbedder', used to embed the 'context' and 'question'.

    Kwargs:
        lang_code: dataset language code [en|ko]
        aligned_query_embedding: f_align(p_i) = sum_j(a_ij * E(q_j)), where the attention score a_ij captures the similarity between p_i and each question word q_j. These features add soft alignments between similar but non-identical words (e.g., car and vehicle). Applied only to 'context_embed'.
        answer_maxlen: consider only answer spans of length less than or equal to answer_maxlen
        model_dim: the model dimension
        dropout: the dropout probability
    forward(features, labels=None)

        Args:
            features: feature dictionary like below.
                {"feature_name1": {
                    "token_name1": tensor, "token_name2": tensor},
                 "feature_name2": ...}

        Kwargs:
            labels: label dictionary like below.
                {"label_name1": tensor,
                 "label_name2": tensor}
                Loss is not calculated when no labels are given (inference/predict mode).

        Returns: output_dict (dict) consisting of
            start_logits: unnormalized log probabilities of the span start position.
            end_logits: unnormalized log probabilities of the span end position.
            best_span: the string from the original passage that the model predicts as the best answer to the question.
            data_idx: the question id, mapped to the corresponding answer.
            loss: a scalar loss to be optimised.
class claf.model.reading_comprehension.DocQA_No_Answer(token_embedder, lang_code='en', aligned_query_embedding=False, answer_maxlen=17, rnn_dim=100, linear_dim=200, preprocess_rnn_num_layer=1, modeling_rnn_num_layer=2, predict_rnn_num_layer=1, dropout=0.2, weight_init=True)

    Bases: claf.model.reading_comprehension.mixin.SQuADv2, claf.model.base.ModelWithTokenEmbedder

    Question Answering Model. Span Detector, No Answer

    Implementation of the model presented in Simple and Effective Multi-Paragraph Reading Comprehension + No_Answer (https://arxiv.org/abs/1710.10723)

    Embedding (Word + Char -> Contextual)
    Attention
    Residual self-attention
    Output

    Args:
        token_embedder: 'QATokenEmbedder', used to embed the 'context' and 'question'.

    Kwargs:
        lang_code: dataset language code [en|ko]
        aligned_query_embedding: f_align(p_i) = sum_j(a_ij * E(q_j)), where the attention score a_ij captures the similarity between p_i and each question word q_j. These features add soft alignments between similar but non-identical words (e.g., car and vehicle). Applied only to 'context_embed'.
        answer_maxlen: consider only answer spans of length less than or equal to answer_maxlen
        rnn_dim: the RNN cell dimension
        linear_dim: the attention linear layer dimension
        preprocess_rnn_num_layer: the number of recurrent layers (preprocess)
        modeling_rnn_num_layer: the number of recurrent layers (modeling)
        predict_rnn_num_layer: the number of recurrent layers (predict)
        dropout: the dropout probability
    forward(features, labels=None)

        Args:
            features: feature dictionary like below.
                {"feature_name1": {
                    "token_name1": tensor, "token_name2": tensor},
                 "feature_name2": ...}

        Kwargs:
            labels: label dictionary like below.
                {"label_name1": tensor,
                 "label_name2": tensor}
                Loss is not calculated when no labels are given (inference/predict mode).

        Returns: output_dict (dict) consisting of
            start_logits: unnormalized log probabilities of the span start position.
            end_logits: unnormalized log probabilities of the span end position.
            best_span: the string from the original passage that the model predicts as the best answer to the question.
            data_idx: the question id, mapped to the corresponding answer.
            loss: a scalar loss to be optimised.