claf.model.reading_comprehension package¶
Submodules¶
class claf.model.reading_comprehension.bidaf.BiDAF(token_embedder, lang_code='en', aligned_query_embedding=False, answer_maxlen=None, model_dim=100, contextual_rnn_num_layer=1, modeling_rnn_num_layer=2, predict_rnn_num_layer=1, dropout=0.2)

    Bases: claf.model.reading_comprehension.mixin.SQuADv1, claf.model.base.ModelWithTokenEmbedder

    Document Reader Model. Span Detector

    Implementation of the model presented in BiDAF: Bidirectional Attention Flow for Machine Comprehension (https://arxiv.org/abs/1611.01603)

    Embedding (Word + Char -> Contextual)
    Attention Flow
    Modeling (RNN)
    Output

    Args:
        token_embedder: 'QATokenEmbedder', used to embed the 'context' and 'question'.

    Kwargs:
        lang_code: dataset language code [en|ko]
        aligned_query_embedding: f_align(p_i) = sum_j(a_ij * E(q_j)), where the attention score a_ij captures the similarity between p_i and each question word q_j. These features add soft alignments between similar but non-identical words (e.g., car and vehicle). Applied only to 'context_embed'.
        answer_maxlen: consider only answer spans of length less than or equal to answer_maxlen
        model_dim: the model dimension
        contextual_rnn_num_layer: the number of recurrent layers (contextual)
        modeling_rnn_num_layer: the number of recurrent layers (modeling)
        predict_rnn_num_layer: the number of recurrent layers (predict)
        dropout: the dropout probability
    forward(features, labels=None)

        Args:
            features: feature dictionary like below.
                {"feature_name1": {
                    "token_name1": tensor, "token_name2": tensor},
                 "feature_name2": ...}

        Kwargs:
            labels: label dictionary like below.
                {"label_name1": tensor,
                 "label_name2": tensor}
                Loss is not calculated when no labels are given (inference/predict mode).

        Returns: output_dict (dict) consisting of
            start_logits: unnormalized log probabilities of the span start position.
            end_logits: unnormalized log probabilities of the span end position.
            best_span: the string from the original passage that the model predicts as the best answer to the question.
            data_idx: the question id, mapped to the corresponding answer.
            loss: a scalar loss to be optimised.
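    A minimal usage sketch of this forward interface, assuming model is an already-constructed BiDAF instance. The feature and label key names below ("context", "question", "word", "answer_start", "answer_end") are hypothetical placeholders; the real keys depend on the configured token embedder and dataset reader.

        import torch

        # Hypothetical nested feature dictionary (batch of 2, context length 120, question length 20).
        features = {
            "context": {"word": torch.randint(0, 1000, (2, 120))},
            "question": {"word": torch.randint(0, 1000, (2, 20))},
        }
        labels = {"answer_start": torch.tensor([3, 17]),
                  "answer_end": torch.tensor([5, 21])}   # assumed label names

        output_dict = model(features, labels=labels)   # training: output_dict includes "loss"
        output_dict = model(features)                  # inference: no loss is computed
        start_logits = output_dict["start_logits"]     # (batch, context_len)
        best_span = output_dict["best_span"]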
class claf.model.reading_comprehension.bidaf_no_answer.BiDAF_No_Answer(token_embedder, lang_code='en', aligned_query_embedding=False, answer_maxlen=None, model_dim=100, contextual_rnn_num_layer=1, modeling_rnn_num_layer=2, predict_rnn_num_layer=1, dropout=0.2)

    Bases: claf.model.reading_comprehension.mixin.SQuADv2, claf.model.base.ModelWithTokenEmbedder

    Question Answering Model. Span Detector, No Answer

    Bidirectional Attention Flow for Machine Comprehension + Bias (No_Answer)

    Embedding (Word + Char -> Contextual)
    Attention Flow
    Modeling (RNN)
    Output

    Args:
        token_embedder: 'QATokenEmbedder', used to embed the 'context' and 'question'.

    Kwargs:
        lang_code: dataset language code [en|ko]
        aligned_query_embedding: f_align(p_i) = sum_j(a_ij * E(q_j)), where the attention score a_ij captures the similarity between p_i and each question word q_j. These features add soft alignments between similar but non-identical words (e.g., car and vehicle). Applied only to 'context_embed'.
        answer_maxlen: consider only answer spans of length less than or equal to answer_maxlen
        model_dim: the model dimension
        dropout: the dropout probability
    forward(features, labels=None)

        Args:
            features: feature dictionary like below.
                {"feature_name1": {
                    "token_name1": tensor, "token_name2": tensor},
                 "feature_name2": ...}

        Kwargs:
            labels: label dictionary like below.
                {"label_name1": tensor,
                 "label_name2": tensor}
                Loss is not calculated when no labels are given (inference/predict mode).

        Returns: output_dict (dict) consisting of
            start_logits: unnormalized log probabilities of the span start position.
            end_logits: unnormalized log probabilities of the span end position.
            best_span: the string from the original passage that the model predicts as the best answer to the question.
            data_idx: the question id, mapped to the corresponding answer.
            loss: a scalar loss to be optimised.
class claf.model.reading_comprehension.docqa.DocQA(token_embedder, lang_code='en', aligned_query_embedding=False, answer_maxlen=17, rnn_dim=100, linear_dim=200, preprocess_rnn_num_layer=1, modeling_rnn_num_layer=2, predict_rnn_num_layer=1, dropout=0.2, weight_init=True)

    Bases: claf.model.reading_comprehension.mixin.SQuADv1, claf.model.base.ModelWithTokenEmbedder

    Document Reader Model. Span Detector

    Implementation of the model presented in Simple and Effective Multi-Paragraph Reading Comprehension (https://arxiv.org/abs/1710.10723)

    Embedding (Word + Char -> Contextual)
    Attention
    Residual self-attention
    Output

    Args:
        token_embedder: 'QATokenEmbedder', used to embed the 'context' and 'question'.

    Kwargs:
        lang_code: dataset language code [en|ko]
        aligned_query_embedding: f_align(p_i) = sum_j(a_ij * E(q_j)), where the attention score a_ij captures the similarity between p_i and each question word q_j. These features add soft alignments between similar but non-identical words (e.g., car and vehicle). Applied only to 'context_embed'.
        answer_maxlen: consider only answer spans of length less than or equal to answer_maxlen
        rnn_dim: the RNN cell dimension
        linear_dim: the attention linear layer dimension
        preprocess_rnn_num_layer: the number of recurrent layers (preprocess)
        modeling_rnn_num_layer: the number of recurrent layers (modeling)
        predict_rnn_num_layer: the number of recurrent layers (predict)
        dropout: the dropout probability
    forward(features, labels=None)

        Args:
            features: feature dictionary like below.
                {"feature_name1": {
                    "token_name1": tensor, "token_name2": tensor},
                 "feature_name2": ...}

        Kwargs:
            labels: label dictionary like below.
                {"label_name1": tensor,
                 "label_name2": tensor}
                Loss is not calculated when no labels are given (inference/predict mode).

        Returns: output_dict (dict) consisting of
            start_logits: unnormalized log probabilities of the span start position.
            end_logits: unnormalized log probabilities of the span end position.
            best_span: the string from the original passage that the model predicts as the best answer to the question.
            data_idx: the question id, mapped to the corresponding answer.
            loss: a scalar loss to be optimised.
class claf.model.reading_comprehension.docqa.SelfAttention(rnn_dim, linear_dim, dropout=0.2, weight_init=True)

    Bases: torch.nn.modules.module.Module

    The same bi-attention mechanism, now applied between the passage and itself.
    forward(context, context_mask)

        Defines the computation performed at every call.

        Should be overridden by all subclasses.

        Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
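    A minimal sketch of passage-to-passage attention in the spirit of this module, assuming a plain dot-product similarity; the actual module uses a learned (trilinear-style) similarity, dropout, and residual connections.

        import torch
        import torch.nn.functional as F

        def self_attend(context, context_mask):
            # context: (batch, c_len, dim), context_mask: (batch, c_len), 1 for real tokens
            scores = torch.bmm(context, context.transpose(1, 2))          # (batch, c_len, c_len)
            c_len = context.size(1)
            diag = torch.eye(c_len, dtype=torch.bool, device=context.device).unsqueeze(0)
            scores = scores.masked_fill(diag, float("-inf"))              # a token never attends to itself
            key_mask = (context_mask == 0).unsqueeze(1)                   # (batch, 1, c_len)
            scores = scores.masked_fill(key_mask, float("-inf"))          # ignore padding keys
            attn = F.softmax(scores, dim=-1)
            return torch.bmm(attn, context)                               # (batch, c_len, dim)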
class claf.model.reading_comprehension.docqa_no_answer.DocQA_No_Answer(token_embedder, lang_code='en', aligned_query_embedding=False, answer_maxlen=17, rnn_dim=100, linear_dim=200, preprocess_rnn_num_layer=1, modeling_rnn_num_layer=2, predict_rnn_num_layer=1, dropout=0.2, weight_init=True)

    Bases: claf.model.reading_comprehension.mixin.SQuADv2, claf.model.base.ModelWithTokenEmbedder

    Question Answering Model. Span Detector, No Answer

    Implementation of the model presented in Simple and Effective Multi-Paragraph Reading Comprehension + No_Answer (https://arxiv.org/abs/1710.10723)

    Embedding (Word + Char -> Contextual)
    Attention
    Residual self-attention
    Output

    Args:
        token_embedder: 'QATokenEmbedder', used to embed the 'context' and 'question'.

    Kwargs:
        lang_code: dataset language code [en|ko]
        aligned_query_embedding: f_align(p_i) = sum_j(a_ij * E(q_j)), where the attention score a_ij captures the similarity between p_i and each question word q_j. These features add soft alignments between similar but non-identical words (e.g., car and vehicle). Applied only to 'context_embed'.
        answer_maxlen: consider only answer spans of length less than or equal to answer_maxlen
        rnn_dim: the RNN cell dimension
        linear_dim: the attention linear layer dimension
        preprocess_rnn_num_layer: the number of recurrent layers (preprocess)
        modeling_rnn_num_layer: the number of recurrent layers (modeling)
        predict_rnn_num_layer: the number of recurrent layers (predict)
        dropout: the dropout probability
    forward(features, labels=None)

        Args:
            features: feature dictionary like below.
                {"feature_name1": {
                    "token_name1": tensor, "token_name2": tensor},
                 "feature_name2": ...}

        Kwargs:
            labels: label dictionary like below.
                {"label_name1": tensor,
                 "label_name2": tensor}
                Loss is not calculated when no labels are given (inference/predict mode).

        Returns: output_dict (dict) consisting of
            start_logits: unnormalized log probabilities of the span start position.
            end_logits: unnormalized log probabilities of the span end position.
            best_span: the string from the original passage that the model predicts as the best answer to the question.
            data_idx: the question id, mapped to the corresponding answer.
            loss: a scalar loss to be optimised.
class claf.model.reading_comprehension.docqa_no_answer.NoAnswer(embed_dim, bias_hidden_dim)

    Bases: torch.nn.modules.module.Module

    No-Answer Option

    Args:
        embed_dim: the passage embedding dimension
        bias_hidden_dim: the hidden size of the two-layer MLP used to compute the no-answer bias (see the sketch after the forward entry below)
    forward(context_embed, span_start_logits, span_end_logits)

        Defines the computation performed at every call.

        Should be overridden by all subclasses.

        Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
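    A minimal sketch of the no-answer bias idea: a pooled passage representation goes through a two-layer MLP to produce a scalar bias, which is appended to the span logits as an extra "no answer" position. The pooling choice and the way the bias is combined with the logits are assumptions; CLAF's implementation may differ.

        import torch
        import torch.nn as nn

        class NoAnswerBias(nn.Module):
            """Hypothetical two-layer MLP producing one no-answer logit per example."""
            def __init__(self, embed_dim, bias_hidden_dim):
                super().__init__()
                self.mlp = nn.Sequential(
                    nn.Linear(embed_dim, bias_hidden_dim),
                    nn.ReLU(),
                    nn.Linear(bias_hidden_dim, 1),
                )

            def forward(self, context_embed, span_start_logits, span_end_logits):
                # context_embed: (batch, c_len, embed_dim)
                pooled = context_embed.max(dim=1).values         # simple max-pooling over time (assumption)
                bias = self.mlp(pooled)                          # (batch, 1)
                # append the bias as an extra "no answer" position to each logit vector
                start_logits = torch.cat([span_start_logits, bias], dim=-1)
                end_logits = torch.cat([span_end_logits, bias], dim=-1)
                return start_logits, end_logits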
class claf.model.reading_comprehension.docqa_no_answer.SelfAttention(rnn_dim, linear_dim, dropout=0.2, weight_init=True)

    Bases: torch.nn.modules.module.Module

    The same bi-attention mechanism, now applied between the passage and itself.
    forward(context, context_mask)

        Defines the computation performed at every call.

        Should be overridden by all subclasses.

        Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
class claf.model.reading_comprehension.drqa.DrQA(token_embedder, lang_code='en', aligned_query_embedding=False, answer_maxlen=None, model_dim=128, dropout=0.3)

    Bases: claf.model.reading_comprehension.mixin.SQuADv1, claf.model.base.ModelWithTokenEmbedder

    Document Reader Model. Span Detector

    Implementation of the model presented in Reading Wikipedia to Answer Open-Domain Questions (https://arxiv.org/abs/1704.00051)

    Embedding + features
    Align question embedding

    Args:
        token_embedder: 'QATokenEmbedder', used to embed the 'context' and 'question'.

    Kwargs:
        lang_code: dataset language code [en|ko]
        aligned_query_embedding: f_align(p_i) = sum_j(a_ij * E(q_j)), where the attention score a_ij captures the similarity between p_i and each question word q_j. These features add soft alignments between similar but non-identical words (e.g., car and vehicle). Applied only to 'context_embed'. (A minimal sketch follows this entry.)
        answer_maxlen: consider only answer spans of length less than or equal to answer_maxlen
        model_dim: the model dimension
        dropout: the dropout probability
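    A minimal sketch of the aligned question embedding f_align(p_i) = sum_j(a_ij * E(q_j)) mentioned above, assuming the attention scores come from a dot product of ReLU-projected word embeddings as in the DrQA paper; the particular projection used by CLAF is an assumption here.

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        def aligned_question_embedding(context_embed, question_embed, proj):
            # context_embed:  (batch, c_len, embed_dim)  -> E(p_i)
            # question_embed: (batch, q_len, embed_dim)  -> E(q_j)
            c = F.relu(proj(context_embed))                      # alpha(E(p_i))
            q = F.relu(proj(question_embed))                     # alpha(E(q_j))
            scores = torch.bmm(c, q.transpose(1, 2))             # similarity between p_i and q_j
            a = F.softmax(scores, dim=-1)                        # a_ij, normalized over question words
            return torch.bmm(a, question_embed)                  # f_align(p_i) = sum_j(a_ij * E(q_j))

        proj = nn.Linear(300, 300)   # hypothetical embedding/projection size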
    forward(features, labels=None)

        Args:
            features: feature dictionary like below.
                {"feature_name1": {
                    "token_name1": tensor, "token_name2": tensor},
                 "feature_name2": ...}

        Kwargs:
            labels: label dictionary like below.
                {"label_name1": tensor,
                 "label_name2": tensor}
                Loss is not calculated when no labels are given (inference/predict mode).

        Returns: output_dict (dict) consisting of
            start_logits: unnormalized log probabilities of the span start position.
            end_logits: unnormalized log probabilities of the span end position.
            best_span: the string from the original passage that the model predicts as the best answer to the question.
            data_idx: the question id, mapped to the corresponding answer.
            loss: a scalar loss to be optimised.
class claf.model.reading_comprehension.mixin.ReadingComprehension

    Bases: object

    Reading Comprehension Mixin Class

    Args:
        token_embedder: 'RCTokenEmbedder', used to embed the 'context' and 'question'.
    get_best_span(span_start_logits, span_end_logits, answer_maxlen=None)

        Take the argmax of the constrained score_s * score_e.

        Args:
            span_start_logits: independent start logits
            span_end_logits: independent end logits

        Kwargs:
            answer_maxlen: the maximum span length to consider (default is None -> all lengths)
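    A minimal single-example sketch of the constrained argmax this method describes; CLAF's batched implementation may differ in detail.

        import torch

        def best_span(span_start_probs, span_end_probs, answer_maxlen=None):
            # span_start_probs, span_end_probs: (seq_len,) scores for one example
            scores = torch.outer(span_start_probs, span_end_probs)   # score_s * score_e for every (start, end)
            scores = torch.triu(scores)                              # keep only spans with start <= end
            if answer_maxlen is not None:
                # keep only spans of length <= answer_maxlen (length = end - start + 1)
                scores = torch.tril(scores, diagonal=answer_maxlen - 1)
            flat_index = torch.argmax(scores).item()
            start, end = divmod(flat_index, scores.size(1))
            return start, end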
    make_predictions(output_dict)

        Make predictions from the model's output_dict.

        Args:
            output_dict: the model's output dictionary consisting of
                data_idx: question id
                best_span: the best span computed from span_start_logits and span_end_logits
                start_logits: span start logits
                end_logits: span end logits

        Returns:
            predictions: prediction dictionary consisting of
                key: 'id' (question id)
                value: a dictionary consisting of
                    predict_text, pred_span_start, pred_span_end, span_start_prob, span_end_prob
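    Illustrative shape of the returned prediction dictionary (the question id and values below are made up):

        predictions = {
            "56be4db0acb8001400a502ec": {        # question id
                "predict_text": "Denver Broncos",
                "pred_span_start": 34,
                "pred_span_end": 35,
                "span_start_prob": 0.91,
                "span_end_prob": 0.87,
            },
            # ...
        }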
    predict(**kwargs)
    print_examples(index, inputs, predictions)

        Print evaluation examples.

        Args:
            index: data index
            inputs: mini-batch inputs
            predictions: prediction dictionary consisting of
                key: 'id' (question id)
                value: a dictionary consisting of
                    predict_text, pred_span_start, pred_span_end, span_start_prob, span_end_prob

        Returns:
            print(Context, Question, Answers and Predict)
class claf.model.reading_comprehension.mixin.SQuADv1

    Bases: claf.model.reading_comprehension.mixin.ReadingComprehension

    Reading Comprehension Mixin Class with SQuAD v1.1 evaluation

    Args:
        token_embedder: 'QATokenEmbedder', used to embed the 'context' and 'question'.
    make_metrics(predictions)

        Make metrics from the prediction dictionary.

        Args:
            predictions: prediction dictionary consisting of
                key: 'id' (question id)
                value: (predict_text, pred_span_start, pred_span_end)

        Returns:
            metrics: metric dictionary consisting of
                'em': exact_match (SQuAD v1.1 official evaluation)
                'f1': f1 (SQuAD v1.1 official evaluation)
                'start_acc': span_start accuracy
                'end_acc': span_end accuracy
                'span_acc': span accuracy (start and end)
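    For reference, a minimal sketch of SQuAD v1.1-style exact match and token-overlap F1 for a single prediction/gold pair. The official evaluation script additionally strips articles and punctuation and takes the maximum over all gold answers.

        from collections import Counter

        def exact_match(prediction, ground_truth):
            return float(prediction.strip().lower() == ground_truth.strip().lower())

        def f1(prediction, ground_truth):
            pred_tokens = prediction.lower().split()
            gold_tokens = ground_truth.lower().split()
            common = Counter(pred_tokens) & Counter(gold_tokens)
            num_same = sum(common.values())
            if num_same == 0:
                return 0.0
            precision = num_same / len(pred_tokens)
            recall = num_same / len(gold_tokens)
            return 2 * precision * recall / (precision + recall)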
class claf.model.reading_comprehension.mixin.SQuADv1ForBert

    Bases: claf.model.reading_comprehension.mixin.SQuADv1

    Reading Comprehension Mixin Class with SQuAD v1.1 evaluation

    Args:
        token_embedder: 'QATokenEmbedder', used to embed the 'context' and 'question'.
    predict(output_dict, arguments, helper)

        Inference from raw features.

        Args:
            output_dict: the model's output dictionary consisting of
                data_idx: question id
                best_span: the best span computed from span_start_logits and span_end_logits
            arguments: arguments dictionary consisting of user_input
            helper: dictionary used to help recover the answer text

        Returns:
            span: the predicted best_span
class claf.model.reading_comprehension.mixin.SQuADv2

    Bases: claf.model.reading_comprehension.mixin.ReadingComprehension

    Reading Comprehension Mixin Class with SQuAD v2.0 evaluation

    Args:
        token_embedder: 'RCTokenEmbedder', used to embed the 'context' and 'question'.
    make_metrics(predictions)

        Make metrics from the prediction dictionary.

        Args:
            predictions: prediction dictionary consisting of
                key: 'id' (question id)
                value: a dictionary consisting of
                    predict_text, pred_span_start, pred_span_end, span_start_prob, span_end_prob

        Returns:
            metrics: metric dictionary consisting of
                'start_acc': span_start accuracy
                'end_acc': span_end accuracy
                'span_acc': span accuracy (start and end)
                'em': exact_match (SQuAD v2.0 official evaluation)
                'f1': f1 (SQuAD v2.0 official evaluation)
                'HasAns_exact': exact_match on answerable questions
                'HasAns_f1': f1 on answerable questions
                'NoAns_exact': exact_match on unanswerable questions
                'NoAns_f1': f1 on unanswerable questions
                'best_exact': best exact_match score with best_exact_thresh
                'best_exact_thresh': the answerable threshold that yields best_exact
                'best_f1': best f1 score with best_f1_thresh
                'best_f1_thresh': the answerable threshold that yields best_f1
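    The best_* entries come from sweeping an answerability threshold: a question is predicted as unanswerable when its no-answer score exceeds the threshold, and the empty prediction counts as correct only for unanswerable questions. A minimal sketch of such a sweep (the score convention is an assumption; the official SQuAD v2.0 script performs the same kind of search):

        def best_exact_and_threshold(examples):
            # examples: list of (no_answer_score, em_if_answered, has_answer) per question
            best_em, best_thresh = -1.0, 0.0
            for thresh in sorted({score for score, _, _ in examples}):
                correct = 0
                for score, em_if_answered, has_answer in examples:
                    if score > thresh:                      # predicted "no answer"
                        correct += 0 if has_answer else 1
                    else:                                   # predicted a span
                        correct += em_if_answered if has_answer else 0
                em = correct / len(examples)
                if em > best_em:
                    best_em, best_thresh = em, thresh
            return best_em, best_thresh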
class claf.model.reading_comprehension.qanet.EncoderBlock(model_dim=128, num_head=8, kernel_size=5, num_conv_block=4, dropout=0.1, layer_dropout=0.9)

    Bases: torch.nn.modules.module.Module

    Encoder Block

    Structure ([...] denotes a residual sub-layer):
        position_encoding -> [convolution-layer] x # -> [self-attention-layer] -> [feed-forward-layer]

    convolution-layer: depthwise separable convolutions
    self-attention-layer: multi-head attention
    feed-forward-layer: pointwise convolution

    Args:
        model_dim: the model dimension
        num_head: the number of heads in multi-head attention
        kernel_size: convolution kernel size
        num_conv_block: the number of convolution blocks
        dropout: the dropout probability
        layer_dropout: the layer dropout probability (cf. Deep Networks with Stochastic Depth, https://arxiv.org/abs/1603.09382)
    forward(x, mask=None)

        Defines the computation performed at every call.

        Should be overridden by all subclasses.

        Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
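    A minimal sketch of the block structure described above (repeated convolution sub-blocks, self-attention, and a feed-forward layer, each wrapped in layernorm plus a residual connection). Positional encoding, depthwise separable convolutions, and the stochastic layer dropout from the referenced paper are simplified or omitted; this is not CLAF's exact implementation.

        import torch.nn as nn

        class EncoderBlockSketch(nn.Module):
            def __init__(self, model_dim=128, num_head=8, kernel_size=5, num_conv_block=4):
                super().__init__()
                self.conv_norms = nn.ModuleList(nn.LayerNorm(model_dim) for _ in range(num_conv_block))
                self.convs = nn.ModuleList(
                    nn.Conv1d(model_dim, model_dim, kernel_size, padding=kernel_size // 2)
                    for _ in range(num_conv_block))                 # stand-in for depthwise separable conv
                self.attn_norm = nn.LayerNorm(model_dim)
                self.attn = nn.MultiheadAttention(model_dim, num_head, batch_first=True)
                self.ff_norm = nn.LayerNorm(model_dim)
                self.ff = nn.Linear(model_dim, model_dim)           # stand-in for pointwise convolution

            def forward(self, x, mask=None):
                # x: (batch, seq_len, model_dim); position encoding omitted here
                for norm, conv in zip(self.conv_norms, self.convs):
                    y = conv(norm(x).transpose(1, 2)).transpose(1, 2)
                    x = x + y                                       # residual
                h = self.attn_norm(x)
                y, _ = self.attn(h, h, h,
                                 key_padding_mask=(mask == 0) if mask is not None else None)
                x = x + y                                           # residual
                return x + self.ff(self.ff_norm(x))                 # residual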
class claf.model.reading_comprehension.qanet.QANet(token_embedder, lang_code='en', aligned_query_embedding=False, answer_maxlen=None, model_dim=128, kernel_size_in_embedding=7, num_head_in_embedding=8, num_conv_block_in_embedding=4, num_embedding_encoder_block=1, kernel_size_in_modeling=5, num_head_in_modeling=8, num_conv_block_in_modeling=2, num_modeling_encoder_block=7, dropout=0.1, layer_dropout=0.9)

    Bases: claf.model.reading_comprehension.mixin.SQuADv1, claf.model.base.ModelWithTokenEmbedder

    Document Reader Model. Span Detector

    Implementation of the model presented in QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension (https://arxiv.org/abs/1804.09541)

    Input Embedding Layer
    Embedding Encoder Layer
    Context-Query Attention Layer
    Model Encoder Layer
    Output Layer

    Args:
        token_embedder: 'QATokenEmbedder', used to embed the 'context' and 'question'.

    Kwargs:
        lang_code: dataset language code [en|ko]
        aligned_query_embedding: f_align(p_i) = sum_j(a_ij * E(q_j)), where the attention score a_ij captures the similarity between p_i and each question word q_j. These features add soft alignments between similar but non-identical words (e.g., car and vehicle). Applied only to 'context_embed'.
        answer_maxlen: consider only answer spans of length less than or equal to answer_maxlen
        model_dim: the model dimension

        Encoder Block Parameters (separately for the embedding and modeling encoders):
            kernel_size: convolution kernel size in the encoder block
            num_head: the number of heads in multi-head attention
            num_conv_block: the number of convolution blocks in the encoder block
                [Layernorm -> Conv (residual)]
            num_encoder_block: the number of encoder blocks
                [position_encoding -> [n repeated conv blocks] -> Layernorm -> Self-attention (residual) -> Layernorm -> Feedforward (residual)]

        dropout: the dropout probability
        layer_dropout: the layer dropout probability (cf. Deep Networks with Stochastic Depth, https://arxiv.org/abs/1603.09382)
    forward(features, labels=None)

        Args:
            features: feature dictionary like below.
                {"feature_name1": {
                    "token_name1": tensor, "token_name2": tensor},
                 "feature_name2": ...}

        Kwargs:
            labels: label dictionary like below.
                {"label_name1": tensor,
                 "label_name2": tensor}
                Loss is not calculated when no labels are given (inference/predict mode).

        Returns: output_dict (dict) consisting of
            start_logits: unnormalized log probabilities of the span start position.
            end_logits: unnormalized log probabilities of the span end position.
            best_span: the string from the original passage that the model predicts as the best answer to the question.
            data_idx: the question id, mapped to the corresponding answer.
            loss: a scalar loss to be optimised.
Module contents¶
class claf.model.reading_comprehension.BertForQA(token_makers, lang_code='en', pretrained_model_name=None, answer_maxlen=30)

    Bases: claf.model.reading_comprehension.mixin.SQuADv1ForBert, claf.model.base.ModelWithoutTokenEmbedder

    Document Reader Model. Span Detector

    Implementation of the model presented in BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (https://arxiv.org/abs/1810.04805)

    Args:
        token_makers: token makers used to embed the 'context' and 'question'.

    Kwargs:
        lang_code: dataset language code [en|ko]
        pretrained_model_name: the name of a pre-trained model
        answer_maxlen: consider only answer spans of length less than or equal to answer_maxlen
    forward(features, labels=None)

        Args:
            features: feature dictionary like below.
                {"feature_name1": {
                    "token_name1": tensor, "token_name2": tensor},
                 "feature_name2": ...}

        Kwargs:
            labels: label dictionary like below.
                {"label_name1": tensor,
                 "label_name2": tensor}
                Loss is not calculated when no labels are given (inference/predict mode).

        Returns: output_dict (dict) consisting of
            start_logits: unnormalized log probabilities of the span start position.
            end_logits: unnormalized log probabilities of the span end position.
            best_span: the string from the original passage that the model predicts as the best answer to the question.
            data_idx: the question id, mapped to the corresponding answer.
            loss: a scalar loss to be optimised.
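    A minimal sketch of the span-prediction head used by BERT-style readers: each token representation from the transformer encoder is projected to two logits (start and end). The encoder itself and the label names in the trailing comment are assumptions; CLAF wires this up internally from the configured token_makers and pretrained_model_name.

        import torch.nn as nn

        class SpanHead(nn.Module):
            def __init__(self, hidden_dim):
                super().__init__()
                self.qa_outputs = nn.Linear(hidden_dim, 2)   # one logit each for start and end

            def forward(self, sequence_output):
                # sequence_output: (batch, seq_len, hidden_dim) from the transformer encoder
                logits = self.qa_outputs(sequence_output)    # (batch, seq_len, 2)
                start_logits, end_logits = logits.split(1, dim=-1)
                return start_logits.squeeze(-1), end_logits.squeeze(-1)

        # Training loss is typically the average of two cross-entropies, e.g.:
        # loss = (F.cross_entropy(start_logits, answer_start) + F.cross_entropy(end_logits, answer_end)) / 2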
class claf.model.reading_comprehension.BiDAF(token_embedder, lang_code='en', aligned_query_embedding=False, answer_maxlen=None, model_dim=100, contextual_rnn_num_layer=1, modeling_rnn_num_layer=2, predict_rnn_num_layer=1, dropout=0.2)

    Bases: claf.model.reading_comprehension.mixin.SQuADv1, claf.model.base.ModelWithTokenEmbedder

    Document Reader Model. Span Detector

    Implementation of the model presented in BiDAF: Bidirectional Attention Flow for Machine Comprehension (https://arxiv.org/abs/1611.01603)

    Embedding (Word + Char -> Contextual)
    Attention Flow
    Modeling (RNN)
    Output

    Args:
        token_embedder: 'QATokenEmbedder', used to embed the 'context' and 'question'.

    Kwargs:
        lang_code: dataset language code [en|ko]
        aligned_query_embedding: f_align(p_i) = sum_j(a_ij * E(q_j)), where the attention score a_ij captures the similarity between p_i and each question word q_j. These features add soft alignments between similar but non-identical words (e.g., car and vehicle). Applied only to 'context_embed'.
        answer_maxlen: consider only answer spans of length less than or equal to answer_maxlen
        model_dim: the model dimension
        contextual_rnn_num_layer: the number of recurrent layers (contextual)
        modeling_rnn_num_layer: the number of recurrent layers (modeling)
        predict_rnn_num_layer: the number of recurrent layers (predict)
        dropout: the dropout probability
    forward(features, labels=None)

        Args:
            features: feature dictionary like below.
                {"feature_name1": {
                    "token_name1": tensor, "token_name2": tensor},
                 "feature_name2": ...}

        Kwargs:
            labels: label dictionary like below.
                {"label_name1": tensor,
                 "label_name2": tensor}
                Loss is not calculated when no labels are given (inference/predict mode).

        Returns: output_dict (dict) consisting of
            start_logits: unnormalized log probabilities of the span start position.
            end_logits: unnormalized log probabilities of the span end position.
            best_span: the string from the original passage that the model predicts as the best answer to the question.
            data_idx: the question id, mapped to the corresponding answer.
            loss: a scalar loss to be optimised.
class claf.model.reading_comprehension.QANet(token_embedder, lang_code='en', aligned_query_embedding=False, answer_maxlen=None, model_dim=128, kernel_size_in_embedding=7, num_head_in_embedding=8, num_conv_block_in_embedding=4, num_embedding_encoder_block=1, kernel_size_in_modeling=5, num_head_in_modeling=8, num_conv_block_in_modeling=2, num_modeling_encoder_block=7, dropout=0.1, layer_dropout=0.9)

    Bases: claf.model.reading_comprehension.mixin.SQuADv1, claf.model.base.ModelWithTokenEmbedder

    Document Reader Model. Span Detector

    Implementation of the model presented in QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension (https://arxiv.org/abs/1804.09541)

    Input Embedding Layer
    Embedding Encoder Layer
    Context-Query Attention Layer
    Model Encoder Layer
    Output Layer

    Args:
        token_embedder: 'QATokenEmbedder', used to embed the 'context' and 'question'.

    Kwargs:
        lang_code: dataset language code [en|ko]
        aligned_query_embedding: f_align(p_i) = sum_j(a_ij * E(q_j)), where the attention score a_ij captures the similarity between p_i and each question word q_j. These features add soft alignments between similar but non-identical words (e.g., car and vehicle). Applied only to 'context_embed'.
        answer_maxlen: consider only answer spans of length less than or equal to answer_maxlen
        model_dim: the model dimension

        Encoder Block Parameters (separately for the embedding and modeling encoders):
            kernel_size: convolution kernel size in the encoder block
            num_head: the number of heads in multi-head attention
            num_conv_block: the number of convolution blocks in the encoder block
                [Layernorm -> Conv (residual)]
            num_encoder_block: the number of encoder blocks
                [position_encoding -> [n repeated conv blocks] -> Layernorm -> Self-attention (residual) -> Layernorm -> Feedforward (residual)]

        dropout: the dropout probability
        layer_dropout: the layer dropout probability (cf. Deep Networks with Stochastic Depth, https://arxiv.org/abs/1603.09382)
    forward(features, labels=None)

        Args:
            features: feature dictionary like below.
                {"feature_name1": {
                    "token_name1": tensor, "token_name2": tensor},
                 "feature_name2": ...}

        Kwargs:
            labels: label dictionary like below.
                {"label_name1": tensor,
                 "label_name2": tensor}
                Loss is not calculated when no labels are given (inference/predict mode).

        Returns: output_dict (dict) consisting of
            start_logits: unnormalized log probabilities of the span start position.
            end_logits: unnormalized log probabilities of the span end position.
            best_span: the string from the original passage that the model predicts as the best answer to the question.
            data_idx: the question id, mapped to the corresponding answer.
            loss: a scalar loss to be optimised.
class claf.model.reading_comprehension.DocQA(token_embedder, lang_code='en', aligned_query_embedding=False, answer_maxlen=17, rnn_dim=100, linear_dim=200, preprocess_rnn_num_layer=1, modeling_rnn_num_layer=2, predict_rnn_num_layer=1, dropout=0.2, weight_init=True)

    Bases: claf.model.reading_comprehension.mixin.SQuADv1, claf.model.base.ModelWithTokenEmbedder

    Document Reader Model. Span Detector

    Implementation of the model presented in Simple and Effective Multi-Paragraph Reading Comprehension (https://arxiv.org/abs/1710.10723)

    Embedding (Word + Char -> Contextual)
    Attention
    Residual self-attention
    Output

    Args:
        token_embedder: 'QATokenEmbedder', used to embed the 'context' and 'question'.

    Kwargs:
        lang_code: dataset language code [en|ko]
        aligned_query_embedding: f_align(p_i) = sum_j(a_ij * E(q_j)), where the attention score a_ij captures the similarity between p_i and each question word q_j. These features add soft alignments between similar but non-identical words (e.g., car and vehicle). Applied only to 'context_embed'.
        answer_maxlen: consider only answer spans of length less than or equal to answer_maxlen
        rnn_dim: the RNN cell dimension
        linear_dim: the attention linear layer dimension
        preprocess_rnn_num_layer: the number of recurrent layers (preprocess)
        modeling_rnn_num_layer: the number of recurrent layers (modeling)
        predict_rnn_num_layer: the number of recurrent layers (predict)
        dropout: the dropout probability
    forward(features, labels=None)

        Args:
            features: feature dictionary like below.
                {"feature_name1": {
                    "token_name1": tensor, "token_name2": tensor},
                 "feature_name2": ...}

        Kwargs:
            labels: label dictionary like below.
                {"label_name1": tensor,
                 "label_name2": tensor}
                Loss is not calculated when no labels are given (inference/predict mode).

        Returns: output_dict (dict) consisting of
            start_logits: unnormalized log probabilities of the span start position.
            end_logits: unnormalized log probabilities of the span end position.
            best_span: the string from the original passage that the model predicts as the best answer to the question.
            data_idx: the question id, mapped to the corresponding answer.
            loss: a scalar loss to be optimised.
class claf.model.reading_comprehension.DrQA(token_embedder, lang_code='en', aligned_query_embedding=False, answer_maxlen=None, model_dim=128, dropout=0.3)

    Bases: claf.model.reading_comprehension.mixin.SQuADv1, claf.model.base.ModelWithTokenEmbedder

    Document Reader Model. Span Detector

    Implementation of the model presented in Reading Wikipedia to Answer Open-Domain Questions (https://arxiv.org/abs/1704.00051)

    Embedding + features
    Align question embedding

    Args:
        token_embedder: 'QATokenEmbedder', used to embed the 'context' and 'question'.

    Kwargs:
        lang_code: dataset language code [en|ko]
        aligned_query_embedding: f_align(p_i) = sum_j(a_ij * E(q_j)), where the attention score a_ij captures the similarity between p_i and each question word q_j. These features add soft alignments between similar but non-identical words (e.g., car and vehicle). Applied only to 'context_embed'.
        answer_maxlen: consider only answer spans of length less than or equal to answer_maxlen
        model_dim: the model dimension
        dropout: the dropout probability
    forward(features, labels=None)

        Args:
            features: feature dictionary like below.
                {"feature_name1": {
                    "token_name1": tensor, "token_name2": tensor},
                 "feature_name2": ...}

        Kwargs:
            labels: label dictionary like below.
                {"label_name1": tensor,
                 "label_name2": tensor}
                Loss is not calculated when no labels are given (inference/predict mode).

        Returns: output_dict (dict) consisting of
            start_logits: unnormalized log probabilities of the span start position.
            end_logits: unnormalized log probabilities of the span end position.
            best_span: the string from the original passage that the model predicts as the best answer to the question.
            data_idx: the question id, mapped to the corresponding answer.
            loss: a scalar loss to be optimised.
class claf.model.reading_comprehension.RoBertaForQA(token_makers, lang_code='en', pretrained_model_name=None, answer_maxlen=30)

    Bases: claf.model.reading_comprehension.mixin.SQuADv1ForBert, claf.model.base.ModelWithoutTokenEmbedder

    Document Reader Model. Span Detector

    Implementation of a question answering model based on RoBERTa: A Robustly Optimized BERT Pretraining Approach (https://arxiv.org/abs/1907.11692)

    Args:
        token_makers: token makers used to embed the 'context' and 'question'.

    Kwargs:
        lang_code: dataset language code [en|ko]
        pretrained_model_name: the name of a pre-trained model
        answer_maxlen: consider only answer spans of length less than or equal to answer_maxlen
    forward(features, labels=None)

        Args:
            features: feature dictionary like below.
                {"feature_name1": {
                    "token_name1": tensor, "token_name2": tensor},
                 "feature_name2": ...}

        Kwargs:
            labels: label dictionary like below.
                {"label_name1": tensor,
                 "label_name2": tensor}
                Loss is not calculated when no labels are given (inference/predict mode).

        Returns: output_dict (dict) consisting of
            start_logits: unnormalized log probabilities of the span start position.
            end_logits: unnormalized log probabilities of the span end position.
            best_span: the string from the original passage that the model predicts as the best answer to the question.
            data_idx: the question id, mapped to the corresponding answer.
            loss: a scalar loss to be optimised.
class claf.model.reading_comprehension.BiDAF_No_Answer(token_embedder, lang_code='en', aligned_query_embedding=False, answer_maxlen=None, model_dim=100, contextual_rnn_num_layer=1, modeling_rnn_num_layer=2, predict_rnn_num_layer=1, dropout=0.2)

    Bases: claf.model.reading_comprehension.mixin.SQuADv2, claf.model.base.ModelWithTokenEmbedder

    Question Answering Model. Span Detector, No Answer

    Bidirectional Attention Flow for Machine Comprehension + Bias (No_Answer)

    Embedding (Word + Char -> Contextual)
    Attention Flow
    Modeling (RNN)
    Output

    Args:
        token_embedder: 'QATokenEmbedder', used to embed the 'context' and 'question'.

    Kwargs:
        lang_code: dataset language code [en|ko]
        aligned_query_embedding: f_align(p_i) = sum_j(a_ij * E(q_j)), where the attention score a_ij captures the similarity between p_i and each question word q_j. These features add soft alignments between similar but non-identical words (e.g., car and vehicle). Applied only to 'context_embed'.
        answer_maxlen: consider only answer spans of length less than or equal to answer_maxlen
        model_dim: the model dimension
        dropout: the dropout probability
    forward(features, labels=None)

        Args:
            features: feature dictionary like below.
                {"feature_name1": {
                    "token_name1": tensor, "token_name2": tensor},
                 "feature_name2": ...}

        Kwargs:
            labels: label dictionary like below.
                {"label_name1": tensor,
                 "label_name2": tensor}
                Loss is not calculated when no labels are given (inference/predict mode).

        Returns: output_dict (dict) consisting of
            start_logits: unnormalized log probabilities of the span start position.
            end_logits: unnormalized log probabilities of the span end position.
            best_span: the string from the original passage that the model predicts as the best answer to the question.
            data_idx: the question id, mapped to the corresponding answer.
            loss: a scalar loss to be optimised.
class claf.model.reading_comprehension.DocQA_No_Answer(token_embedder, lang_code='en', aligned_query_embedding=False, answer_maxlen=17, rnn_dim=100, linear_dim=200, preprocess_rnn_num_layer=1, modeling_rnn_num_layer=2, predict_rnn_num_layer=1, dropout=0.2, weight_init=True)

    Bases: claf.model.reading_comprehension.mixin.SQuADv2, claf.model.base.ModelWithTokenEmbedder

    Question Answering Model. Span Detector, No Answer

    Implementation of the model presented in Simple and Effective Multi-Paragraph Reading Comprehension + No_Answer (https://arxiv.org/abs/1710.10723)

    Embedding (Word + Char -> Contextual)
    Attention
    Residual self-attention
    Output

    Args:
        token_embedder: 'QATokenEmbedder', used to embed the 'context' and 'question'.

    Kwargs:
        lang_code: dataset language code [en|ko]
        aligned_query_embedding: f_align(p_i) = sum_j(a_ij * E(q_j)), where the attention score a_ij captures the similarity between p_i and each question word q_j. These features add soft alignments between similar but non-identical words (e.g., car and vehicle). Applied only to 'context_embed'.
        answer_maxlen: consider only answer spans of length less than or equal to answer_maxlen
        rnn_dim: the RNN cell dimension
        linear_dim: the attention linear layer dimension
        preprocess_rnn_num_layer: the number of recurrent layers (preprocess)
        modeling_rnn_num_layer: the number of recurrent layers (modeling)
        predict_rnn_num_layer: the number of recurrent layers (predict)
        dropout: the dropout probability
    forward(features, labels=None)

        Args:
            features: feature dictionary like below.
                {"feature_name1": {
                    "token_name1": tensor, "token_name2": tensor},
                 "feature_name2": ...}

        Kwargs:
            labels: label dictionary like below.
                {"label_name1": tensor,
                 "label_name2": tensor}
                Loss is not calculated when no labels are given (inference/predict mode).

        Returns: output_dict (dict) consisting of
            start_logits: unnormalized log probabilities of the span start position.
            end_logits: unnormalized log probabilities of the span end position.
            best_span: the string from the original passage that the model predicts as the best answer to the question.
            data_idx: the question id, mapped to the corresponding answer.
            loss: a scalar loss to be optimised.