claf.model.reading_comprehension package

Submodules

class claf.model.reading_comprehension.bidaf.BiDAF(token_embedder, lang_code='en', aligned_query_embedding=False, answer_maxlen=None, model_dim=100, contextual_rnn_num_layer=1, modeling_rnn_num_layer=2, predict_rnn_num_layer=1, dropout=0.2)[source]

Bases: claf.model.reading_comprehension.mixin.SQuADv1, claf.model.base.ModelWithTokenEmbedder

Document Reader Model. Span Detector

Implementation of model presented in BiDAF: Bidirectional Attention Flow for Machine Comprehension (https://arxiv.org/abs/1611.01603)

  • Embedding (Word + Char -> Contextual)

  • Attention Flow

  • Modeling (RNN)

  • Output

  • Args:

    token_embedder: ‘QATokenEmbedder’, Used to embed the ‘context’ and ‘question’.

  • Kwargs:

    lang_code: Dataset language code [en|ko]
    aligned_query_embedding: f_align(p_i) = sum_j(a_ij * E(q_j)), where the attention score a_ij captures the similarity between p_i and each question word q_j. These features add soft alignments between similar but non-identical words (e.g., car and vehicle). Applied only to ‘context_embed’.
    answer_maxlen: restrict the predicted answer span to length less than or equal to {answer_maxlen}
    model_dim: the model (hidden) dimension
    contextual_rnn_num_layer: the number of recurrent layers (contextual)
    modeling_rnn_num_layer: the number of recurrent layers (modeling)
    predict_rnn_num_layer: the number of recurrent layers (predict)
    dropout: the dropout probability

forward(features, labels=None)[source]
  • Args:
    features: feature dictionary like the following:

        {"feature_name1": {"token_name1": tensor, "token_name2": tensor},
         "feature_name2": ...}

  • Kwargs:
    labels: label dictionary like the following:

        {"label_name1": tensor, "label_name2": tensor}

    Loss is not calculated when labels are absent (inference/predict mode).

  • Returns: output_dict (dict) consisting of
    • start_logits: representing unnormalized log probabilities of the span start position.

    • end_logits: representing unnormalized log probabilities of the span end position.

    • best_span: the string from the original passage that the model thinks is the best answer to the question.

    • data_idx: the question id, used to map the prediction back to its answer.

    • loss: a scalar loss to be optimised.
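For illustration, a minimal sketch of a forward pass with the nested dictionaries described above. The feature and label names below are hypothetical; the real names depend on the token makers and data reader configured in CLaF, and a constructed BiDAF instance is assumed to be available as ‘model’:

    import torch

    # Hypothetical nested feature dictionary (token names are illustrative only).
    features = {
        "context": {
            "char": torch.randint(0, 260, (2, 120, 16)),    # (batch, context_len, word_len)
            "glove": torch.randint(0, 10000, (2, 120)),     # (batch, context_len)
        },
        "question": {
            "char": torch.randint(0, 260, (2, 20, 16)),
            "glove": torch.randint(0, 10000, (2, 20)),
        },
    }

    # Hypothetical label dictionary; omit it at inference time and no loss is computed.
    labels = {
        "answer_start": torch.tensor([3, 15]),
        "answer_end": torch.tensor([5, 17]),
    }

    output_dict = model(features, labels)
    print(output_dict["best_span"], output_dict["loss"])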

class claf.model.reading_comprehension.bidaf_no_answer.BiDAF_No_Answer(token_embedder, lang_code='en', aligned_query_embedding=False, answer_maxlen=None, model_dim=100, contextual_rnn_num_layer=1, modeling_rnn_num_layer=2, predict_rnn_num_layer=1, dropout=0.2)[source]

Bases: claf.model.reading_comprehension.mixin.SQuADv2, claf.model.base.ModelWithTokenEmbedder

Question Answering Model. Span Detector, No Answer

Bidirectional Attention Flow for Machine Comprehension + Bias (No_Answer)

  • Embedding (Word + Char -> Contextual)

  • Attention Flow

  • Modeling (RNN)

  • Output

  • Args:

    token_embedder: ‘QATokenEmbedder’, Used to embed the ‘context’ and ‘question’.

  • Kwargs:

    lang_code: Dataset language code [en|ko]
    aligned_query_embedding: f_align(p_i) = sum_j(a_ij * E(q_j)), where the attention score a_ij captures the similarity between p_i and each question word q_j. These features add soft alignments between similar but non-identical words (e.g., car and vehicle). Applied only to ‘context_embed’.
    answer_maxlen: restrict the predicted answer span to length less than or equal to {answer_maxlen}
    model_dim: the model (hidden) dimension
    dropout: the dropout probability

forward(features, labels=None)[source]
  • Args:
    features: feature dictionary like the following:

        {"feature_name1": {"token_name1": tensor, "token_name2": tensor},
         "feature_name2": ...}

  • Kwargs:
    labels: label dictionary like the following:

        {"label_name1": tensor, "label_name2": tensor}

    Loss is not calculated when labels are absent (inference/predict mode).

  • Returns: output_dict (dict) consisting of
    • start_logits: representing unnormalized log probabilities of the span start position.

    • end_logits: representing unnormalized log probabilities of the span end position.

    • best_span: the string from the original passage that the model thinks is the best answer to the question.

    • data_idx: the question id, used to map the prediction back to its answer.

    • loss: a scalar loss to be optimised.

class claf.model.reading_comprehension.docqa.DocQA(token_embedder, lang_code='en', aligned_query_embedding=False, answer_maxlen=17, rnn_dim=100, linear_dim=200, preprocess_rnn_num_layer=1, modeling_rnn_num_layer=2, predict_rnn_num_layer=1, dropout=0.2, weight_init=True)[source]

Bases: claf.model.reading_comprehension.mixin.SQuADv1, claf.model.base.ModelWithTokenEmbedder

Document Reader Model. Span Detector

Implementation of model presented in Simple and Effective Multi-Paragraph Reading Comprehension (https://arxiv.org/abs/1710.10723)

  • Embedding (Word + Char -> Contextual)

  • Attention

  • Residual self-attention

  • Output

  • Args:

    token_embedder: ‘QATokenEmbedder’, Used to embed the ‘context’ and ‘question’.

  • Kwargs:

    lang_code: Dataset language code [en|ko]
    aligned_query_embedding: f_align(p_i) = sum_j(a_ij * E(q_j)), where the attention score a_ij captures the similarity between p_i and each question word q_j. These features add soft alignments between similar but non-identical words (e.g., car and vehicle). Applied only to ‘context_embed’.
    answer_maxlen: restrict the predicted answer span to length less than or equal to {answer_maxlen}
    rnn_dim: the RNN cell dimension
    linear_dim: the attention linear-layer dimension
    preprocess_rnn_num_layer: the number of recurrent layers (preprocess)
    modeling_rnn_num_layer: the number of recurrent layers (modeling)
    predict_rnn_num_layer: the number of recurrent layers (predict)
    dropout: the dropout probability

forward(features, labels=None)[source]
  • Args:
    features: feature dictionary like the following:

        {"feature_name1": {"token_name1": tensor, "token_name2": tensor},
         "feature_name2": ...}

  • Kwargs:
    labels: label dictionary like the following:

        {"label_name1": tensor, "label_name2": tensor}

    Loss is not calculated when labels are absent (inference/predict mode).

  • Returns: output_dict (dict) consisting of
    • start_logits: representing unnormalized log probabilities of the span start position.

    • end_logits: representing unnormalized log probabilities of the span end position.

    • best_span: the string from the original passage that the model thinks is the best answer to the question.

    • data_idx: the question id, used to map the prediction back to its answer.

    • loss: a scalar loss to be optimised.

class claf.model.reading_comprehension.docqa.SelfAttention(rnn_dim, linear_dim, dropout=0.2, weight_init=True)[source]

Bases: torch.nn.modules.module.Module

Same bi-attention mechanism, only now between the passage and itself.

forward(context, context_mask)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
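A minimal sketch of such passage-to-passage attention, assuming a generic similarity function and following the paper's convention that a word may not attend to itself; this only illustrates the mechanism and is not CLaF's exact implementation:

    import torch
    import torch.nn.functional as F

    def self_attention_sketch(context, context_mask, similarity_fn):
        # context: (batch, c_len, dim), context_mask: (batch, c_len) with 1 for real tokens
        scores = similarity_fn(context, context)                  # (batch, c_len, c_len)

        # Mask the diagonal so a word does not attend to itself.
        c_len = context.size(1)
        eye = torch.eye(c_len, dtype=torch.bool, device=context.device)
        scores = scores.masked_fill(eye, float("-inf"))

        # Mask padding positions on the key side.
        key_pad = (context_mask == 0).unsqueeze(1)                # (batch, 1, c_len)
        scores = scores.masked_fill(key_pad, float("-inf"))

        attention = F.softmax(scores, dim=-1)
        return torch.bmm(attention, context)                      # attended passage representation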

class claf.model.reading_comprehension.docqa_no_answer.DocQA_No_Answer(token_embedder, lang_code='en', aligned_query_embedding=False, answer_maxlen=17, rnn_dim=100, linear_dim=200, preprocess_rnn_num_layer=1, modeling_rnn_num_layer=2, predict_rnn_num_layer=1, dropout=0.2, weight_init=True)[source]

Bases: claf.model.reading_comprehension.mixin.SQuADv2, claf.model.base.ModelWithTokenEmbedder

Question Answering Model. Span Detector, No Answer

Implementation of model presented in Simple and Effective Multi-Paragraph Reading Comprehension + No_Answer (https://arxiv.org/abs/1710.10723)

  • Embedding (Word + Char -> Contextual)

  • Attention

  • Residual self-attention

  • Output

  • Args:

    token_embedder: ‘QATokenEmbedder’, Used to embed the ‘context’ and ‘question’.

  • Kwargs:

    lang_code: Dataset language code [en|ko]
    aligned_query_embedding: f_align(p_i) = sum_j(a_ij * E(q_j)), where the attention score a_ij captures the similarity between p_i and each question word q_j. These features add soft alignments between similar but non-identical words (e.g., car and vehicle). Applied only to ‘context_embed’.
    answer_maxlen: restrict the predicted answer span to length less than or equal to {answer_maxlen}
    rnn_dim: the RNN cell dimension
    linear_dim: the attention linear-layer dimension
    preprocess_rnn_num_layer: the number of recurrent layers (preprocess)
    modeling_rnn_num_layer: the number of recurrent layers (modeling)
    predict_rnn_num_layer: the number of recurrent layers (predict)
    dropout: the dropout probability

forward(features, labels=None)[source]
  • Args:
    features: feature dictionary like the following:

        {"feature_name1": {"token_name1": tensor, "token_name2": tensor},
         "feature_name2": ...}

  • Kwargs:
    labels: label dictionary like the following:

        {"label_name1": tensor, "label_name2": tensor}

    Loss is not calculated when labels are absent (inference/predict mode).

  • Returns: output_dict (dict) consisting of
    • start_logits: representing unnormalized log probabilities of the span start position.

    • end_logits: representing unnormalized log probabilities of the span end position.

    • best_span: the string from the original passage that the model thinks is the best answer to the question.

    • data_idx: the question id, used to map the prediction back to its answer.

    • loss: a scalar loss to be optimised.

class claf.model.reading_comprehension.docqa_no_answer.NoAnswer(embed_dim, bias_hidden_dim)[source]

Bases: torch.nn.modules.module.Module

No-Answer Option

  • Args:

    embed_dim: the passage embedding dimension
    bias_hidden_dim: the hidden size of the two-layer MLP used to compute the no-answer bias
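A minimal sketch of a two-layer MLP no-answer bias of this shape. The pooling of the passage with the span logits is an assumption made for illustration; CLaF's actual computation may differ:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class NoAnswerBiasSketch(nn.Module):
        """Illustrative only: produce one 'no answer' bias logit per example."""

        def __init__(self, embed_dim, bias_hidden_dim):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Linear(embed_dim * 2, bias_hidden_dim),
                nn.ReLU(),
                nn.Linear(bias_hidden_dim, 1),
            )

        def forward(self, context_embed, span_start_logits, span_end_logits):
            # Summarize the passage with the start/end distributions (assumed pooling).
            start_attn = F.softmax(span_start_logits, dim=-1).unsqueeze(1)   # (batch, 1, c_len)
            end_attn = F.softmax(span_end_logits, dim=-1).unsqueeze(1)
            start_summary = torch.bmm(start_attn, context_embed).squeeze(1)  # (batch, embed_dim)
            end_summary = torch.bmm(end_attn, context_embed).squeeze(1)
            pooled = torch.cat([start_summary, end_summary], dim=-1)
            return self.mlp(pooled).squeeze(-1)    # (batch,) bias score competing with span scores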

forward(context_embed, span_start_logits, span_end_logits)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class claf.model.reading_comprehension.docqa_no_answer.SelfAttention(rnn_dim, linear_dim, dropout=0.2, weight_init=True)[source]

Bases: torch.nn.modules.module.Module

Same bi-attention mechanism, only now between the passage and itself.

forward(context, context_mask)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class claf.model.reading_comprehension.drqa.DrQA(token_embedder, lang_code='en', aligned_query_embedding=False, answer_maxlen=None, model_dim=128, dropout=0.3)[source]

Bases: claf.model.reading_comprehension.mixin.SQuADv1, claf.model.base.ModelWithTokenEmbedder

Document Reader Model. Span Detector

Implementation of model presented in Reading Wikipedia to Answer Open-Domain Questions (https://arxiv.org/abs/1704.00051)

  • Embedding + features

  • Align question embedding

  • Args:

    token_embedder: ‘QATokenEmbedder’, Used to embed the ‘context’ and ‘question’.

  • Kwargs:

    lang_code: Dataset language code [en|ko]
    aligned_query_embedding: f_align(p_i) = sum_j(a_ij * E(q_j)), where the attention score a_ij captures the similarity between p_i and each question word q_j. These features add soft alignments between similar but non-identical words (e.g., car and vehicle). Applied only to ‘context_embed’ (written out in full below).
    answer_maxlen: restrict the predicted answer span to length less than or equal to {answer_maxlen}
    model_dim: the model (hidden) dimension
    dropout: the dropout probability
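Written out in full (LaTeX), the aligned question embedding from the DrQA paper, where alpha(.) is a single dense layer with ReLU:

    f_{\text{align}}(p_i) = \sum_j a_{i,j}\, \mathbf{E}(q_j),
    \qquad
    a_{i,j} = \frac{\exp\bigl(\alpha(\mathbf{E}(p_i)) \cdot \alpha(\mathbf{E}(q_j))\bigr)}
                   {\sum_{j'} \exp\bigl(\alpha(\mathbf{E}(p_i)) \cdot \alpha(\mathbf{E}(q_{j'}))\bigr)}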

forward(features, labels=None)[source]
  • Args:
    features: feature dictionary like the following:

        {"feature_name1": {"token_name1": tensor, "token_name2": tensor},
         "feature_name2": ...}

  • Kwargs:
    labels: label dictionary like the following:

        {"label_name1": tensor, "label_name2": tensor}

    Loss is not calculated when labels are absent (inference/predict mode).

  • Returns: output_dict (dict) consisting of
    • start_logits: representing unnormalized log probabilities of the span start position.

    • end_logits: representing unnormalized log probabilities of the span end position.

    • best_span: the string from the original passage that the model thinks is the best answer to the question.

    • data_idx: the question id, used to map the prediction back to its answer.

    • loss: a scalar loss to be optimised.

class claf.model.reading_comprehension.mixin.ReadingComprehension[source]

Bases: object

Reading Comprehension Mixin Class

  • Args:

    token_embedder: ‘RCTokenEmbedder’, Used to embed the ‘context’ and ‘question’.

get_best_span(span_start_logits, span_end_logits, answer_maxlen=None)[source]

Take argmax of constrained score_s * score_e.

  • Args:

    span_start_logits: independent start logits
    span_end_logits: independent end logits

  • Kwargs:

    answer_maxlen: maximum span length to consider (default: None, i.e. no limit)
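A minimal sketch of this constrained argmax over all (start, end) pairs; the real implementation may vectorize differently, and adding logits in log space is equivalent to multiplying score_s * score_e:

    import torch

    def get_best_span_sketch(span_start_logits, span_end_logits, answer_maxlen=None):
        # Joint score for every (start, end) pair, in log space.
        scores = span_start_logits.unsqueeze(2) + span_end_logits.unsqueeze(1)   # (batch, c_len, c_len)

        c_len = span_start_logits.size(1)
        device = span_start_logits.device
        valid = torch.ones(c_len, c_len, dtype=torch.bool, device=device).triu()  # end >= start
        if answer_maxlen is not None:
            # Drop pairs whose span length (end - start + 1) exceeds answer_maxlen.
            too_long = torch.ones(c_len, c_len, dtype=torch.bool, device=device).triu(answer_maxlen)
            valid &= ~too_long
        scores = scores.masked_fill(~valid, float("-inf"))

        flat_best = scores.view(scores.size(0), -1).argmax(dim=-1)
        starts = torch.div(flat_best, c_len, rounding_mode="floor")
        ends = flat_best % c_len
        return torch.stack([starts, ends], dim=-1)    # (batch, 2): start and end indices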

make_predictions(output_dict)[source]

Make predictions with model’s output_dict

  • Args:
    output_dict: model’s output dictionary consisting of
    • data_idx: question id

    • best_span: the best span computed from span_start_logits and span_end_logits

    • start_logits: span start logits

    • end_logits: span end logits

  • Returns:
    predictions: prediction dictionary consisting of
    • key: ‘id’ (question id)

    • value: dictionary consisting of

      predict_text, pred_span_start, pred_span_end, span_start_prob, span_end_prob
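An illustrative entry of the resulting dictionary (the id and values below are hypothetical):

    predictions = {
        "question-id-1": {
            "predict_text": "the answer span text",
            "pred_span_start": 34,
            "pred_span_end": 36,
            "span_start_prob": 0.91,
            "span_end_prob": 0.88,
        },
    }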

predict(**kwargs)
print_examples(index, inputs, predictions)[source]

Print evaluation examples

  • Args:

    index: data index
    inputs: mini-batch inputs
    predictions: prediction dictionary consisting of

    • key: ‘id’ (question id)

    • value: dictionary consisting of

      predict_text, pred_span_start, pred_span_end, span_start_prob, span_end_prob

  • Returns:

    prints the Context, Question, Answers, and Prediction

write_predictions(predictions, file_path=None, is_dict=True)[source]
class claf.model.reading_comprehension.mixin.SQuADv1[source]

Bases: claf.model.reading_comprehension.mixin.ReadingComprehension

Reading Comprehension Mixin Class

with SQuAD v1.1 evaluation

  • Args:

    token_embedder: ‘QATokenEmbedder’, Used to embed the ‘context’ and ‘question’.

make_metrics(predictions)[source]

Make metrics with prediction dictionary

  • Args:
    predictions: prediction dictionary consisting of
    • key: ‘id’ (question id)

    • value: (predict_text, pred_span_start, pred_span_end)

  • Returns:
    metrics: metric dictionary consisting of
    • ‘em’: exact_match (SQuAD v1.1 official evaluation)

    • ‘f1’: f1 (SQuAD v1.1 official evaluation)

    • ‘start_acc’: span_start accuracy

    • ‘end_acc’: span_end accuracy

    • ‘span_acc’: span accuracy (start and end)
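Putting the mixin methods together, a minimal evaluation sketch, assuming ‘output_dict’ was produced by a forward pass (predictions from multiple batches would normally be merged before scoring):

    predictions = model.make_predictions(output_dict)
    metrics = model.make_metrics(predictions)
    print(metrics["em"], metrics["f1"], metrics["span_acc"])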

class claf.model.reading_comprehension.mixin.SQuADv1ForBert[source]

Bases: claf.model.reading_comprehension.mixin.SQuADv1

Reading Comprehension Mixin Class

with SQuAD v1.1 evaluation

  • Args:

    token_embedder: ‘QATokenEmbedder’, Used to embed the ‘context’ and ‘question’.

make_metrics(predictions)[source]

BERT predictions are produced from an n-best result.

predict(output_dict, arguments, helper)[source]

Inference by raw_feature

  • Args:
    output_dict: model’s output dictionary consisting of
    • data_idx: question id

    • best_span: the best span computed from span_start_logits and span_end_logits

    arguments: arguments dictionary consisting of user_input
    helper: dictionary for helping to get the answer

  • Returns:

    span: the predicted best_span

class claf.model.reading_comprehension.mixin.SQuADv2[source]

Bases: claf.model.reading_comprehension.mixin.ReadingComprehension

Reading Comprehension Mixin Class

with SQuAD v2.0 evaluation

  • Args:

    token_embedder: ‘RCTokenEmbedder’, Used to embed the ‘context’ and ‘question’.

make_metrics(predictions)[source]

Make metrics with prediction dictionary

  • Args:
    predictions: prediction dictionary consisting of
    • key: ‘id’ (question id)

    • value: dictionary consisting of

      predict_text, pred_span_start, pred_span_end, span_start_prob, span_end_prob

  • Returns:
    metrics: metric dictionary consisting of
    • ‘start_acc’: span_start accuracy

    • ‘end_acc’: span_end accuracy

    • ‘span_acc’: span accuracy (start and end)

    • ‘em’: exact_match (SQuAD v2.0 official evaluation)

    • ‘f1’: f1 (SQuAD v2.0 official evaluation)

    • ‘HasAns_exact’: has answer exact_match

    • ‘HasAns_f1’: has answer f1

    • ‘NoAns_exact’: no answer exact_match

    • ‘NoAns_f1’: no answer f1

    • ‘best_exact’: best exact_match score with best_exact_thresh

    • ‘best_exact_thresh’: best exact_match answerable threshold

    • ‘best_f1’: best f1 score with best_f1_thresh

    • ‘best_f1_thresh’: best f1 answerable threshold

class claf.model.reading_comprehension.qanet.EncoderBlock(model_dim=128, num_head=8, kernel_size=5, num_conv_block=4, dropout=0.1, layer_dropout=0.9)[source]

Bases: torch.nn.modules.module.Module

Encoder Block

(brackets denote residual connections) position_encoding -> [convolution-layer] x # -> [self-attention-layer] -> [feed-forward-layer]

  • convolution-layer: depthwise separable convolutions

  • self-attention-layer: multi-head attention

  • feed-forward-layer: pointwise convolution

  • Args:

    model_dim: the model (hidden) dimension
    num_head: the number of heads in multi-head attention
    kernel_size: convolution kernel size
    num_conv_block: the number of convolution blocks
    dropout: the dropout probability
    layer_dropout: the layer dropout probability (cf. Deep Networks with Stochastic Depth, https://arxiv.org/abs/1603.09382)
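A sketch of the data flow through one encoder block, with every sub-layer passed in as a callable; layer dropout (which stochastically skips sub-layers during training) is omitted for brevity:

    def encoder_block_sketch(x, mask, position_encoding, layer_norm,
                             conv_blocks, self_attention, feed_forward):
        """Illustrative data flow only; all arguments after `mask` are callables."""
        x = position_encoding(x)
        for conv in conv_blocks:                         # repeated num_conv_block times
            x = x + conv(layer_norm(x))                  # [LayerNorm -> Conv] (residual)
        x = x + self_attention(layer_norm(x), mask)      # [LayerNorm -> Self-attention] (residual)
        x = x + feed_forward(layer_norm(x))              # [LayerNorm -> Feed-forward] (residual)
        return x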

forward(x, mask=None)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class claf.model.reading_comprehension.qanet.QANet(token_embedder, lang_code='en', aligned_query_embedding=False, answer_maxlen=None, model_dim=128, kernel_size_in_embedding=7, num_head_in_embedding=8, num_conv_block_in_embedding=4, num_embedding_encoder_block=1, kernel_size_in_modeling=5, num_head_in_modeling=8, num_conv_block_in_modeling=2, num_modeling_encoder_block=7, dropout=0.1, layer_dropout=0.9)[source]

Bases: claf.model.reading_comprehension.mixin.SQuADv1, claf.model.base.ModelWithTokenEmbedder

Document Reader Model. Span Detector

Implementation of model presented in QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension (https://arxiv.org/abs/1804.09541)

  • Input Embedding Layer

  • Embedding Encoder Layer

  • Context-Query Attention Layer

  • Model Encoder Layer

  • Output Layer

  • Args:

    token_embedder: ‘QATokenEmbedder’, Used to embed the ‘context’ and ‘question’.

  • Kwargs:

    lang_code: Dataset language code [en|ko]
    aligned_query_embedding: f_align(p_i) = sum_j(a_ij * E(q_j)), where the attention score a_ij captures the similarity between p_i and each question word q_j. These features add soft alignments between similar but non-identical words (e.g., car and vehicle). Applied only to ‘context_embed’.
    answer_maxlen: restrict the predicted answer span to length less than or equal to {answer_maxlen}
    model_dim: the model (hidden) dimension

    • Encoder Block Parameters (embedding, modeling):
      kernel_size: convolution kernel size in the encoder block
      num_head: the number of heads in multi-head attention
      num_conv_block: the number of convolution blocks in the encoder block
        [Layernorm -> Conv (residual)]
      num_encoder_block: the number of encoder blocks
        [position_encoding -> [n repeated conv blocks] -> Layernorm -> Self-attention (residual) -> Layernorm -> Feed-forward (residual)]

    dropout: the dropout probability
    layer_dropout: the layer dropout probability (cf. Deep Networks with Stochastic Depth, https://arxiv.org/abs/1603.09382)
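A construction sketch using the defaults from the signature above; a QATokenEmbedder instance is assumed to be available as ‘token_embedder’:

    model = QANet(
        token_embedder,
        model_dim=128,
        # Embedding encoder: 1 block of 4 conv layers, kernel size 7, 8 heads.
        kernel_size_in_embedding=7,
        num_head_in_embedding=8,
        num_conv_block_in_embedding=4,
        num_embedding_encoder_block=1,
        # Modeling encoder: 7 blocks of 2 conv layers, kernel size 5, 8 heads.
        kernel_size_in_modeling=5,
        num_head_in_modeling=8,
        num_conv_block_in_modeling=2,
        num_modeling_encoder_block=7,
        dropout=0.1,
        layer_dropout=0.9,
    )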

forward(features, labels=None)[source]
  • Args:
    features: feature dictionary like the following:

        {"feature_name1": {"token_name1": tensor, "token_name2": tensor},
         "feature_name2": ...}

  • Kwargs:
    labels: label dictionary like the following:

        {"label_name1": tensor, "label_name2": tensor}

    Loss is not calculated when labels are absent (inference/predict mode).

  • Returns: output_dict (dict) consisting of
    • start_logits: representing unnormalized log probabilities of the span start position.

    • end_logits: representing unnormalized log probabilities of the span end position.

    • best_span: the string from the original passage that the model thinks is the best answer to the question.

    • data_idx: the question id, used to map the prediction back to its answer.

    • loss: a scalar loss to be optimised.

Module contents

class claf.model.reading_comprehension.BertForQA(token_makers, lang_code='en', pretrained_model_name=None, answer_maxlen=30)[source]

Bases: claf.model.reading_comprehension.mixin.SQuADv1ForBert, claf.model.base.ModelWithoutTokenEmbedder

Document Reader Model. Span Detector

Implementation of model presented in BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (https://arxiv.org/abs/1810.04805)

  • Args:

    token_makers: token makers used to embed the ‘context’ and ‘question’.

  • Kwargs:

    lang_code: Dataset language code [en|ko]
    pretrained_model_name: the name of a pre-trained model
    answer_maxlen: restrict the predicted answer span to length less than or equal to {answer_maxlen}
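A construction sketch; ‘token_makers’ is assumed to come from the CLaF data reader configuration, and the pretrained model name below is only an example:

    model = BertForQA(
        token_makers,
        lang_code="en",
        pretrained_model_name="bert-base-uncased",
        answer_maxlen=30,
    )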

forward(features, labels=None)[source]
  • Args:
    features: feature dictionary like the following:

        {"feature_name1": {"token_name1": tensor, "token_name2": tensor},
         "feature_name2": ...}

  • Kwargs:
    labels: label dictionary like the following:

        {"label_name1": tensor, "label_name2": tensor}

    Loss is not calculated when labels are absent (inference/predict mode).

  • Returns: output_dict (dict) consisting of
    • start_logits: representing unnormalized log probabilities of the span start position.

    • end_logits: representing unnormalized log probabilities of the span end position.

    • best_span: the string from the original passage that the model thinks is the best answer to the question.

    • data_idx: the question id, used to map the prediction back to its answer.

    • loss: a scalar loss to be optimised.

class claf.model.reading_comprehension.BiDAF(token_embedder, lang_code='en', aligned_query_embedding=False, answer_maxlen=None, model_dim=100, contextual_rnn_num_layer=1, modeling_rnn_num_layer=2, predict_rnn_num_layer=1, dropout=0.2)[source]

Bases: claf.model.reading_comprehension.mixin.SQuADv1, claf.model.base.ModelWithTokenEmbedder

Document Reader Model. Span Detector

Implementation of model presented in BiDAF: Bidirectional Attention Flow for Machine Comprehension (https://arxiv.org/abs/1611.01603)

  • Embedding (Word + Char -> Contextual)

  • Attention Flow

  • Modeling (RNN)

  • Output

  • Args:

    token_embedder: ‘QATokenEmbedder’, Used to embed the ‘context’ and ‘question’.

  • Kwargs:

    lang_code: Dataset language code [en|ko]
    aligned_query_embedding: f_align(p_i) = sum_j(a_ij * E(q_j)), where the attention score a_ij captures the similarity between p_i and each question word q_j. These features add soft alignments between similar but non-identical words (e.g., car and vehicle). Applied only to ‘context_embed’.
    answer_maxlen: restrict the predicted answer span to length less than or equal to {answer_maxlen}
    model_dim: the model (hidden) dimension
    contextual_rnn_num_layer: the number of recurrent layers (contextual)
    modeling_rnn_num_layer: the number of recurrent layers (modeling)
    predict_rnn_num_layer: the number of recurrent layers (predict)
    dropout: the dropout probability

forward(features, labels=None)[source]
  • Args:
    features: feature dictionary like the following:

        {"feature_name1": {"token_name1": tensor, "token_name2": tensor},
         "feature_name2": ...}

  • Kwargs:
    labels: label dictionary like the following:

        {"label_name1": tensor, "label_name2": tensor}

    Loss is not calculated when labels are absent (inference/predict mode).

  • Returns: output_dict (dict) consisting of
    • start_logits: representing unnormalized log probabilities of the span start position.

    • end_logits: representing unnormalized log probabilities of the span end position.

    • best_span: the string from the original passage that the model thinks is the best answer to the question.

    • data_idx: the question id, used to map the prediction back to its answer.

    • loss: a scalar loss to be optimised.

class claf.model.reading_comprehension.QANet(token_embedder, lang_code='en', aligned_query_embedding=False, answer_maxlen=None, model_dim=128, kernel_size_in_embedding=7, num_head_in_embedding=8, num_conv_block_in_embedding=4, num_embedding_encoder_block=1, kernel_size_in_modeling=5, num_head_in_modeling=8, num_conv_block_in_modeling=2, num_modeling_encoder_block=7, dropout=0.1, layer_dropout=0.9)[source]

Bases: claf.model.reading_comprehension.mixin.SQuADv1, claf.model.base.ModelWithTokenEmbedder

Document Reader Model. Span Detector

Implementation of model presented in QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension (https://arxiv.org/abs/1804.09541)

  • Input Embedding Layer

  • Embedding Encoder Layer

  • Context-Query Attention Layer

  • Model Encoder Layer

  • Output Layer

  • Args:

    token_embedder: ‘QATokenEmbedder’, Used to embed the ‘context’ and ‘question’.

  • Kwargs:

    lang_code: Dataset language code [en|ko]
    aligned_query_embedding: f_align(p_i) = sum_j(a_ij * E(q_j)), where the attention score a_ij captures the similarity between p_i and each question word q_j. These features add soft alignments between similar but non-identical words (e.g., car and vehicle). Applied only to ‘context_embed’.
    answer_maxlen: restrict the predicted answer span to length less than or equal to {answer_maxlen}
    model_dim: the model (hidden) dimension

    • Encoder Block Parameters (embedding, modeling):
      kernel_size: convolution kernel size in the encoder block
      num_head: the number of heads in multi-head attention
      num_conv_block: the number of convolution blocks in the encoder block
        [Layernorm -> Conv (residual)]
      num_encoder_block: the number of encoder blocks
        [position_encoding -> [n repeated conv blocks] -> Layernorm -> Self-attention (residual) -> Layernorm -> Feed-forward (residual)]

    dropout: the dropout probability
    layer_dropout: the layer dropout probability (cf. Deep Networks with Stochastic Depth, https://arxiv.org/abs/1603.09382)

forward(features, labels=None)[source]
  • Args:
    features: feature dictionary like the following:

        {"feature_name1": {"token_name1": tensor, "token_name2": tensor},
         "feature_name2": ...}

  • Kwargs:
    labels: label dictionary like the following:

        {"label_name1": tensor, "label_name2": tensor}

    Loss is not calculated when labels are absent (inference/predict mode).

  • Returns: output_dict (dict) consisting of
    • start_logits: representing unnormalized log probabilities of the span start position.

    • end_logits: representing unnormalized log probabilities of the span end position.

    • best_span: the string from the original passage that the model thinks is the best answer to the question.

    • data_idx: the question id, used to map the prediction back to its answer.

    • loss: a scalar loss to be optimised.

class claf.model.reading_comprehension.DocQA(token_embedder, lang_code='en', aligned_query_embedding=False, answer_maxlen=17, rnn_dim=100, linear_dim=200, preprocess_rnn_num_layer=1, modeling_rnn_num_layer=2, predict_rnn_num_layer=1, dropout=0.2, weight_init=True)[source]

Bases: claf.model.reading_comprehension.mixin.SQuADv1, claf.model.base.ModelWithTokenEmbedder

Document Reader Model. Span Detector

Implementation of model presented in Simple and Effective Multi-Paragraph Reading Comprehension (https://arxiv.org/abs/1710.10723)

  • Embedding (Word + Char -> Contextual)

  • Attention

  • Residual self-attention

  • Output

  • Args:

    token_embedder: ‘QATokenEmbedder’, Used to embed the ‘context’ and ‘question’.

  • Kwargs:

    lang_code: Dataset language code [en|ko]
    aligned_query_embedding: f_align(p_i) = sum_j(a_ij * E(q_j)), where the attention score a_ij captures the similarity between p_i and each question word q_j. These features add soft alignments between similar but non-identical words (e.g., car and vehicle). Applied only to ‘context_embed’.
    answer_maxlen: restrict the predicted answer span to length less than or equal to {answer_maxlen}
    rnn_dim: the RNN cell dimension
    linear_dim: the attention linear-layer dimension
    preprocess_rnn_num_layer: the number of recurrent layers (preprocess)
    modeling_rnn_num_layer: the number of recurrent layers (modeling)
    predict_rnn_num_layer: the number of recurrent layers (predict)
    dropout: the dropout probability

forward(features, labels=None)[source]
  • Args:
    features: feature dictionary like the following:

        {"feature_name1": {"token_name1": tensor, "token_name2": tensor},
         "feature_name2": ...}

  • Kwargs:
    labels: label dictionary like the following:

        {"label_name1": tensor, "label_name2": tensor}

    Loss is not calculated when labels are absent (inference/predict mode).

  • Returns: output_dict (dict) consisting of
    • start_logits: representing unnormalized log probabilities of the span start position.

    • end_logits: representing unnormalized log probabilities of the span end position.

    • best_span: the string from the original passage that the model thinks is the best answer to the question.

    • data_idx: the question id, used to map the prediction back to its answer.

    • loss: a scalar loss to be optimised.

class claf.model.reading_comprehension.DrQA(token_embedder, lang_code='en', aligned_query_embedding=False, answer_maxlen=None, model_dim=128, dropout=0.3)[source]

Bases: claf.model.reading_comprehension.mixin.SQuADv1, claf.model.base.ModelWithTokenEmbedder

Document Reader Model. Span Detector

Implementation of model presented in Reading Wikipedia to Answer Open-Domain Questions (https://arxiv.org/abs/1704.00051)

  • Embedding + features

  • Align question embedding

  • Args:

    token_embedder: ‘QATokenEmbedder’, Used to embed the ‘context’ and ‘question’.

  • Kwargs:

    lang_code: Dataset language code [en|ko]
    aligned_query_embedding: f_align(p_i) = sum_j(a_ij * E(q_j)), where the attention score a_ij captures the similarity between p_i and each question word q_j. These features add soft alignments between similar but non-identical words (e.g., car and vehicle). Applied only to ‘context_embed’.
    answer_maxlen: restrict the predicted answer span to length less than or equal to {answer_maxlen}
    model_dim: the model (hidden) dimension
    dropout: the dropout probability

forward(features, labels=None)[source]
  • Args:
    features: feature dictionary like the following:

        {"feature_name1": {"token_name1": tensor, "token_name2": tensor},
         "feature_name2": ...}

  • Kwargs:
    labels: label dictionary like the following:

        {"label_name1": tensor, "label_name2": tensor}

    Loss is not calculated when labels are absent (inference/predict mode).

  • Returns: output_dict (dict) consisting of
    • start_logits: representing unnormalized log probabilities of the span start position.

    • end_logits: representing unnormalized log probabilities of the span end position.

    • best_span: the string from the original passage that the model thinks is the best answer to the question.

    • data_idx: the question id, used to map the prediction back to its answer.

    • loss: a scalar loss to be optimised.

class claf.model.reading_comprehension.RoBertaForQA(token_makers, lang_code='en', pretrained_model_name=None, answer_maxlen=30)[source]

Bases: claf.model.reading_comprehension.mixin.SQuADv1ForBert, claf.model.base.ModelWithoutTokenEmbedder

Document Reader Model. Span Detector

Implementation of model presented in BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (https://arxiv.org/abs/1810.04805)

  • Args:

    token_makers: token makers used to embed the ‘context’ and ‘question’.

  • Kwargs:

    lang_code: Dataset language code [en|ko]
    pretrained_model_name: the name of a pre-trained model
    answer_maxlen: restrict the predicted answer span to length less than or equal to {answer_maxlen}

forward(features, labels=None)[source]
  • Args:
    features: feature dictionary like the following:

        {"feature_name1": {"token_name1": tensor, "token_name2": tensor},
         "feature_name2": ...}

  • Kwargs:
    labels: label dictionary like the following:

        {"label_name1": tensor, "label_name2": tensor}

    Loss is not calculated when labels are absent (inference/predict mode).

  • Returns: output_dict (dict) consisting of
    • start_logits: representing unnormalized log probabilities of the span start position.

    • end_logits: representing unnormalized log probabilities of the span end position.

    • best_span: the string from the original passage that the model thinks is the best answer to the question.

    • data_idx: the question id, used to map the prediction back to its answer.

    • loss: a scalar loss to be optimised.

class claf.model.reading_comprehension.BiDAF_No_Answer(token_embedder, lang_code='en', aligned_query_embedding=False, answer_maxlen=None, model_dim=100, contextual_rnn_num_layer=1, modeling_rnn_num_layer=2, predict_rnn_num_layer=1, dropout=0.2)[source]

Bases: claf.model.reading_comprehension.mixin.SQuADv2, claf.model.base.ModelWithTokenEmbedder

Question Answering Model. Span Detector, No Answer

Bidirectional Attention Flow for Machine Comprehension + Bias (No_Answer)

  • Embedding (Word + Char -> Contextual)

  • Attention Flow

  • Modeling (RNN)

  • Output

  • Args:

    token_embedder: ‘QATokenEmbedder’, Used to embed the ‘context’ and ‘question’.

  • Kwargs:

    lang_code: Dataset language code [en|ko]
    aligned_query_embedding: f_align(p_i) = sum_j(a_ij * E(q_j)), where the attention score a_ij captures the similarity between p_i and each question word q_j. These features add soft alignments between similar but non-identical words (e.g., car and vehicle). Applied only to ‘context_embed’.
    answer_maxlen: restrict the predicted answer span to length less than or equal to {answer_maxlen}
    model_dim: the model (hidden) dimension
    dropout: the dropout probability

forward(features, labels=None)[source]
  • Args:
    features: feature dictionary like the following:

        {"feature_name1": {"token_name1": tensor, "token_name2": tensor},
         "feature_name2": ...}

  • Kwargs:
    labels: label dictionary like the following:

        {"label_name1": tensor, "label_name2": tensor}

    Loss is not calculated when labels are absent (inference/predict mode).

  • Returns: output_dict (dict) consisting of
    • start_logits: representing unnormalized log probabilities of the span start position.

    • end_logits: representing unnormalized log probabilities of the span end position.

    • best_span: the string from the original passage that the model thinks is the best answer to the question.

    • data_idx: the question id, used to map the prediction back to its answer.

    • loss: a scalar loss to be optimised.

class claf.model.reading_comprehension.DocQA_No_Answer(token_embedder, lang_code='en', aligned_query_embedding=False, answer_maxlen=17, rnn_dim=100, linear_dim=200, preprocess_rnn_num_layer=1, modeling_rnn_num_layer=2, predict_rnn_num_layer=1, dropout=0.2, weight_init=True)[source]

Bases: claf.model.reading_comprehension.mixin.SQuADv2, claf.model.base.ModelWithTokenEmbedder

Question Answering Model. Span Detector, No Answer

Implementation of model presented in Simple and Effective Multi-Paragraph Reading Comprehension + No_Answer (https://arxiv.org/abs/1710.10723)

  • Embedding (Word + Char -> Contextual)

  • Attention

  • Residual self-attention

  • Output

  • Args:

    token_embedder: ‘QATokenEmbedder’, Used to embed the ‘context’ and ‘question’.

  • Kwargs:

    lang_code: Dataset language code [en|ko]
    aligned_query_embedding: f_align(p_i) = sum_j(a_ij * E(q_j)), where the attention score a_ij captures the similarity between p_i and each question word q_j. These features add soft alignments between similar but non-identical words (e.g., car and vehicle). Applied only to ‘context_embed’.
    answer_maxlen: restrict the predicted answer span to length less than or equal to {answer_maxlen}
    rnn_dim: the RNN cell dimension
    linear_dim: the attention linear-layer dimension
    preprocess_rnn_num_layer: the number of recurrent layers (preprocess)
    modeling_rnn_num_layer: the number of recurrent layers (modeling)
    predict_rnn_num_layer: the number of recurrent layers (predict)
    dropout: the dropout probability

forward(features, labels=None)[source]
  • Args:
    features: feature dictionary like the following:

        {"feature_name1": {"token_name1": tensor, "token_name2": tensor},
         "feature_name2": ...}

  • Kwargs:
    labels: label dictionary like the following:

        {"label_name1": tensor, "label_name2": tensor}

    Loss is not calculated when labels are absent (inference/predict mode).

  • Returns: output_dict (dict) consisting of
    • start_logits: representing unnormalized log probabilities of the span start position.

    • end_logits: representing unnormalized log probabilities of the span end position.

    • best_span: the string from the original passage that the model thinks is the best answer to the question.

    • data_idx: the question id, used to map the prediction back to its answer.

    • loss: a scalar loss to be optimised.