claf.machine.components.retrieval package

Submodules

class claf.machine.components.retrieval.tfidf.TFIDF(texts, word_tokenizer, k=1)

Bases: object

TF-IDF document retrieval model

  • Term Frequency (tf)

  • Inverse Document Frequency (idf)

  • Weight: tfidf = log(tf + 1) * log((N - Nt + 0.5) / (Nt + 0.5))

  • Kwargs:

    k: number of top-ranked results to return (default: 1)

INDEX_FNAME = 'similarities.index'
TFIDF_FNAME = 'tfidf.model'
VOCAB_FNAME = 'vocab.txt'
get_closest(query)
init()
init_model()
load(dir_path)
parse(query, ngram=1)
save(dir_path)
text_to_tfidf(query)

Create a TF-IDF-weighted word vector from the query.

tfidf = log(tf + 1) * log((N - Nt + 0.5) / (Nt + 0.5))
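
The weighting formula above can be illustrated with a minimal standalone sketch. It is not the implementation behind text_to_tfidf. It assumes that tf is the count of a term in the query, N is the total number of indexed documents, and Nt is the number of documents containing the term. The names tfidf_weights, doc_freqs, and num_docs are hypothetical and do not belong to claf.

```python
import math
from collections import Counter


def tfidf_weights(query_tokens, doc_freqs, num_docs):
    """Weight each distinct token by log(tf + 1) * log((N - Nt + 0.5) / (Nt + 0.5)).

    query_tokens: list of already-tokenized query terms (hypothetical input)
    doc_freqs:    dict mapping term -> Nt, assumed to be the number of
                  documents that contain the term
    num_docs:     N, assumed to be the total number of documents
    """
    term_counts = Counter(query_tokens)  # tf for each distinct term
    weights = {}
    for term, tf in term_counts.items():
        nt = doc_freqs.get(term, 0)  # unseen terms get Nt = 0
        idf = math.log((num_docs - nt + 0.5) / (nt + 0.5))
        weights[term] = math.log(tf + 1) * idf
    return weights


# Toy statistics: "cat" is rare, "the" appears in almost every document.
weights = tfidf_weights(["cat", "cat", "the"], {"cat": 1, "the": 9}, num_docs=10)
for term, weight in weights.items():
    print(term, round(weight, 4))
```

With this formula the idf factor is negative for any term that occurs in more than half of the documents, so a very common term such as "the" in the toy data receives a negative weight. The sketch leaves such values as they are; whether the library clips or filters them is not stated here.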

Module contents