claf.machine.components.retrieval package
Submodules
class claf.machine.components.retrieval.tfidf.TFIDF(texts, word_tokenizer, k=1)
Bases: object
TF-IDF document retrieval model
Term Frequency: log(tf + 1)
Inverse Document Frequency: log((N - Nt + 0.5) / (Nt + 0.5))
Weight: log(tf + 1) * log((N - Nt + 0.5) / (Nt + 0.5))
Kwargs:
k: the number of top-ranked results to return (default: 1)
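The weighting formula and the role of `k` can be illustrated with a minimal, self-contained sketch. This is not CLaF's implementation: the helper names `tfidf_weight` and `top_k` are hypothetical, documents are passed in already tokenized, and `N` and `Nt` are assumed to mean the total number of documents and the number of documents containing the term.

```python
import math


def tfidf_weight(tf, n_docs, n_docs_with_term):
    """Weight of one term: log(tf + 1) * log((N - Nt + 0.5) / (Nt + 0.5)).

    Assumption: N is the corpus size and Nt the number of documents
    that contain the term.
    """
    term_frequency = math.log(tf + 1)
    inverse_document_frequency = math.log(
        (n_docs - n_docs_with_term + 0.5) / (n_docs_with_term + 0.5)
    )
    return term_frequency * inverse_document_frequency


def top_k(query_tokens, docs_tokens, k=1):
    """Return the k best (document index, score) pairs for a tokenized query."""
    n_docs = len(docs_tokens)

    # Document frequency: in how many documents each term appears.
    doc_freq = {}
    for tokens in docs_tokens:
        for term in set(tokens):
            doc_freq[term] = doc_freq.get(term, 0) + 1

    # Score each document by summing the weights of the query terms it contains.
    scores = []
    for index, tokens in enumerate(docs_tokens):
        score = 0.0
        for term in set(query_tokens):
            tf = tokens.count(term)
            if tf > 0:
                score += tfidf_weight(tf, n_docs, doc_freq[term])
        scores.append((index, score))

    # Highest score first; keep only the top k entries.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


docs = [["cat", "sat"], ["dog", "ran"], ["bird", "flew"], ["cat", "cat", "nap"]]
best = top_k(["dog"], docs, k=1)
```

With this formula the inverse-document-frequency factor is negative for any term that appears in more than half of the documents, so very common terms lower a document's score rather than raise it.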
INDEX_FNAME = 'similarities.index'
TFIDF_FNAME = 'tfidf.model'
VOCAB_FNAME = 'vocab.txt'