claf.machine.components.retrieval package
Submodules
class claf.machine.components.retrieval.tfidf.TFIDF(texts, word_tokenizer, k=1)
Bases: object
TF-IDF document retrieval model.
Term Frequency - Inverse Document Frequency weighting:
log(tf + 1) * log((N - Nt + 0.5) / (Nt + 0.5))
Kwargs:
    k: number of top-ranked results to return (default: 1)
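The weighting formula in the class description can be sketched in plain Python. This is a minimal illustration, not the TFIDF class's actual implementation. It assumes that tf is the count of a term in a document, N is the total number of documents, and Nt is the number of documents containing the term; the page does not define these symbols. The helper names tfidf_weight and top_k are hypothetical.

```python
import math
from collections import Counter


def tfidf_weight(tf, n_docs, n_docs_with_term):
    # Assumed reading of the symbols: N = n_docs, Nt = n_docs_with_term.
    idf = math.log((n_docs - n_docs_with_term + 0.5) / (n_docs_with_term + 0.5))
    return math.log(tf + 1) * idf


def top_k(query_tokens, docs_tokens, k=1):
    """Return the indices of the k highest-scoring documents (hypothetical helper)."""
    n_docs = len(docs_tokens)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for doc in docs_tokens for term in set(doc))
    scores = []
    for doc in docs_tokens:
        counts = Counter(doc)
        # Sum the weights of the query terms that occur in the corpus.
        score = sum(
            tfidf_weight(counts[t], n_docs, doc_freq[t])
            for t in set(query_tokens)
            if t in doc_freq
        )
        scores.append(score)
    # Stable sort: ties keep their original document order.
    ranked = sorted(range(n_docs), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


docs = [["cat", "sat"], ["dog", "ran"], ["bird", "flew"], ["fish", "swam"]]
print(top_k(["cat"], docs, k=1))  # prints [0]
```

Note that the inverse document frequency factor becomes negative once a term appears in more than half of the documents, so very common terms lower a document's score in this sketch.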
INDEX_FNAME = 'similarities.index'
TFIDF_FNAME = 'tfidf.model'
VOCAB_FNAME = 'vocab.txt'