nordlys.core.retrieval.scorer module
Scorer
Various retrieval models for scoring an individual document for a given query.
Authors: Faegheh Hasibi, Krisztian Balog

class nordlys.core.retrieval.scorer.Scorer(elastic, query, params)
Bases: object
Base scorer class.

SCORER_DEBUG = 0

class nordlys.core.retrieval.scorer.ScorerLM(elastic, query, params)
Bases: nordlys.core.retrieval.scorer.Scorer
Language Model (LM) scorer.

DIRICHLET = 'dirichlet'

JM = 'jm'

static get_dirichlet_prob(tf_t_d, len_d, tf_t_C, len_C, mu)
Computes Dirichlet-smoothed probability.
P(t|theta_d) = [tf(t, d) + mu P(t|C)] / [|d| + mu]
Parameters:
- tf_t_d – tf(t, d)
- len_d – |d|
- tf_t_C – tf(t, C)
- len_C – |C|
- mu – smoothing parameter mu
Returns: Dirichlet-smoothed probability
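
The Dirichlet formula above can be illustrated with a minimal standalone sketch. This is not the Nordlys implementation: the function name `dirichlet_prob` is hypothetical, and the sketch assumes that P(t|C) is estimated as tf(t, C) / |C| and that degenerate inputs (empty collection, zero denominator) yield 0.

```python
def dirichlet_prob(tf_t_d, len_d, tf_t_C, len_C, mu):
    """P(t|theta_d) = [tf(t, d) + mu * P(t|C)] / [|d| + mu]."""
    # Assumption: P(t|C) is the maximum-likelihood estimate tf(t, C) / |C|,
    # taken as 0 when the collection is empty.
    p_t_C = tf_t_C / len_C if len_C > 0 else 0.0
    denominator = len_d + mu
    # Assumption: an empty document combined with mu = 0 scores 0.
    if denominator == 0:
        return 0.0
    return (tf_t_d + mu * p_t_C) / denominator


# The term occurs 3 times in a 100-token document
# and 50 times in a 10,000-token collection.
seen = dirichlet_prob(tf_t_d=3, len_d=100, tf_t_C=50, len_C=10_000, mu=100)

# A term absent from the document still gets a non-zero probability
# because of the collection component.
unseen = dirichlet_prob(tf_t_d=0, len_d=100, tf_t_C=50, len_C=10_000, mu=100)
print(seen, unseen)
```

With Dirichlet smoothing, the weight given to the collection model shrinks as the document gets longer, since mu is fixed while |d| grows.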
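
static get_jm_prob(tf_t_d, len_d, tf_t_C, len_C, lambd)
Computes JM-smoothed (Jelinek-Mercer) probability.
p(t|theta_d) = [(1-lambda) tf(t, d)/|d|] + [lambda tf(t, C)/|C|]
Parameters:
- tf_t_d – tf(t, d)
- len_d – |d|
- tf_t_C – tf(t, C)
- len_C – |C|
- lambd – smoothing parameter lambda
Returns: JM-smoothed probability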
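
The JM formula above is a linear interpolation between the document model and the collection model. The following is a minimal standalone sketch, not the Nordlys implementation: the function name `jm_prob` is hypothetical, and treating an empty document or collection as contributing probability 0 is an assumption.

```python
def jm_prob(tf_t_d, len_d, tf_t_C, len_C, lambd):
    """p(t|theta_d) = (1 - lambda) * tf(t, d)/|d| + lambda * tf(t, C)/|C|."""
    # Assumption: an empty document or collection contributes probability 0.
    p_t_d = tf_t_d / len_d if len_d > 0 else 0.0
    p_t_C = tf_t_C / len_C if len_C > 0 else 0.0
    return (1 - lambd) * p_t_d + lambd * p_t_C


# The term occurs 3 times in a 100-token document
# and 50 times in a 10,000-token collection.
p = jm_prob(tf_t_d=3, len_d=100, tf_t_C=50, len_C=10_000, lambd=0.1)
print(p)
```

In this formulation, lambda is the weight of the collection model: lambda = 0 gives the unsmoothed document estimate, and lambda = 1 ignores the document entirely.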

get_lm_term_prob(doc_id, field, t, tf_t_d_f=None, tf_t_C_f=None)
Returns term probability for a document and field.
Parameters:
- doc_id – document ID
- field – field name
- t – term
Returns: P(t|d_f)

class nordlys.core.retrieval.scorer.ScorerMLM(elastic, query, params)
Bases: nordlys.core.retrieval.scorer.ScorerLM
Mixture of Language Models (MLM) scorer.
Implemented based on: Ogilvie, Callan. Combining document representations for known-item search. SIGIR 2003.

get_mlm_term_prob(doc_id, t)
Returns MLM probability for the given term and field weights.
p(t|theta_d) = sum(mu_f * p(t|theta_d_f))
Parameters:
- doc_id – document ID
- t – term
Returns: P(t|theta_d)
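
The MLM formula above is a weighted sum of per-field term probabilities, where mu_f is the weight of field f (not the Dirichlet smoothing parameter mu). The sketch below is a standalone illustration under stated assumptions, not the Nordlys implementation: the function name `mlm_term_prob`, the dictionary inputs, and the example field names and weights are all hypothetical.

```python
def mlm_term_prob(field_probs, field_weights):
    """p(t|theta_d) = sum over fields f of mu_f * p(t|theta_d_f)."""
    # Assumption: a field with no probability supplied contributes 0.
    return sum(
        weight * field_probs.get(field, 0.0)
        for field, weight in field_weights.items()
    )


# Hypothetical field weights mu_f, chosen to sum to 1.
field_weights = {"title": 0.2, "content": 0.8}
# Hypothetical smoothed per-field probabilities p(t|theta_d_f) for one term.
field_probs = {"title": 0.05, "content": 0.01}

p = mlm_term_prob(field_probs, field_weights)
print(p)
```

If the field weights sum to 1 and each per-field value is a probability, the mixture is itself a valid probability.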

class nordlys.core.retrieval.scorer.ScorerPRMS(elastic, query, params)
Bases: nordlys.core.retrieval.scorer.ScorerLM
PRMS (Probabilistic Retrieval Model for Semistructured data) scorer.