nordlys.core.eval.trec_run module

Trec run

Utility module for working with TREC runfiles.

Usage

Get statistics about a runfile:
trec_run <run_file> -o stat
Filter a runfile to contain only documents from a given set:
trec_run <run_file> -o filter -d <doc_ids_file> -f <output_file> -n <num_results>
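For orientation, a TREC run file conventionally holds one result per line with six whitespace-separated columns: query ID, the literal Q0, document ID, rank, score, and run ID. The following is a minimal sketch of the kind of statistics a "stat" operation might report over such lines. It is not the nordlys implementation: the function name run_stats and the reported fields are assumptions made for illustration.

```python
from collections import defaultdict


def run_stats(lines):
    """Counts queries and results in TREC-formatted run lines.

    Hypothetical helper, not part of nordlys.
    """
    per_query = defaultdict(int)
    for line in lines:
        fields = line.split()
        if len(fields) != 6:
            continue  # skip blank or malformed lines
        per_query[fields[0]] += 1  # first column is the query ID
    num_queries = len(per_query)
    num_results = sum(per_query.values())
    avg = num_results / num_queries if num_queries else 0.0
    return {
        "queries": num_queries,
        "results": num_results,
        "avg_results_per_query": avg,
    }


sample = [
    "q1 Q0 doc1 1 12.5 my_run",
    "q1 Q0 doc2 2 11.0 my_run",
    "q2 Q0 doc3 1 9.75 my_run",
]
stats = run_stats(sample)
```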
Authors: Krisztian Balog, Dario Garigliotti
class nordlys.core.eval.trec_run.TrecRun(file_name=None, normalize=False, remap_by_exp=False, run_id=None)[source]

Bases: object

Represents a TREC runfile.

Parameters:
  • file_name – name of the run file
  • normalize – whether retrieval scores are to be normalized for each query (default: False)
  • remap_by_exp – whether scores are to be converted from the log-domain by taking their exp (default: False)
filter(doc_ids_file, output_file, num_results=100)[source]

Filters runfile to include only selected docIDs and outputs the results to a file.

Parameters:
  • doc_ids_file – file with one doc_id per line
  • output_file – output file name
  • num_results – number of results per query (default: 100)
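The filtering step can be sketched roughly as follows. This is an illustration under stated assumptions, not the library's code: the helper name filter_run is hypothetical, the input lines are assumed to be already sorted by rank within each query, and renumbering the ranks of the surviving results is a design choice of this sketch that the documentation above does not confirm.

```python
from collections import defaultdict


def filter_run(run_lines, allowed_doc_ids, num_results=100):
    """Keeps only results whose doc ID is allowed, up to num_results per query.

    Hypothetical helper, not part of nordlys.
    """
    kept = defaultdict(int)  # number of results kept so far, per query
    out = []
    for line in run_lines:
        fields = line.split()
        if len(fields) != 6:
            continue  # skip blank or malformed lines
        query_id, q0, doc_id, _rank, score, run_id = fields
        if doc_id not in allowed_doc_ids or kept[query_id] >= num_results:
            continue
        kept[query_id] += 1
        # Assumption: ranks are renumbered after filtering.
        new_rank = str(kept[query_id])
        out.append(" ".join([query_id, q0, doc_id, new_rank, score, run_id]))
    return out


run_lines = [
    "q1 Q0 doc1 1 12.5 my_run",
    "q1 Q0 doc2 2 11.0 my_run",
    "q1 Q0 doc3 3 10.0 my_run",
    "q2 Q0 doc2 1 7.0 my_run",
]
filtered = filter_run(run_lines, {"doc2", "doc3"}, num_results=1)
```

In a file-based setting, allowed_doc_ids would be built by reading doc_ids_file into a set (one doc_id per line), which keeps each membership check constant-time.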
get_query_results(query_id)[source]

Returns the corresponding RetrievalResults object for a given query.

Parameters: query_id – queryID
Return type: nordlys.core.retrieval.retrieval_results.RetrievalResults
get_results()[source]

Returns all results.

Returns: a dict with queryIDs as keys and RetrievalResults objects as values
load_file(file_name, remap_by_exp=False)[source]

Loads a TREC runfile.

Parameters:
  • file_name – name of the run file
  • remap_by_exp – whether scores are to be converted from the log-domain by taking their exp (default: False)
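To illustrate what remapping by exp means, here is a minimal sketch that parses run lines into a nested dict and optionally converts log-domain scores back with math.exp. The function load_run and the dict-of-dicts return shape are assumptions for this example. The actual method stores results in RetrievalResults objects, whose internals are not described here.

```python
import math
from collections import defaultdict


def load_run(lines, remap_by_exp=False):
    """Parses TREC run lines into {query_id: {doc_id: score}}.

    Hypothetical helper, not part of nordlys.
    """
    results = defaultdict(dict)
    for line in lines:
        fields = line.split()
        if len(fields) != 6:
            continue  # skip blank or malformed lines
        query_id, _q0, doc_id, _rank, score, _run_id = fields
        score = float(score)
        if remap_by_exp:
            # A log-probability of 0.0 maps back to a probability of 1.0.
            score = math.exp(score)
        results[query_id][doc_id] = score
    return dict(results)


log_lines = [
    "q1 Q0 doc1 1 0.0 my_run",
    "q1 Q0 doc2 2 -1.0 my_run",
]
raw = load_run(log_lines)
remapped = load_run(log_lines, remap_by_exp=True)
```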
normalize()[source]

Normalizes the retrieval scores such that they sum up to one for each query.
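A minimal sketch of sum-to-one normalization, assuming scores are held in a plain {query_id: {doc_id: score}} dict rather than in RetrievalResults objects. The function name normalize_scores and the handling of a zero total (scores left unchanged) are assumptions of this example.

```python
def normalize_scores(results):
    """Scales scores so that they sum to one within each query.

    Hypothetical helper, not part of nordlys.
    """
    normalized = {}
    for query_id, doc_scores in results.items():
        total = sum(doc_scores.values())
        if total == 0:
            # Assumption: avoid division by zero by leaving scores as-is.
            normalized[query_id] = dict(doc_scores)
            continue
        normalized[query_id] = {
            doc_id: score / total for doc_id, score in doc_scores.items()
        }
    return normalized


scores = {"q1": {"doc1": 3.0, "doc2": 1.0}, "q2": {"doc3": 0.0}}
norm = normalize_scores(scores)
```

Note that this kind of normalization is only meaningful for non-negative scores, so log-domain scores would typically be remapped with exp first.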

print_stat()[source]

Prints simple statistics.

nordlys.core.eval.trec_run.arg_parser()[source]
nordlys.core.eval.trec_run.main(args)[source]