nordlys.core.retrieval.indexer_mongo module

Mongo Indexer

This class is a tool for creating an index from a Mongo collection.

To use this class, you need to implement the callback_get_doc_content() function and pass it to build(). See indexer_fsdm for an example usage of this class.

Author: Faegheh Hasibi
class nordlys.core.retrieval.indexer_mongo.IndexerMongo(index_name, mappings, collection, model='BM25')[source]

Bases: object

build(callback_get_doc_content, bulk_size=1000)[source]

Builds the DBpedia index from the Mongo collection.

To speed up indexing, documents are added to the index in bulks rather than one at a time. The optimal bulk size depends on your data and setup, so it is worth trying a few values.
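The bulk idea can be illustrated with a minimal sketch. This is not the IndexerMongo implementation; iter_bulks is a hypothetical helper, and the generator of dicts is a stand-in for a Mongo cursor. It only shows how a stream of documents can be cut into groups of at most bulk_size items, each of which would then be sent to the index in one request.

```python
from itertools import islice


def iter_bulks(docs, bulk_size=1000):
    """Yield lists of at most bulk_size items from any iterable (hypothetical helper)."""
    it = iter(docs)
    while True:
        bulk = list(islice(it, bulk_size))
        if not bulk:
            return
        yield bulk


# Hypothetical stand-in for a Mongo cursor: 2500 tiny documents.
docs = ({"_id": "doc{}".format(i)} for i in range(2500))

# Each bulk would be indexed with a single request; here we only count its size.
bulk_sizes = [len(bulk) for bulk in iter_bulks(docs, bulk_size=1000)]
print(bulk_sizes)  # [1000, 1000, 500]
```

Larger bulks mean fewer requests but more memory held per request, which is why the best value has to be found by experiment.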

Parameters:
  • callback_get_doc_content – a function that gets a document from Mongo and returns the content for indexing
  • bulk_size – number of documents to be added to the index as a bulk
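As an illustration of what such a callback might look like, here is a minimal sketch. The name get_doc_content, the shape of the input document (a dict with an "_id" key and predicate fields), the returned field-to-values dict, and the "catchall" field are all assumptions made for this example; the format the indexer actually expects is not stated here, so see indexer_fsdm for a real implementation.

```python
def get_doc_content(doc):
    """Hypothetical callback: map one Mongo document to indexable field contents.

    Assumes doc is a dict whose "_id" is the document ID and whose other
    keys hold either a single value or a list of values.
    """
    contents = {}
    for field, values in doc.items():
        if field == "_id":
            continue  # the ID is not part of the indexed content
        if not isinstance(values, list):
            values = [values]
        contents[field] = [str(v) for v in values]
    # Assumed extra field that gathers the values of all other fields.
    contents["catchall"] = [v for vals in contents.values() for v in vals]
    return contents


# Hypothetical document, used only to show the mapping.
example = {"_id": "<dbpedia:Oslo>", "rdfs:label": ["Oslo"], "dbo:country": "Norway"}
content = get_doc_content(example)
```

Under these assumptions, the function would be passed to the indexer as build(get_doc_content, bulk_size=1000).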