Nordlys is based on a multitier architecture with three layers:
The core tier provides basic functionalities and is connected to various third-party tools. The functionalities include:
- Retrieval (based on Elasticsearch)
- Storage (based on MongoDB)
- Machine learning (based on scikit-learn)
- Evaluation (based on trec_eval)
Additionally, a separate data package is provided with functionality for loading and preprocessing standard data sets (DBpedia, Freebase, ClueWeb, etc.).
It is possible to connect additional external tools (or replace our default choices) by implementing standard interfaces of the respective core modules.
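As a rough illustration of what implementing such an interface could look like, the sketch below defines an abstract retrieval interface and a toy in-memory backend that satisfies it. The names `Retriever`, `InMemoryRetriever`, `index`, and `search` are hypothetical and chosen for this example only; they are not the actual Nordlys classes or method signatures.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple


class Retriever(ABC):
    """Hypothetical retrieval interface (an assumption, not the Nordlys API)."""

    @abstractmethod
    def index(self, doc_id: str, text: str) -> None:
        """Add a document to the index."""

    @abstractmethod
    def search(self, query: str, k: int = 10) -> List[Tuple[str, float]]:
        """Return up to k (doc_id, score) pairs, best first."""


class InMemoryRetriever(Retriever):
    """Toy backend: scores a document by how often the query terms occur in it."""

    def __init__(self) -> None:
        self._docs: Dict[str, List[str]] = {}

    def index(self, doc_id: str, text: str) -> None:
        # Store a lowercased, whitespace-tokenized copy of the document.
        self._docs[doc_id] = text.lower().split()

    def search(self, query: str, k: int = 10) -> List[Tuple[str, float]]:
        terms = query.lower().split()
        scores: Dict[str, float] = {}
        for doc_id, tokens in self._docs.items():
            score = float(sum(tokens.count(t) for t in terms))
            if score > 0:
                scores[doc_id] = score
        # Sort by descending score, breaking ties by document ID.
        ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
        return ranked[:k]


# Code written against the interface does not depend on the concrete backend.
retriever: Retriever = InMemoryRetriever()
retriever.index("d1", "Oslo is the capital of Norway")
retriever.index("d2", "Stavanger is a city in Norway")
results = retriever.search("capital of Norway")
```

Because callers depend only on the abstract class, a backend wrapping a different search engine could be swapped in without changing the calling code, which is the idea behind replacing the default tool choices.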
The core layer represents a versatile, general-purpose, modern IR library, which may also be accessed through command-line tools.
The logic tier contains the main business logic, which is organized around five main modules:
- Entity provides access to the entity catalog (including knowledge bases and entity surface form dictionaries).
- Query provides the representation of search queries along with various preprocessing methods.
- Features is a collection of entity-related features, which may be used across different search tasks.
- Entity retrieval contains various entity ranking methods.
- Entity linking implements entity linking functionality.
The logic layer cannot be accessed directly (i.e., as a service or as a command-line application).
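To make the division of labor among these modules more concrete, the following toy sketch shows how a query representation, an entity catalog, and an entity ranking method might fit together. Everything here is an assumption made for illustration: the `Query` class, the `CATALOG` dictionary, the `rank_entities` function, and the term-overlap scoring are hypothetical and do not reflect the actual Nordlys implementation or its ranking methods.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Query:
    """Hypothetical query representation with simple preprocessing."""

    raw: str

    @property
    def terms(self) -> List[str]:
        # Lowercase the query and replace non-alphanumeric characters with spaces.
        cleaned = "".join(c if c.isalnum() else " " for c in self.raw.lower())
        return cleaned.split()


# Toy entity catalog (illustrative data): entity ID -> known surface forms.
CATALOG: Dict[str, List[str]] = {
    "entity:oslo": ["oslo", "oslo norway"],
    "entity:norway": ["norway", "kingdom of norway"],
}


def rank_entities(query: Query, catalog: Dict[str, List[str]]) -> List[Tuple[str, int]]:
    """Rank entities by the number of query terms found in their surface forms."""
    query_terms = set(query.terms)
    scored = []
    for entity_id, forms in catalog.items():
        form_terms = {term for form in forms for term in form.split()}
        overlap = len(query_terms & form_terms)
        if overlap:
            scored.append((entity_id, overlap))
    # Sort by descending overlap, breaking ties by entity ID.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


ranking = rank_entities(Query("Oslo, Norway!"), CATALOG)
```

In this sketch the query module owns preprocessing, the catalog owns entity data, and the ranking function only combines the two, mirroring the separation described above.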
The services tier provides end-user access to the toolkit’s functionality through the command line, API, and web interface. Four main types of service are available: