Nordlys is a general-purpose semantic search toolkit that can be deployed on a local machine. It has built-in support for certain data collections, including DBpedia and Freebase. You may download these data sets and run a set of scripts to preprocess and index them, as explained below. Alternatively, you may use the data dumps we have made available; since they are very large, they are not stored in the git repository but are hosted at a separate location (see below).

1. Obtain source code

You can clone the Nordlys repository using the following command:

$ git clone

2. Install prerequisites

Before deploying Nordlys, make sure the following are installed on your machine:

Then install the Nordlys prerequisites using pip:

$ pip install -r requirements.txt

If you don’t have pip yet, install it using:

$ easy_install pip


On Ubuntu, you might need to install lxml using the system package manager:

$ apt-get install python-lxml

3. Load data

Data are a crucial component of Nordlys. Depending on the functionality you need, you may only need a subset of the data. See this page for a detailed description.

We use MongoDB and Elasticsearch for storing and indexing data. The figure below shows an overview of data sources and their dependencies.

Nordlys data components


All scripts below must be run from the Nordlys main directory:

nordlys-v02$ ./scripts/
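Because the scripts are invoked with paths relative to the main directory, a small guard can catch running them from the wrong place. The function below is a hypothetical helper, not part of Nordlys; it assumes the main directory can be recognised by containing both the `scripts/` directory and the `requirements.txt` file used in step 2.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with Nordlys).
# Succeeds only if the current directory looks like the Nordlys main
# directory, i.e. it contains a scripts/ directory and a requirements.txt
# file (an assumption based on the installation steps in this guide).
in_nordlys_root() {
    [ -d scripts ] && [ -f requirements.txt ]
}
```

A wrapper script could then start with `in_nordlys_root || { echo "Run this from the Nordlys main directory" >&2; exit 1; }` so that it fails early instead of partway through loading data.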

3.1 Load data to MongoDB

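To load the data into MongoDB, run the following commands. The first command is required for all Nordlys functionalities; the others are optional and only needed for the functionalities listed next to them. The abbreviations in the tables refer to Nordlys functionalities: ER (entity retrieval), EL (entity linking in queries), EC (entity catalog), and TTI (target type identification).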

Command                                          Required for
./scripts/ mongo_dbpedia-2015-10.tar.bz2         All
./scripts/ mongo_surface_forms_dbpedia.tar.bz2   EL and EC
./scripts/ mongo_surface_forms_facc.tar.bz2      EL and EC
./scripts/ mongo_fb2dbp-2015-10.tar.bz2          EL and EC
./scripts/ mongo_word2vec-googlenews.tar.bz2     TTI
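The table above can also be read as a lookup from functionality to required dumps. As a sketch, the hypothetical shell function `mongo_dumps_for` below (not part of Nordlys) prints the dump files needed for one functionality. It deliberately does not call the loading script, whose name is not given here; it only encodes the table.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with Nordlys): print the MongoDB dump
# files required for one functionality (ER, EL, EC or TTI), following
# the table above. Nothing is loaded; only file names are printed.
mongo_dumps_for() {
    # The DBpedia dump is required for every functionality.
    echo "mongo_dbpedia-2015-10.tar.bz2"
    case "$1" in
        EL|EC)
            echo "mongo_surface_forms_dbpedia.tar.bz2"
            echo "mongo_surface_forms_facc.tar.bz2"
            echo "mongo_fb2dbp-2015-10.tar.bz2"
            ;;
        TTI)
            echo "mongo_word2vec-googlenews.tar.bz2"
            ;;
    esac
}
```

For example, `mongo_dumps_for TTI` prints the DBpedia dump followed by the word2vec dump, while `mongo_dumps_for ER` prints only the DBpedia dump.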

3.2 Build Elasticsearch indices

Run the following commands to build the indices required by the listed functionalities.

Command                  Required for
./scripts/ dbpedia       ER, EL, TTI
./scripts/ types         TTI
./scripts/ dbpedia_uri   ER (only for the ELR model)
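Similarly, the index table can be encoded as a lookup. The hypothetical function `es_indices_for` below (not part of Nordlys) prints the index names needed for one functionality. Using `ELR` as a separate key for "ER with the ELR model" is an assumption made for this sketch; EC maps to no index because none is listed for it in the table.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with Nordlys): print the Elasticsearch
# indices required for one functionality, following the table above.
# "ELR" is an illustrative key meaning ER with the ELR model.
es_indices_for() {
    case "$1" in
        ER|EL) echo "dbpedia" ;;
        ELR)   printf '%s\n' dbpedia dbpedia_uri ;;
        TTI)   printf '%s\n' dbpedia types ;;
        EC)    ;;  # no index is listed for EC in the table
        *)     echo "unknown functionality: $1" >&2; return 1 ;;
    esac
}
```

For instance, `es_indices_for TTI` prints `dbpedia` and `types`, matching the two rows of the table that mention TTI.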