Nordlys is a general-purpose semantic search toolkit that can be deployed on a local machine. It has built-in support for certain data collections, including DBpedia and Freebase. You may download these data sets and run a set of scripts to preprocess and index them, as explained below. Alternatively, you may use the data dumps we have made available; since they are huge, they are not stored in the git repository but at a separate location (see below).
1. Obtain source code
You can clone the Nordlys repo using the following:
$ git clone https://github.com/iai-group/nordlys.git
2. Install prerequisites
Before deploying Nordlys, make sure the required services are installed on your machine; Nordlys relies on MongoDB and Elasticsearch for storing and indexing data (see Section 3).
Then install the Nordlys Python prerequisites using pip:
$ pip install -r requirements.txt
If you don’t have pip yet, install it first:
$ easy_install pip
On Ubuntu, you might need to install lxml using the system package manager:
$ apt-get install python-lxml
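After installation, it can be handy to verify that the Python packages resolved correctly before moving on. The sketch below checks whether a list of packages is importable; the package names passed in the example are assumptions based on typical Nordlys requirements, not an authoritative list.

```python
import importlib.util


def check_prereqs(packages):
    """Return a dict mapping each package name to whether it is importable."""
    return {p: importlib.util.find_spec(p) is not None for p in packages}


# Example usage (package names are assumptions for illustration):
status = check_prereqs(["lxml", "elasticsearch", "pymongo"])
for name, ok in status.items():
    print(f"{name}: {'installed' if ok else 'missing'}")
```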
3. Load data
Data is a crucial component of Nordlys. Note that, depending on the functionality you need, only a subset of the data may be required. See this page for a detailed description.
We use MongoDB and Elasticsearch for storing and indexing data. The figure below shows an overview of data sources and their dependencies.
All scripts below are to be run from the nordlys main directory.
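Since the loading and indexing scripts assume MongoDB and Elasticsearch are running, a quick reachability check can save debugging time. This is a minimal sketch using only the standard library, assuming the services listen on their default local ports (27017 for MongoDB, 9200 for Elasticsearch).

```python
import socket


def service_reachable(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Default local ports are assumed; adjust if your setup differs.
for name, port in [("MongoDB", 27017), ("Elasticsearch", 9200)]:
    print(name, "up" if service_reachable("localhost", port) else "down")
```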
3.1 Load data to MongoDB
To load the data into MongoDB, run the following commands. Note that the first command is required for all Nordlys functionality; the remaining commands are optional and should be run only if the corresponding functionality is needed.
| Command | Functionality |
|---------|---------------|
|         | EL and EC     |
3.2 Build Elastic indices
Run the following commands to build the indices for the mentioned functionalities.
| Command | Functionality |
|---------|---------------|
|         | ER, EL, TTI   |
|         | ER (only for ELR model) |
3.3 Download the remaining data files
Run the following commands to download the data files needed for running the entity linking service.