Instances used for machine learning algorithms.

  • Manages a set of Instance objects
  • Loads instance-data from JSON or TSV files
    • When using TSV, instance properties, target, and features are loaded from separate files
  • Generates a list of instances in JSON or RankLib format
Authors: Faegheh Hasibi, Krisztian Balog

Bases: object

Class attributes:
instances: Instance objects stored in a dictionary indexed by instance id
Parameters: instances – instances in a list or dict
  • if a list, the list index is used as the instance ID
  • if a dict, the key is used as the instance ID
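
The list-versus-dict rule for instance IDs can be illustrated with a minimal sketch. This is not the library's code: `index_instances` is a hypothetical helper, and plain strings stand in for Instance objects.

```python
def index_instances(instances):
    """Return a dict mapping instance ID -> instance (hypothetical helper)."""
    if isinstance(instances, dict):
        # dict input: the keys already serve as instance IDs
        return dict(instances)
    # list input: the list index becomes the instance ID
    return {i: ins for i, ins in enumerate(instances)}


print(index_instances(["a", "b"]))      # {0: 'a', 1: 'b'}
print(index_instances({"q1-d1": "a"}))  # {'q1-d1': 'a'}
```
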
add_features_from_tsv(tsv_file, features)

Adds an Instance object to the list of instances.

Parameters: instance – Instance object
add_properties_from_tsv(tsv_file, properties)

Generates integer q_ids (for LibSVM) from a given non-integer property by assigning a unique integer to each distinct value of that property.

Parameters: prop – name of the property.
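
The mapping from property values to integer q_ids can be sketched as below. This is an illustration under stated assumptions, not the library's implementation: `assign_qids` is a hypothetical name, and numbering from 1 in order of first appearance is an assumption (the source only says each distinct value gets a unique integer).

```python
def assign_qids(values):
    """Map each distinct property value to a unique positive integer."""
    qid_of = {}  # property value -> assigned integer
    qids = []
    for value in values:
        if value not in qid_of:
            # assumption: IDs start at 1 and follow first-appearance order
            qid_of[value] = len(qid_of) + 1
        qids.append(qid_of[value])
    return qids


print(assign_qids(["q-a", "q-b", "q-a", "q-c"]))  # [1, 2, 1, 3]
```
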

Appends a list of Instance objects.

Parameters: ins_list – list of Instance objects
classmethod from_json(json_file)

Loads instances from a JSON file.

Parameters: json_file – (string)
Returns: Instances object


Returns a list of all instances.


Returns a list of all instance IDs.


Returns an instance by instance id.

Parameters: instance_id – (string)
Returns: Instance object

Groups instances by a given property.

Parameters: property – name of the property to group by
Returns: a dictionary of instance ids {id: [ml.Instance, …], …}
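
Grouping by a property can be sketched as follows. This is a minimal illustration, not the library's code: instances are modelled as plain dicts, the function name is hypothetical, and keying the result by the property value is an assumption.

```python
from collections import defaultdict


def group_by_property(instances, prop):
    """Group instances (plain dicts here) by the value of one property."""
    groups = defaultdict(list)
    for ins in instances:
        # assumption: the dictionary key is the value of the chosen property
        groups[ins[prop]].append(ins)
    return dict(groups)


instances = [
    {"id": "1A", "qid": "q1"},
    {"id": "1B", "qid": "q1"},
    {"id": "2A", "qid": "q2"},
]
groups = group_by_property(instances, "qid")
print(sorted(groups))     # ['q1', 'q2']
print(len(groups["q1"]))  # 2
```
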


Converts all instances to JSON and writes them to the file.

Parameters: json_file – (string)
Returns: JSON dump of all instances.
to_libsvm(file_name=None, qid_prop=None)

Converts all instances to the LibSVM format and writes them to the file.

LibSVM format:

    <line> .=. <target> qid:<qid> <feature>:<value> … # <info>
    <target> .=. <float>
    <qid> .=. <positive integer>
    <feature> .=. <positive integer>
    <value> .=. <float>
    <info> .=. <string>

  • Example: 3 qid:1 1:1 2:1 3:0 4:0.2 5:0 # 1A
  • The property used for qid (qid_prop) should hold integers
  • For pointwise algorithms, we use instance id for qid
  • Lines in the RankLib input have to be sorted by increasing qid.

Parameters:
  • file_name – File to write libsvm format of instances.
  • qid_prop – property to be used as qid. If none,
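
The line format and the sort-by-qid requirement can be sketched as below. This is a standalone illustration, not the library's `to_libsvm`: the helper names and the tuple layout of `rows` are assumptions made for the example.

```python
def to_libsvm_line(target, qid, features, info):
    """Format one line; features maps positive-integer feature ID -> value."""
    feats = " ".join(
        f"{fid}:{val:g}" for fid, val in sorted(features.items())
    )
    return f"{target:g} qid:{qid} {feats} # {info}"


def rows_to_libsvm(rows):
    """rows: iterable of (target, qid, features, info) tuples (assumed layout).

    Lines are emitted in order of increasing qid, as RankLib requires.
    """
    ordered = sorted(rows, key=lambda row: row[1])
    return "\n".join(to_libsvm_line(*row) for row in ordered)


line = to_libsvm_line(3, 1, {1: 1, 2: 1, 3: 0, 4: 0.2, 5: 0}, "1A")
print(line)  # 3 qid:1 1:1 2:1 3:0 4:0.2 5:0 # 1A
```
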

Converts instances to a string and writes them to the given file.

Parameters: file_name
Returns: string format of instances

to_treceval(file_name, qid_prop='qid', docid_prop='en_id')

Generates a TREC-style run file. If an entity is ranked more than once for the same query, the one with the higher score is kept.

Parameters:
  • file_name – File to write TREC file
  • qid_prop – Name of instance property to be used as query ID (1st column)
  • docid_prop – Name of instance property to be used as document ID (3rd column)
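
The duplicate-handling rule can be sketched as follows. This is a minimal illustration, not the library's `to_treceval`: the function name, the `(qid, docid, score)` input tuples, the `run_id` value, and ranks starting at 1 are all assumptions. The six-column layout (query ID, `Q0`, document ID, rank, score, run ID) is the standard trec_eval run format.

```python
def to_trec_run(rows, run_id="sketch"):
    """rows: iterable of (qid, docid, score). Returns run-file lines."""
    best = {}  # (qid, docid) -> highest score seen
    for qid, docid, score in rows:
        key = (qid, docid)
        if key not in best or score > best[key]:
            best[key] = score  # keep the higher-scoring duplicate

    by_query = {}
    for (qid, docid), score in best.items():
        by_query.setdefault(qid, []).append((docid, score))

    lines = []
    for qid in sorted(by_query):
        ranked = sorted(by_query[qid], key=lambda ds: ds[1], reverse=True)
        for rank, (docid, score) in enumerate(ranked, start=1):
            lines.append(f"{qid} Q0 {docid} {rank} {score} {run_id}")
    return lines


rows = [("q1", "e1", 0.4), ("q1", "e2", 0.9), ("q1", "e1", 0.7)]
for line in to_trec_run(rows):
    print(line)
# q1 Q0 e2 1 0.9 sketch
# q1 Q0 e1 2 0.7 sketch
```
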