nordlys.core.ml.instances module

Instances

Instances used for machine learning algorithms.

  • Manages a set of Instance objects
  • Loads instance data from JSON or TSV files
    • When using TSV, instance properties, target, and features are loaded from separate files
  • Generates a list of instances in JSON or RankLib format
Authors: Faegheh Hasibi, Krisztian Balog
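The TSV layout is not specified in this reference. As a minimal sketch, assuming each TSV file carries the instance ID in its first column (a hypothetical layout, not necessarily what nordlys expects), joining properties, target, and features from three separate files by instance ID could look like this. `read_tsv`, `merge_tsv`, and the plain-dict instance representation are illustrative only and are not part of the nordlys API.

```python
import csv
import io


def read_tsv(handle):
    """Reads a TSV stream into {instance_id: [remaining columns]}.

    Hypothetical layout: the first column holds the instance ID.
    """
    rows = {}
    for row in csv.reader(handle, delimiter="\t"):
        if row:  # skip blank lines
            rows[row[0]] = row[1:]
    return rows


def merge_tsv(props_file, target_file, features_file, prop_names, feature_names):
    """Joins properties, target, and features from three TSV streams by instance ID."""
    props = read_tsv(props_file)
    targets = read_tsv(target_file)
    features = read_tsv(features_file)
    instances = {}
    for ins_id, values in props.items():
        instances[ins_id] = {
            "properties": dict(zip(prop_names, values)),
            # the target is kept as the raw string read from the file
            "target": targets[ins_id][0] if targets.get(ins_id) else None,
            "features": {name: float(v) for name, v in zip(feature_names, features.get(ins_id, []))},
        }
    return instances


# In-memory streams stand in for three separate TSV files
merged = merge_tsv(
    io.StringIO("q1_e1\tq1\te1\nq1_e2\tq1\te2\n"),
    io.StringIO("q1_e1\t1\nq1_e2\t0\n"),
    io.StringIO("q1_e1\t0.5\t2\nq1_e2\t0.1\t3\n"),
    prop_names=["qid", "en_id"],
    feature_names=["f1", "f2"],
)
print(merged["q1_e1"])
```

Keying everything on the instance ID is what lets the three files be loaded independently and in any order.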
class nordlys.core.ml.instances.Instances(instances=None)

Bases: object

Class attributes:
instances: Instance objects stored in a dictionary indexed by instance ID
Parameters:
  • instances – instances in a list or dict:
    • if a list, the list index is used as the instance ID
    • if a dict, the key is used as the instance ID
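The ID rule for the `instances` argument can be sketched as below. This is a minimal illustration, not the nordlys constructor: `index_instances` is a hypothetical helper, and since the reference does not say whether a list index is stored as an int or a string, this sketch keeps it as an int.

```python
def index_instances(instances=None):
    """Builds the {instance_id: instance} dict described for the constructor.

    - list: the list index becomes the instance ID
    - dict: the existing key is the instance ID
    """
    if instances is None:
        return {}
    if isinstance(instances, dict):
        return dict(instances)  # shallow copy; keys are already IDs
    if isinstance(instances, list):
        return dict(enumerate(instances))  # list index -> instance
    raise TypeError("instances must be a list or a dict")


print(index_instances(["ins_a", "ins_b"]))  # {0: 'ins_a', 1: 'ins_b'}
print(index_instances({"q1_e1": "ins_a"}))  # {'q1_e1': 'ins_a'}
```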
add_features_from_tsv(tsv_file, features)
add_instance(instance)

Adds an Instance object to the set of instances.

Parameters: instance – Instance object
add_properties_from_tsv(tsv_file, properties)
add_qids(prop)

Generates integer qids (for LibSVM) from a given non-integer property, assigning a unique integer to each distinct value of that property.

Parameters: prop – name of the property.
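A minimal sketch of the qid assignment, assuming instances are plain dicts with a `"properties"` mapping (a hypothetical stand-in for Instance objects). Numbering starts at 1 because the LibSVM grammar for `to_libsvm` requires positive integer qids; the target property name `"qid"` is an assumption of this sketch, not a documented nordlys default.

```python
def add_qids(instances, prop, qid_prop="qid"):
    """Assigns an integer qid to every instance based on a non-integer property.

    Each distinct value of `prop` gets its own integer, numbered from 1 in
    order of first appearance (LibSVM qids must be positive integers).
    """
    qid_map = {}
    for ins in instances.values():
        value = ins["properties"][prop]
        if value not in qid_map:
            qid_map[value] = len(qid_map) + 1
        ins["properties"][qid_prop] = qid_map[value]
    return qid_map


instances = {
    "i1": {"properties": {"query": "q_a"}},
    "i2": {"properties": {"query": "q_b"}},
    "i3": {"properties": {"query": "q_a"}},
}
print(add_qids(instances, "query"))  # {'q_a': 1, 'q_b': 2}
```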
add_target_from_tsv(tsv_file)
append_instances(ins_list)

Appends a list of Instance objects.

Parameters: ins_list – list of Instance objects
classmethod from_json(json_file)

Loads instances from a JSON file.

Parameters: json_file – (string)
Returns: Instances object

get_all()

Returns a list of all instances.

get_all_ids()

Returns a list of all instance IDs.

get_instance(instance_id)

Returns the instance with the given instance ID.

Parameters: instance_id – (string)
Returns: Instance object
group_by_property(property)

Groups instances by a given property.

Parameters: property – name of the property
Returns: a dictionary keyed by property value, {value: [ml.Instance, …], …}
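Grouping can be sketched with a `defaultdict`, again assuming hypothetical plain-dict instances with a `"properties"` mapping rather than the nordlys Instance class:

```python
from collections import defaultdict


def group_by_property(instances, prop):
    """Returns {property value: [instance, ...]} for a dict of instances."""
    groups = defaultdict(list)
    for ins in instances.values():
        groups[ins["properties"][prop]].append(ins)
    return dict(groups)


instances = {
    "i1": {"id": "i1", "properties": {"qid": 1}},
    "i2": {"id": "i2", "properties": {"qid": 2}},
    "i3": {"id": "i3", "properties": {"qid": 1}},
}
groups = group_by_property(instances, "qid")
print({value: [ins["id"] for ins in members] for value, members in groups.items()})
# {1: ['i1', 'i3'], 2: ['i2']}
```

Grouping by the query property in this way is a natural first step before per-query operations such as ranking or qid-sorted output.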

to_json(json_file=None)

Converts all instances to JSON and writes them to the given file.

Parameters: json_file – (string)
Returns: JSON dump of all instances.
to_libsvm(file_name=None, qid_prop=None)

Converts all instances to the LibSVM format and writes them to the file.

  • LibSVM format:
      <line>    .=. <target> qid:<qid> <feature>:<value> … # <info>
      <target>  .=. <float>
      <qid>     .=. <positive integer>
      <feature> .=. <positive integer>
      <value>   .=. <float>
      <info>    .=. <string>
  • Example: 3 qid:1 1:1 2:1 3:0 4:0.2 5:0 # 1A
NOTES:
  • The property used for qid (qid_prop) should hold integer values.
  • For pointwise algorithms, the instance ID is used as the qid.
  • Lines in the RankLib input have to be sorted by increasing qid.
Parameters:
  • file_name – File to write the instances to, in LibSVM format.
  • qid_prop – property to be used as qid. If none,
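The line format and the sorting requirement can be sketched as follows. This is a hypothetical helper, not the nordlys implementation: it assumes plain-dict instances with `"target"`, `"properties"`, and `"features"` keyed by positive integer feature IDs, and it always requires `qid_prop`, since the behavior for `qid_prop=None` is not spelled out above.

```python
def to_libsvm_lines(instances, qid_prop):
    """Formats instances as LibSVM lines: <target> qid:<qid> <feature>:<value> ... # <info>.

    Each instance is a hypothetical dict with "target", "properties", and
    "features" ({positive int feature ID: float}); the instance ID is
    written as the # <info> comment.
    """
    rows = []
    for ins_id, ins in instances.items():
        qid = int(ins["properties"][qid_prop])  # qid_prop must hold integers
        # feature IDs are written in increasing order
        feats = " ".join(f"{fid}:{val}" for fid, val in sorted(ins["features"].items()))
        rows.append((qid, f"{ins['target']} qid:{qid} {feats} # {ins_id}"))
    rows.sort(key=lambda row: row[0])  # increasing qid; stable within a qid
    return [line for _, line in rows]


instances = {
    "2B": {"target": 1, "properties": {"qid": 2}, "features": {1: 0, 2: 0.5}},
    "1A": {"target": 3, "properties": {"qid": 1},
           "features": {1: 1, 2: 1, 3: 0, 4: 0.2, 5: 0}},
}
for line in to_libsvm_lines(instances, "qid"):
    print(line)
# 3 qid:1 1:1 2:1 3:0 4:0.2 5:0 # 1A
# 1 qid:2 1:0 2:0.5 # 2B
```

Sorting once at the end, rather than relying on input order, guarantees the increasing-qid order that RankLib expects.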
to_str(file_name=None)

Converts instances to a string and writes them to the given file.

Parameters: file_name – file to write the instances to
Returns: string format of instances

to_treceval(file_name, qid_prop='qid', docid_prop='en_id')

Generates a TREC-style run file. If an entity is ranked more than once for the same query, only the entry with the higher score is kept.

Parameters:
  • file_name – File to write the TREC run file to
  • qid_prop – Name of instance property to be used as query ID (1st column)
  • docid_prop – Name of instance property to be used as document ID (3rd column)
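The deduplication rule and the standard six-column TREC run layout (`<qid> Q0 <docid> <rank> <score> <run_tag>`) can be sketched as below. Assumptions of this sketch: instances are plain dicts whose score lives under a hypothetical `"score"` key, and `run_tag` is a hypothetical parameter; neither is taken from the nordlys API.

```python
from collections import defaultdict


def to_treceval_lines(instances, qid_prop="qid", docid_prop="en_id", run_tag="run"):
    """Builds TREC run lines: <qid> Q0 <docid> <rank> <score> <run_tag>.

    If the same (query, document) pair occurs more than once, only the
    higher score is kept; documents are then ranked by descending score.
    """
    best = {}  # (qid, docid) -> highest score seen so far
    for ins in instances.values():
        key = (ins["properties"][qid_prop], ins["properties"][docid_prop])
        if key not in best or ins["score"] > best[key]:
            best[key] = ins["score"]

    by_query = defaultdict(list)
    for (qid, docid), score in best.items():
        by_query[qid].append((docid, score))

    lines = []
    for qid in sorted(by_query):
        ranked = sorted(by_query[qid], key=lambda pair: pair[1], reverse=True)
        for rank, (docid, score) in enumerate(ranked, start=1):
            lines.append(f"{qid} Q0 {docid} {rank} {score} {run_tag}")
    return lines


instances = {
    "a": {"score": 0.3, "properties": {"qid": "q1", "en_id": "e1"}},
    "b": {"score": 0.9, "properties": {"qid": "q1", "en_id": "e1"}},
    "c": {"score": 0.5, "properties": {"qid": "q1", "en_id": "e2"}},
}
print("\n".join(to_treceval_lines(instances)))
# q1 Q0 e1 1 0.9 run
# q1 Q0 e2 2 0.5 run
```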
nordlys.core.ml.instances.main(args)