nordlys.core.ml.instances module
Instances
Instances used for machine learning algorithms.
- Manages a set of Instance objects
- Loads instance-data from JSON or TSV files
- When using TSV, instance properties, target, and features are loaded from separate files
- Generates a list of instances in JSON or RankLib format
Authors: Faegheh Hasibi, Krisztian Balog
-
class nordlys.core.ml.instances.Instances(instances=None)
Bases: object
- Class attributes:
- instances: Instance objects stored in a dictionary indexed by instance id
Parameters: instances – instances in a list or dict
- if a list, the list index is used as the instance ID
- if a dict, the key is used as the instance ID
-
add_instance(instance)
Adds an Instance object to the list of instances.
Parameters: instance – Instance object
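The ID rules for the constructor can be illustrated with a small stand-alone sketch. It does not import nordlys: `index_instances` is a hypothetical helper, and plain strings stand in for Instance objects.

```python
def index_instances(instances=None):
    """Return a dict of instances keyed by instance ID (hypothetical helper)."""
    if instances is None:
        return {}
    if isinstance(instances, dict):
        # dict input: the key is used as the instance ID
        return dict(instances)
    # list input: the list index is used as the instance ID
    return {idx: ins for idx, ins in enumerate(instances)}


from_list = index_instances(["ins_a", "ins_b"])
from_dict = index_instances({"q1-e1": "ins_a", "q1-e2": "ins_b"})
print(from_list)
print(from_dict)
```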
-
add_qids(prop)
Generates integer q_ids (for libsvm) from a given non-integer property by assigning a unique integer to each distinct value of that property.
Parameters: prop – name of the property.
Returns:
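One way to picture the mapping from property values to integer q_ids is the sketch below. It is not the nordlys implementation: `assign_qids` is a hypothetical function, instances are represented as plain property dicts, and numbering from 1 in order of first appearance is an assumption.

```python
def assign_qids(properties_by_id, prop):
    """Map each instance ID to an integer qid derived from property `prop`."""
    value_to_qid = {}  # distinct property value -> integer qid
    qids = {}          # instance ID -> integer qid
    for ins_id, props in properties_by_id.items():
        value = props[prop]
        if value not in value_to_qid:
            # Assumption: qids start at 1 and follow first-appearance order.
            value_to_qid[value] = len(value_to_qid) + 1
        qids[ins_id] = value_to_qid[value]
    return qids


example = {
    "i1": {"query": "q-a"},
    "i2": {"query": "q-b"},
    "i3": {"query": "q-a"},
}
print(assign_qids(example, "query"))
```

Instances that share a property value receive the same integer, which is what a `qid:<qid>` field needs in order to group them.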
-
append_instances(ins_list)
Appends a list of Instance objects.
Parameters: ins_list – list of Instance objects
-
classmethod from_json(json_file)
Loads instances from a JSON file.
Parameters: json_file – (string)
Returns: Instances object
-
get_instance(instance_id)
Returns an instance by instance id.
Parameters: instance_id – (string)
Returns: Instance object
-
group_by_property(property)
Groups instances by a given property.
Parameters: property
Returns: a dictionary of instance ids {id:[ml.Instance, …], …}
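The grouping can be sketched without nordlys as follows. `group_by` is a hypothetical function, instances are plain dicts standing in for Instance objects, and keying the result by the property value is an assumption about how the returned dictionary is organised.

```python
from collections import defaultdict


def group_by(instances_by_id, prop):
    """Group instance records by the value of property `prop`."""
    groups = defaultdict(list)
    for ins_id, instance in instances_by_id.items():
        # Assumption: the group key is the value of the chosen property.
        groups[instance["properties"][prop]].append(instance)
    return dict(groups)


instances_by_id = {
    "i1": {"id": "i1", "properties": {"qid": "q1"}},
    "i2": {"id": "i2", "properties": {"qid": "q2"}},
    "i3": {"id": "i3", "properties": {"qid": "q1"}},
}
grouped = group_by(instances_by_id, "qid")
print({key: [ins["id"] for ins in group] for key, group in grouped.items()})
```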
-
to_json(json_file=None)
Converts all instances to JSON and writes the result to the file.
Parameters: json_file – (string)
Returns: JSON dump of all instances.
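A JSON round trip of this kind can be sketched with the standard `json` module. The layout below (one entry per instance ID holding `properties`, `target`, and `features`) is an assumption for illustration, not the documented nordlys file format, and `dump_instances` and `load_instances` are hypothetical helpers.

```python
import json
import os
import tempfile

# Assumed layout: instance ID -> properties, target, and features.
instances = {
    "q1-e1": {"properties": {"qid": "q1"}, "target": 1, "features": {"1": 0.5}},
    "q1-e2": {"properties": {"qid": "q1"}, "target": 0, "features": {"1": 0.1}},
}


def dump_instances(instances, json_file=None):
    """Return a JSON dump of all instances; also write it if a path is given."""
    dump = json.dumps(instances, indent=4, sort_keys=True)
    if json_file is not None:
        with open(json_file, "w", encoding="utf-8") as f:
            f.write(dump)
    return dump


def load_instances(json_file):
    """Load the instance dictionary back from a JSON file."""
    with open(json_file, encoding="utf-8") as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "instances.json")
    dump_instances(instances, path)
    restored = load_instances(path)
```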
-
to_libsvm(file_name=None, qid_prop=None)
Converts all instances to the LibSVM format and writes them to the file.
- Libsvm format:
<line> .=. <target> qid:<qid> <feature>:<value> … # <info>
<target> .=. <float>
<qid> .=. <positive integer>
<feature> .=. <positive integer>
<value> .=. <float>
<info> .=. <string>
- Example: 3 qid:1 1:1 2:1 3:0 4:0.2 5:0 # 1A
- NOTES:
- The property used for qid (qid_prop) should hold integers
- For pointwise algorithms, we use instance id for qid
- Lines in the RankLib input have to be sorted by increasing qid.
Parameters: - file_name – File to write libsvm format of instances.
- qid_prop – property to be used as qid. If none,
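The line grammar and the sorting note above can be illustrated with a short stand-alone sketch. It does not call nordlys: `format_libsvm_line` and `to_libsvm_lines` are hypothetical helpers, and features are assumed to be given as a dict from integer feature ID to value.

```python
def format_libsvm_line(target, qid, features, info):
    """Format one line as '<target> qid:<qid> <feature>:<value> ... # <info>'."""
    feats = " ".join(f"{fid}:{val}" for fid, val in sorted(features.items()))
    return f"{target} qid:{qid} {feats} # {info}"


def to_libsvm_lines(rows):
    """Format (target, qid, features, info) rows, sorted by increasing qid."""
    ordered = sorted(rows, key=lambda row: row[1])
    return [format_libsvm_line(*row) for row in ordered]


# Reproduces the example line from the format description.
line = format_libsvm_line(3, 1, {1: 1, 2: 1, 3: 0, 4: 0.2, 5: 0}, "1A")
print(line)

rows = [
    (0, 2, {1: 0.3}, "2A"),
    (1, 1, {1: 0.9}, "1B"),
]
print("\n".join(to_libsvm_lines(rows)))
```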
-
to_str(file_name=None)
Converts instances to a string and writes them to the given file.
Parameters: file_name
Returns: string format of instances
-
to_treceval(file_name, qid_prop='qid', docid_prop='en_id')
Generates a TREC style run file.
- If an entity is ranked more than once for the same query, the one with the higher score is kept.
Parameters: - file_name – File to write TREC file
- qid_prop – Name of instance property to be used as query ID (1st column)
- docid_prop – Name of instance property to be used as document ID (3rd column)
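The deduplication rule can be sketched as below. This is not the nordlys code: `to_trec_run` is a hypothetical function and `run_id` a hypothetical parameter. The query ID and document ID columns come from the description above; the remaining columns (`Q0`, rank, score, run tag) follow the usual trec_eval run-file layout and are an assumption here.

```python
def to_trec_run(rows, run_id="sketch"):
    """Build TREC run lines from (qid, docid, score) rows."""
    best = {}
    for qid, docid, score in rows:
        key = (qid, docid)
        # Keep only the highest score for a repeated (query, document) pair.
        if key not in best or score > best[key]:
            best[key] = score

    by_query = {}
    for (qid, docid), score in best.items():
        by_query.setdefault(qid, []).append((docid, score))

    lines = []
    for qid in sorted(by_query):
        ranked = sorted(by_query[qid], key=lambda pair: pair[1], reverse=True)
        for rank, (docid, score) in enumerate(ranked, start=1):
            lines.append(f"{qid} Q0 {docid} {rank} {score} {run_id}")
    return lines


run_rows = [("q1", "e1", 0.2), ("q1", "e2", 0.9), ("q1", "e1", 0.7)]
print("\n".join(to_trec_run(run_rows)))
```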