weka package¶
Subpackages¶
- weka.core package
- weka.core.capabilities module
Capabilities
Capabilities.attribute_capabilities()
Capabilities.capabilities()
Capabilities.class_capabilities()
Capabilities.dependencies()
Capabilities.disable()
Capabilities.disable_all()
Capabilities.disable_all_attribute_dependencies()
Capabilities.disable_all_attributes()
Capabilities.disable_all_class_dependencies()
Capabilities.disable_all_classes()
Capabilities.disable_dependency()
Capabilities.enable()
Capabilities.enable_all()
Capabilities.enable_all_attribute_dependencies()
Capabilities.enable_all_attributes()
Capabilities.enable_all_class_dependencies()
Capabilities.enable_all_classes()
Capabilities.enable_dependency()
Capabilities.for_instances()
Capabilities.handles()
Capabilities.has_dependencies()
Capabilities.has_dependency()
Capabilities.min_instances
Capabilities.other_capabilities()
Capabilities.owner
Capabilities.supports()
Capabilities.supports_maybe()
Capabilities.test_attribute()
Capabilities.test_instances()
Capability
- weka.core.classes module
AbstractParameter
Date
Enum
Environment
JavaArray
JavaArrayIterator
JavaObject
ListParameter
MathParameter
Option
OptionHandler
Random
Range
SelectedTag
SetupGenerator
SingleIndex
Tag
Tags
backquote()
complete_classname()
deepcopy()
from_byte_array()
from_commandline()
get_classname()
get_enum()
get_jclass()
get_static_field()
help_for()
is_array()
is_instance_of()
join_options()
list_property_names()
load_suggestions()
main()
new_array()
new_instance()
quote()
serialization_read()
serialization_read_all()
serialization_write()
serialization_write_all()
split_commandline()
split_options()
suggest_package()
suggestions
to_byte_array()
to_commandline()
unbackquote()
unquote()
- weka.core.converters module
- weka.core.database module
- weka.core.dataset module
Attribute
Attribute.add_relation()
Attribute.add_string_value()
Attribute.copy()
Attribute.create_date()
Attribute.create_nominal()
Attribute.create_numeric()
Attribute.create_relational()
Attribute.create_string()
Attribute.date_format
Attribute.equals()
Attribute.equals_msg()
Attribute.index
Attribute.index_of()
Attribute.is_averagable
Attribute.is_date
Attribute.is_in_range()
Attribute.is_nominal
Attribute.is_numeric
Attribute.is_relation_valued
Attribute.is_string
Attribute.lower_numeric_bound
Attribute.name
Attribute.num_values
Attribute.ordering
Attribute.parse_date()
Attribute.type
Attribute.type_str()
Attribute.upper_numeric_bound
Attribute.value()
Attribute.values
Attribute.weight
AttributeIterator
AttributeStats
Instance
Instance.class_attribute
Instance.class_index
Instance.create_instance()
Instance.create_sparse_instance()
Instance.dataset
Instance.get_relational_value()
Instance.get_string_value()
Instance.get_value()
Instance.has_class()
Instance.has_missing()
Instance.is_missing()
Instance.missing_value()
Instance.num_attributes
Instance.num_classes
Instance.set_missing()
Instance.set_string_value()
Instance.set_value()
Instance.to_numpy()
Instance.values
Instance.weight
InstanceIterator
InstanceValueIterator
Instances
Instances.add_instance()
Instances.append_instances()
Instances.attribute()
Instances.attribute_by_name()
Instances.attribute_names()
Instances.attribute_stats()
Instances.attributes()
Instances.class_attribute
Instances.class_index
Instances.class_is_first()
Instances.class_is_last()
Instances.compactify()
Instances.copy_instances()
Instances.copy_structure()
Instances.create_instances()
Instances.cv_splits()
Instances.delete()
Instances.delete_attribute()
Instances.delete_attribute_type()
Instances.delete_first_attribute()
Instances.delete_last_attribute()
Instances.delete_with_missing()
Instances.equal_headers()
Instances.get_instance()
Instances.has_class()
Instances.insert_attribute()
Instances.merge_instances()
Instances.no_class()
Instances.num_attributes
Instances.num_instances
Instances.randomize()
Instances.relationname
Instances.set_instance()
Instances.sort()
Instances.stratify()
Instances.subset()
Instances.summary()
Instances.template_instances()
Instances.test_cv()
Instances.to_numpy()
Instances.train_cv()
Instances.train_test_split()
Instances.values()
Stats
check_col_names_unique()
create_instances_from_lists()
create_instances_from_matrices()
missing_value()
- weka.core.distances module
- weka.core.jvm module
- weka.core.packages module
Dependency
LATEST
Package
PackageConstraint
all_package()
all_packages()
available_package()
available_packages()
establish_cache()
install_missing_package()
install_missing_packages()
install_package()
install_packages()
installed_package()
installed_packages()
is_installed()
is_official_package()
main()
refresh_cache()
suggest_package()
sys_main()
uninstall_package()
uninstall_packages()
- weka.core.serialization module
- weka.core.stemmers module
- weka.core.stopwords module
- weka.core.tokenizers module
- weka.core.typeconv module
float_to_jfloat()
from_jobject_array()
jdouble_array_to_ndarray()
jdouble_matrix_to_ndarray()
jdouble_to_float()
jenumeration_to_list()
jint_array_to_ndarray()
jstring_array_to_list()
jstring_list_to_string_list()
string_list_to_jarray()
string_list_to_jlist()
to_jdouble_array()
to_jint_array()
to_jobject_array()
to_string()
- weka.core.utils module
- weka.core.version module
- Module contents
- weka.flow package
- weka.flow.container module
- weka.flow.conversion module
- weka.flow.sink module
- weka.flow.source module
- weka.flow.transformer module
- Module contents
- weka.plot package
weka.associations module¶
- class weka.associations.AssociationRule(jobject)¶
Bases:
JavaObject
Wrapper for weka.associations.AssociationRule class.
- property consequence¶
Get the consequence.
- Returns:
the consequence, list of Item objects
- Return type:
list
- property consequence_support¶
Get the support for the consequence.
- Returns:
the support
- Return type:
int
- property metric_names¶
Returns the metric names for the rule.
- Returns:
the metric names
- Return type:
list
- metric_value(name)¶
Returns the named metric value for the rule.
- Parameters:
name (str) – the name of the metric
- Returns:
the metric value
- Return type:
float
- property metric_values¶
Returns the metric values for the rule.
- Returns:
the metric values
- Return type:
ndarray
- property premise¶
Get the premise.
- Returns:
the premise, list of Item objects
- Return type:
list
- property premise_support¶
Get the support for the premise.
- Returns:
the support
- Return type:
int
- property primary_metric_name¶
Returns the primary metric name for the rule.
- Returns:
the metric name
- Return type:
str
- property primary_metric_value¶
Returns the primary metric value for the rule.
- Returns:
the metric value
- Return type:
float
- to_dict()¶
Builds a dictionary with the properties of the AssociationRule object.
- Returns:
the AssociationRule dictionary
- Return type:
dict
- property total_support¶
Get the total support.
- Returns:
the support
- Return type:
int
- property total_transactions¶
Get the total transactions.
- Returns:
the transactions
- Return type:
int
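The count properties above (premise_support, consequence_support, total_support, total_transactions) are the usual inputs for rule metrics. The following is a minimal pure-Python sketch that uses the textbook definitions of confidence and lift. The function name `rule_metrics` and the sample counts are hypothetical, and the metrics that a given associator reports through metric_names may be defined differently.

```python
def rule_metrics(premise_support, consequence_support, total_support, total_transactions):
    """Textbook confidence and lift from raw support counts (illustrative only)."""
    # Confidence: share of premise-matching transactions that also match the consequence.
    confidence = total_support / premise_support
    # Lift: confidence relative to the baseline frequency of the consequence.
    lift = confidence / (consequence_support / total_transactions)
    return {"confidence": confidence, "lift": lift}


# Hypothetical counts: 100 transactions, premise in 40, consequence in 50, both in 30.
metrics = rule_metrics(
    premise_support=40,
    consequence_support=50,
    total_support=30,
    total_transactions=100,
)
print(metrics)
```

With a real rule, the four arguments would come from the corresponding properties of an AssociationRule object.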
- class weka.associations.AssociationRules(jobject)¶
Bases:
JavaObject
Wrapper for weka.associations.AssociationRules class.
- property producer¶
Returns a string describing the producer that generated these rules.
- Returns:
the producer
- Return type:
str
- to_dict()¶
Returns a list of association rules, each in dict format.
- Returns:
the association rules
- Return type:
list
- class weka.associations.AssociationRulesIterator(rules)¶
Bases:
object
Iterator for weka.associations.AssociationRules class.
- class weka.associations.Associator(classname=None, jobject=None, options=None)¶
Bases:
OptionHandler
Wrapper class for associators.
- association_rules()¶
Returns the association rules that were generated. Only available if the scheme implements AssociationRulesProducer.
- Returns:
the association rules that were generated
- Return type:
AssociationRules
- build_associations(data)¶
Builds the associator with the data.
- Parameters:
data (Instances) – the data to train the associator with
- can_produce_rules()¶
Checks whether association rules can be generated.
- Returns:
whether scheme implements AssociationRulesProducer interface and association rules can be generated
- Return type:
bool
- property capabilities¶
Returns the capabilities of the associator.
- Returns:
the capabilities
- Return type:
Capabilities
- property header¶
Returns the header of the training data.
- Returns:
the structure of the training data, None if not available
- Return type:
Instances
- classmethod make_copy(associator)¶
Creates a copy of the associator.
- Parameters:
associator (Associator) – the associator to copy
- Returns:
the copy of the associator
- Return type:
Associator
- property rule_metric_names¶
Returns the rule metric names of the association rules. Only available if the scheme implements AssociationRulesProducer.
- Returns:
the metric names
- Return type:
list
- class weka.associations.Item(jobject)¶
Bases:
JavaObject
Wrapper for weka.associations.Item class.
- property comparison¶
Returns the comparison operator as string.
- Returns:
the comparison operator
- Return type:
str
- decrease_frequency(frequency=None)¶
Decreases the frequency.
- Parameters:
frequency (int) – the frequency to decrease by, 1 if None
- property frequency¶
Returns the frequency.
- Returns:
the frequency
- Return type:
int
- increase_frequency(frequency=None)¶
Increases the frequency.
- Parameters:
frequency (int) – the frequency to increase by, 1 if None
- property item_value¶
Returns the item value as string.
- Returns:
the item value
- Return type:
str
- weka.associations.main(args=None)¶
Runs an associator from the command-line. Calls JVM start/stop automatically. Use -h to see all options.
- Parameters:
args (list) – the command-line arguments to use, uses sys.argv if None
- weka.associations.sys_main()¶
Runs the main function using the system CLI arguments and returns a system error code.
- Returns:
0 for success, 1 for failure.
- Return type:
int
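The relationship between main() and sys_main() described above can be sketched as follows. This is a hypothetical stand-in, not the wrapper's code: the `--fail` flag is invented to trigger an error, and `sys_main` takes an optional `args` parameter here only so that the sketch can be exercised without touching sys.argv.

```python
import sys
import traceback


def main(args=None):
    """Hypothetical stand-in for a module-level main()."""
    if args is None:
        args = sys.argv[1:]  # fall back to the process arguments
    if "--fail" in args:
        raise ValueError("simulated failure")
    return args


def sys_main(args=None):
    """Runs main() and maps the outcome to an exit code: 0 for success, 1 for failure."""
    try:
        main(args)
        return 0
    except Exception:
        traceback.print_exc()  # report the error on stderr
        return 1


print(sys_main(["-h"]))
print(sys_main(["--fail"]))
```

A console entry point would typically pass the result to `sys.exit()`.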
weka.attribute_selection module¶
- class weka.attribute_selection.ASEvaluation(classname='weka.attributeSelection.CfsSubsetEval', jobject=None, options=None)¶
Bases:
OptionHandler
Wrapper class for attribute selection evaluation algorithm.
- build_evaluator(data)¶
Builds the evaluator with the data.
- Parameters:
data (Instances) – the data to use
- property capabilities¶
Returns the capabilities of the evaluator.
- Returns:
the capabilities
- Return type:
Capabilities
- convert_instance(inst)¶
Transforms an instance in the format of the original data to the transformed space.
- property header¶
Returns the header of the training data.
- Returns:
the structure of the training data, None if not available
- Return type:
Instances
- post_process(indices)¶
Post-processes the evaluator with the selected attribute indices.
- Parameters:
indices (ndarray) – the attribute indices list to use
- Returns:
the processed indices
- Return type:
ndarray
- transformed_data(data)¶
Transform the supplied data set (assumed to be the same format as the training data).
- transformed_header()¶
Returns just the header for the transformed data (i.e., an empty set of instances). This allows AttributeSelection to determine the structure of the transformed data without having to retrieve all the transformed data through transformed_data(). Returns None if the evaluator is not a weka.attributeSelection.AttributeTransformer.
- Returns:
the header
- Return type:
Instances
- class weka.attribute_selection.ASSearch(classname='weka.attributeSelection.BestFirst', jobject=None, options=None)¶
Bases:
OptionHandler
Wrapper class for attribute selection search algorithm.
- property header¶
Returns the header of the training data.
- Returns:
the structure of the training data, None if not available
- Return type:
Instances
- search(evaluation, data)¶
Performs the search and returns the indices of the selected attributes.
- Parameters:
evaluation (ASEvaluation) – the evaluation algorithm to use
data (Instances) – the data to use
- Returns:
the selected attributes (0-based indices)
- Return type:
ndarray
- class weka.attribute_selection.AttributeSelection¶
Bases:
JavaObject
Performs attribute selection using search and evaluation algorithms.
- classmethod attribute_selection(evaluator, args)¶
Performs attribute selection using the given attribute evaluator and options.
- Parameters:
evaluator (ASEvaluation) – the evaluator to use
args (list) – the command-line args for the attribute selection
- Returns:
the results string
- Return type:
str
- crossvalidation(crossvalidation)¶
Sets whether to perform cross-validation.
- Parameters:
crossvalidation (bool) – whether to perform cross-validation
- property cv_results¶
Generates a results string from the last cross-validation attribute selection.
- Returns:
the results string
- Return type:
str
- evaluator(evaluator)¶
Sets the evaluator to use.
- Parameters:
evaluator (ASEvaluation) – the evaluator to use.
- folds(folds)¶
Sets the number of folds to use for cross-validation.
- Parameters:
folds (int) – the number of folds
- property number_attributes_selected¶
Returns the number of attributes that were selected.
- Returns:
the number of attributes
- Return type:
int
- property rank_results¶
Returns the results from the cross-validation for rankers.
Unfortunately, the Weka API does not give direct access to underlying data structures, hence we have to parse the textual output.
- Returns:
the dictionary of results (mean and stdev for rank and merit)
- Return type:
dict
- property ranked_attributes¶
Returns the matrix of ranked attributes from the last run.
- Returns:
the Numpy matrix
- Return type:
ndarray
- ranking(ranking)¶
Sets whether to perform a ranking, if possible.
- Parameters:
ranking (bool) – whether to perform a ranking
- reduce_dimensionality(data)¶
Reduces the dimensionality of the provided Instance or Instances object.
- property results_string¶
Generates a results string from the last attribute selection.
- Returns:
the results string
- Return type:
str
- search(search)¶
Sets the search algorithm to use.
- Parameters:
search (ASSearch) – the search algorithm
- seed(seed)¶
Sets the seed for cross-validation.
- Parameters:
seed (int) – the seed value
- select_attributes(instances)¶
Performs attribute selection on the given dataset.
- Parameters:
instances (Instances) – the data to process
- select_attributes_cv_split(instances)¶
Performs attribute selection on the given cross-validation split.
- Parameters:
instances (Instances) – the data to process
- property selected_attributes¶
Returns the selected attributes from the last run.
- Returns:
the Numpy array of 0-based indices
- Return type:
ndarray
- property subset_results¶
Returns the results from the cross-validation subsets, i.e., how often an attribute was selected.
Unfortunately, the Weka API does not give direct access to underlying data structures, hence we have to parse the textual output.
- Returns:
the list of results (double)
- Return type:
list
- weka.attribute_selection.main(args=None)¶
Runs attribute selection from the command-line. Calls JVM start/stop automatically. Use -h to see all options.
- Parameters:
args (list) – the command-line arguments to use, uses sys.argv if None
- weka.attribute_selection.sys_main()¶
Runs the main function using the system CLI arguments and returns a system error code.
- Returns:
0 for success, 1 for failure.
- Return type:
int
weka.classifiers module¶
- class weka.classifiers.AttributeSelectedClassifier(jobject=None, options=None)¶
Bases:
SingleClassifierEnhancer
Wrapper class for the AttributeSelectedClassifier.
- property evaluator¶
Returns the evaluator.
- Returns:
the evaluator in use
- Return type:
ASEvaluation
- class weka.classifiers.Classifier(classname='weka.classifiers.rules.ZeroR', jobject=None, options=None)¶
Bases:
OptionHandler
Wrapper class for classifiers.
- additional_measure(measure)¶
Returns the specified additional measure if implementing weka.core.AdditionalMeasureProducer, otherwise None.
- Parameters:
measure (str) – the measure to retrieve
- Returns:
the additional measure
- Return type:
str
- property additional_measures¶
Returns the list of additional measures if implementing weka.core.AdditionalMeasureProducer, otherwise None.
- Returns:
the additional measures
- Return type:
list
- property batch_size¶
Returns the batch size, in case this classifier is a batch predictor.
- Returns:
the batch size, None if not a batch predictor
- Return type:
str
- build_classifier(data)¶
Builds the classifier with the data.
- Parameters:
data (Instances) – the data to train the classifier with
- property capabilities¶
Returns the capabilities of the classifier.
- Returns:
the capabilities
- Return type:
Capabilities
- classify_instance(inst)¶
Performs a prediction.
- Parameters:
inst (Instance) – the Instance to get a prediction for
- Returns:
the classification (either regression value or 0-based label index)
- Return type:
float
- classmethod deserialize(ser_file)¶
Deserializes a classifier from a file.
- Parameters:
ser_file (str) – the model file to deserialize
- Returns:
model and, if available, the dataset header
- Return type:
tuple
- distribution_for_instance(inst)¶
Performs a prediction, returning the class distribution.
- Parameters:
inst (Instance) – the Instance to get the class distribution for
- Returns:
the class distribution array
- Return type:
ndarray
- distributions_for_instances(data)¶
Performs predictions, returning the class distributions.
- Parameters:
data (Instances) – the Instances to get the class distributions for
- Returns:
the class distribution matrix, None if not a batch predictor
- Return type:
ndarray
- property graph¶
Returns the graph if classifier implements weka.core.Drawable, otherwise None.
- Returns:
the generated graph string
- Return type:
str
- property graph_type¶
Returns the graph type if classifier implements weka.core.Drawable, otherwise -1.
- Returns:
the type
- Return type:
int
- has_efficient_batch_prediction()¶
Returns whether the classifier implements a more efficient batch prediction.
- Returns:
True if a more efficient batch prediction is implemented, always False if not batch predictor
- Return type:
bool
- property header¶
Returns the header of the training data.
- Returns:
the structure of the training data, None if not available
- Return type:
Instances
- classmethod make_copy(classifier)¶
Creates a copy of the classifier.
- Parameters:
classifier (Classifier) – the classifier to copy
- Returns:
the copy of the classifier
- Return type:
Classifier
- serialize(ser_file, header=None)¶
Serializes the classifier to the specified file.
- Parameters:
ser_file (str) – the file to save the model to
header (Instances) – the (optional) dataset header to store alongside; recommended
- to_source(classname)¶
Returns the model as Java source code if the classifier implements weka.classifiers.Sourcable.
- Parameters:
classname (str) – the classname for the generated Java code
- Returns:
the model as source code string
- Return type:
str
- class weka.classifiers.CostMatrix(matrx=None, num_classes=None)¶
Bases:
JavaObject
Class for storing and manipulating a misclassification cost matrix. The element at position i,j in the matrix is the penalty for classifying an instance of class j as class i. Cost values can be fixed or computed on a per-instance basis (cost sensitive evaluation only) from the value of an attribute or an expression involving attribute(s).
- apply_cost_matrix(data, rnd)¶
Applies the cost matrix to the data.
- expected_costs(class_probs, inst=None)¶
Calculates the expected misclassification cost for each possible class value, given class probability estimates.
- Parameters:
class_probs (ndarray) – the class probabilities
- Returns:
the calculated costs
- Return type:
ndarray
- get_cell(row, col)¶
Returns the JPype object at the specified location.
- Parameters:
row (int) – the 0-based index of the row
col (int) – the 0-based index of the column
- Returns:
the object in that cell
- Return type:
JPype object
- get_element(row, col, inst=None)¶
Returns the value at the specified location.
- Parameters:
row (int) – the 0-based index of the row
col (int) – the 0-based index of the column
inst (Instance) – the Instance
- Returns:
the value in that cell
- Return type:
float
- get_max_cost(class_value, inst=None)¶
Gets the maximum cost for a particular class value.
- Parameters:
class_value (int) – the class value to get the maximum cost for
inst (Instance) – the Instance
- Returns:
the cost
- Return type:
float
- initialize()¶
Initializes the matrix.
- normalize()¶
Normalizes the matrix.
- property num_columns¶
Returns the number of columns.
- Returns:
the number of columns
- Return type:
int
- property num_rows¶
Returns the number of rows.
- Returns:
the number of rows
- Return type:
int
- classmethod parse_matlab(matlab)¶
Parses the cost matrix definition in Matlab format and returns a matrix.
- Parameters:
matlab (str) – the Matlab matrix string, e.g., [1 2; 3 4].
- Returns:
the generated matrix
- Return type:
CostMatrix
- set_cell(row, col, obj)¶
Sets the JPype object at the specified location. Automatically unwraps JavaObject.
- Parameters:
row (int) – the 0-based index of the row
col (int) – the 0-based index of the column
obj (object) – the object for that cell
- set_element(row, col, value)¶
Sets the float value at the specified location.
- Parameters:
row (int) – the 0-based index of the row
col (int) – the 0-based index of the column
value (float) – the float value for that cell
- property size¶
Returns the number of rows/columns.
- Returns:
the number of rows/columns
- Return type:
int
- to_matlab()¶
Returns the matrix in Matlab format.
- Returns:
the matrix as Matlab formatted string
- Return type:
str
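The Matlab string format used by parse_matlab() and to_matlab() (rows separated by semicolons, cells by whitespace) can be illustrated with a small pure-Python sketch. The helpers `parse_matlab_matrix` and `to_matlab_string` are hypothetical and work on nested lists; the wrapper's own methods operate on a CostMatrix backed by Weka, and their exact output formatting may differ.

```python
def parse_matlab_matrix(text):
    """Parses a Matlab-style matrix string such as "[1 2; 3 4]" into nested lists."""
    body = text.strip()
    if not (body.startswith("[") and body.endswith("]")):
        raise ValueError("expected a string of the form '[a b; c d]'")
    rows = []
    for row in body[1:-1].split(";"):
        cells = row.replace(",", " ").split()  # accept commas or spaces between cells
        if cells:
            rows.append([float(c) for c in cells])
    if len({len(r) for r in rows}) > 1:
        raise ValueError("rows have different lengths")
    return rows


def to_matlab_string(rows):
    """Formats nested lists back into a Matlab-style matrix string."""
    return "[" + "; ".join(" ".join(repr(v) for v in row) for row in rows) + "]"


matrix = parse_matlab_matrix("[1 2; 3 4]")
print(matrix)
print(to_matlab_string(matrix))
```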
- class weka.classifiers.Evaluation(data, cost_matrix=None)¶
Bases:
JavaObject
Evaluation class for classifiers.
- area_under_prc(class_index)¶
Returns the area under precision recall curve.
- Parameters:
class_index (int) – the 0-based index of the class label
- Returns:
the area
- Return type:
float
- area_under_roc(class_index)¶
Returns the area under the receiver operating characteristic (ROC) curve.
- Parameters:
class_index (int) – the 0-based index of the class label
- Returns:
the area
- Return type:
float
- property avg_cost¶
Returns the average cost.
- Returns:
the cost
- Return type:
float
- class_details(title=None)¶
Generates the class details.
- Parameters:
title (str) – optional title
- Returns:
the details
- Return type:
str
- property class_priors¶
Returns the class priors.
- Returns:
the priors
- Return type:
ndarray
- property confusion_matrix¶
Returns the confusion matrix.
- Returns:
the matrix
- Return type:
ndarray
- property correct¶
Returns the correct count (nominal classes).
- Returns:
the count
- Return type:
float
- property correlation_coefficient¶
Returns the correlation coefficient (numeric classes).
- Returns:
the coefficient
- Return type:
float
- property coverage_of_test_cases_by_predicted_regions¶
Returns the coverage of the test cases by the predicted regions at the confidence level specified when evaluation was performed.
- Returns:
the coverage
- Return type:
float
- crossvalidate_model(classifier, data, num_folds, rnd, output=None)¶
Crossvalidates the model using the specified data, number of folds and random number generator wrapper.
- Parameters:
classifier (Classifier) – the classifier to cross-validate
data (Instances) – the data to evaluate on
num_folds (int) – the number of folds
rnd (Random) – the random number generator to use
output (PredictionOutput) – the output generator to use
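How a dataset is partitioned for cross-validation can be sketched in pure Python. This is a generic, unstratified k-fold split under stated assumptions: `kfold_indices` is a hypothetical helper, it uses Python's random module rather than the Random wrapper, and Weka's own fold assignment (including any stratification) may differ.

```python
import random


def kfold_indices(num_instances, num_folds, seed=1):
    """Yields (train, test) index lists for a plain, unstratified k-fold split."""
    indices = list(range(num_instances))
    random.Random(seed).shuffle(indices)  # seeded shuffle for reproducibility
    for fold in range(num_folds):
        test = indices[fold::num_folds]   # every num_folds-th index belongs to this fold
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test


folds = list(kfold_indices(num_instances=10, num_folds=5))
for train, test in folds:
    print(len(train), sorted(test))
```

Each instance lands in exactly one test fold, so every instance is predicted once by a model that did not see it during training.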
- cumulative_margin_distribution()¶
Output the cumulative margin distribution as a string suitable for input for gnuplot or similar package.
- Returns:
the cumulative margin distribution
- Return type:
str
- property discard_predictions¶
Returns whether to discard predictions (saves memory).
- Returns:
True if predictions are discarded
- Return type:
bool
- property error_rate¶
Returns the error rate (numeric classes).
- Returns:
the rate
- Return type:
float
- classmethod evaluate_model(classifier, args)¶
Evaluates the classifier with the given options.
- Parameters:
classifier (Classifier) – the classifier instance to use
args (list) – the command-line arguments to use
- Returns:
the evaluation string
- Return type:
str
- evaluate_train_test_split(classifier, data, percentage, rnd=None, output=None)¶
Splits the data into train and test, builds the classifier with the training data and evaluates it against the test set.
- Parameters:
classifier (Classifier) – the classifier to build and evaluate
data (Instances) – the data to evaluate on
percentage (double) – the percentage split to use (amount to use for training)
rnd (Random) – the random number generator to use, if None the order gets preserved
output (PredictionOutput) – the output generator to use
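The percentage split described above can be sketched in pure Python. `percentage_split` is a hypothetical helper: it rounds the training size to the nearest integer (an assumption about the rounding rule) and uses Python's random module in place of the Random wrapper. As in the documented method, the original order is kept when no random number generator (here: no seed) is supplied.

```python
import random


def percentage_split(rows, percentage, seed=None):
    """Splits rows into (train, test); percentage is the share used for training."""
    rows = list(rows)
    if seed is not None:
        random.Random(seed).shuffle(rows)  # without a seed, the order is preserved
    train_size = int(round(len(rows) * percentage / 100.0))
    return rows[:train_size], rows[train_size:]


train, test = percentage_split(range(10), 66.0)
print(train)
print(test)
```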
- f_measure(class_index)¶
Returns the f measure.
- Parameters:
class_index (int) – the 0-based index of the class label
- Returns:
the measure
- Return type:
float
- false_negative_rate(class_index)¶
Returns the false negative rate.
- Parameters:
class_index (int) – the 0-based index of the class label
- Returns:
the rate
- Return type:
float
- false_positive_rate(class_index)¶
Returns the false positive rate.
- Parameters:
class_index (int) – the 0-based index of the class label
- Returns:
the rate
- Return type:
float
- property incorrect¶
Returns the incorrect count (nominal classes).
- Returns:
the count
- Return type:
float
- property kappa¶
Returns kappa.
- Returns:
kappa
- Return type:
float
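As an illustration of what the kappa statistic measures, the sketch below computes Cohen's kappa from a confusion matrix given as nested lists. It assumes rows are actual classes and columns are predicted classes, ignores instance weights, and returns 1.0 when chance agreement is already perfect (a convention chosen for this sketch). `cohen_kappa` and the sample matrix are hypothetical.

```python
def cohen_kappa(confusion):
    """Cohen's kappa from a square confusion matrix (rows: actual, columns: predicted)."""
    n = len(confusion)
    total = float(sum(sum(row) for row in confusion))
    # Observed agreement: share of instances on the diagonal.
    observed = sum(confusion[i][i] for i in range(n)) / total
    # Chance agreement: product of row and column marginals, summed over classes.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion) for i in range(n)
    ) / (total * total)
    if expected >= 1.0:
        return 1.0
    return (observed - expected) / (1.0 - expected)


confusion = [[20, 5], [10, 15]]
kappa = cohen_kappa(confusion)
print(kappa)
```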
- property kb_information¶
Returns KB information.
- Returns:
the information
- Return type:
float
- property kb_mean_information¶
Returns KB mean information.
- Returns:
the information
- Return type:
float
- property kb_relative_information¶
Returns KB relative information.
- Returns:
the information
- Return type:
float
- matrix(title=None)¶
Generates the confusion matrix.
- Parameters:
title (str) – optional title
- Returns:
the matrix
- Return type:
str
- matthews_correlation_coefficient(class_index)¶
Returns the Matthews correlation coefficient (nominal classes).
- Parameters:
class_index (int) – the 0-based index of the class label
- Returns:
the coefficient
- Return type:
float
- property mean_absolute_error¶
Returns the mean absolute error.
- Returns:
the error
- Return type:
float
- property mean_prior_absolute_error¶
Returns the mean prior absolute error.
- Returns:
the error
- Return type:
float
- num_false_negatives(class_index)¶
Returns the number of false negatives.
- Parameters:
class_index (int) – the 0-based index of the class label
- Returns:
the count
- Return type:
float
- num_false_positives(class_index)¶
Returns the number of false positives.
- Parameters:
class_index (int) – the 0-based index of the class label
- Returns:
the count
- Return type:
float
- property num_instances¶
Returns the number of instances that had a known class value.
- Returns:
the number of instances
- Return type:
float
- num_true_negatives(class_index)¶
Returns the number of true negatives.
- Parameters:
class_index (int) – the 0-based index of the class label
- Returns:
the count
- Return type:
float
- num_true_positives(class_index)¶
Returns the number of true positives.
- Parameters:
class_index (int) – the 0-based index of the class label
- Returns:
the count
- Return type:
float
- property percent_correct¶
Returns the percent correct (nominal classes).
- Returns:
the percentage
- Return type:
float
- property percent_incorrect¶
Returns the percent incorrect (nominal classes).
- Returns:
the percentage
- Return type:
float
- property percent_unclassified¶
Returns the percent unclassified.
- Returns:
the percentage
- Return type:
float
- precision(class_index)¶
Returns the precision.
- Parameters:
class_index (int) – the 0-based index of the class label
- Returns:
the precision
- Return type:
float
- property predictions¶
Returns the predictions.
- Returns:
the predictions. None if not available
- Return type:
list
- recall(class_index)¶
Returns the recall.
- Parameters:
class_index (int) – the 0-based index of the class label
- Returns:
the recall
- Return type:
float
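The per-class statistics (true/false positives and negatives, precision, recall, F-measure) all derive from the confusion matrix and a 0-based class index. The following pure-Python sketch shows the standard definitions; it assumes rows are actual classes and columns are predicted classes, and it returns 0.0 when a denominator is empty, which is a choice made here and not necessarily what Weka reports. The helper names are hypothetical.

```python
def per_class_counts(confusion, class_index):
    """Returns (tp, fp, fn, tn) for one class of a square confusion matrix."""
    n = len(confusion)
    tp = confusion[class_index][class_index]
    fn = sum(confusion[class_index]) - tp                        # actual class, predicted other
    fp = sum(confusion[r][class_index] for r in range(n)) - tp   # other class, predicted class
    tn = sum(sum(row) for row in confusion) - tp - fn - fp
    return tp, fp, fn, tn


def precision_recall_f(confusion, class_index):
    """Standard precision, recall and F-measure (harmonic mean) for one class."""
    tp, fp, fn, _ = per_class_counts(confusion, class_index)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


confusion = [[20, 5], [10, 15]]
counts = per_class_counts(confusion, 0)
precision, recall, f_measure = precision_recall_f(confusion, 0)
print(counts, precision, recall, f_measure)
```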
- property relative_absolute_error¶
Returns the relative absolute error.
- Returns:
the error
- Return type:
float
- property root_mean_prior_squared_error¶
Returns the root mean prior squared error.
- Returns:
the error
- Return type:
float
- property root_mean_squared_error¶
Returns the root mean squared error.
- Returns:
the error
- Return type:
float
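For a numeric class, the mean absolute error and root mean squared error follow the usual definitions, sketched below in pure Python. The helpers and sample values are hypothetical, instance weights are ignored, and the way Weka computes these errors for nominal classes (from predicted class distributions) is not covered.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average absolute difference between predicted and actual values."""
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the average squared difference."""
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual))


actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 5.0, 4.0]
mae = mean_absolute_error(actual, predicted)
rmse = root_mean_squared_error(actual, predicted)
print(mae, rmse)
```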
- property root_relative_squared_error¶
Returns the root relative squared error.
- Returns:
the error
- Return type:
float
- property sf_entropy_gain¶
Returns the total SF, which is the null model entropy minus the scheme entropy.
- Returns:
the gain
- Return type:
float
- property sf_mean_entropy_gain¶
Returns the SF per instance, which is the null model entropy minus the scheme entropy, per instance.
- Returns:
the gain
- Return type:
float
- property sf_mean_prior_entropy¶
Returns the entropy per instance for the null model.
- Returns:
the entropy
- Return type:
float
- property sf_mean_scheme_entropy¶
Returns the entropy per instance for the scheme.
- Returns:
the entropy
- Return type:
float
- property sf_prior_entropy¶
Returns the total entropy for the null model.
- Returns:
the entropy
- Return type:
float
- property sf_scheme_entropy¶
Returns the total entropy for the scheme.
- Returns:
the entropy
- Return type:
float
- property size_of_predicted_regions¶
Returns the average size of the predicted regions, relative to the range of the target in the training data, at the confidence level specified when evaluation was performed.
- Returns:
the size of the regions
- Return type:
float
- summary(title=None, complexity=False)¶
Generates a summary.
- Parameters:
title (str) – optional title
complexity (bool) – whether to print the complexity information as well
- Returns:
the summary
- Return type:
str
- test_model(classifier, data, output=None)¶
Evaluates the built model using the specified test data and returns the classifications.
- Parameters:
classifier (Classifier) – the trained classifier to evaluate
data (Instances) – the data to evaluate on
output (PredictionOutput) – the output generator to use
- Returns:
the classifications
- Return type:
ndarray
- test_model_once(classifier, inst, store=False)¶
Evaluates the built model using the specified test instance and returns the classification.
- Parameters:
classifier (Classifier) – the trained classifier to evaluate
inst (Instance) – the Instance to evaluate on
store (bool) – whether to store the predictions (some statistics in class_details() like AUC require that)
- Returns:
the classification
- Return type:
float
- property total_cost¶
Returns the total cost.
- Returns:
the cost
- Return type:
float
- true_negative_rate(class_index)¶
Returns the true negative rate.
- Parameters:
class_index (int) – the 0-based index of the class label
- Returns:
the rate
- Return type:
float
- true_positive_rate(class_index)¶
Returns the true positive rate.
- Parameters:
class_index (int) – the 0-based index of the class label
- Returns:
the rate
- Return type:
float
- property unclassified¶
Returns the unclassified count.
- Returns:
the count
- Return type:
float
- property unweighted_macro_f_measure¶
Returns the unweighted macro-averaged F-measure.
- Returns:
the measure
- Return type:
float
- property unweighted_micro_f_measure¶
Returns the unweighted micro-averaged F-measure.
- Returns:
the measure
- Return type:
float
- property weighted_area_under_prc¶
Returns the weighted area under precision recall curve.
- Returns:
the weighted area
- Return type:
float
- property weighted_area_under_roc¶
Returns the weighted area under the receiver operating characteristic (ROC) curve.
- Returns:
the weighted area
- Return type:
float
- property weighted_f_measure¶
Returns the weighted f measure.
- Returns:
the measure
- Return type:
float
- property weighted_false_negative_rate¶
Returns the weighted false negative rate.
- Returns:
the rate
- Return type:
float
- property weighted_false_positive_rate¶
Returns the weighted false positive rate.
- Returns:
the rate
- Return type:
float
- property weighted_matthews_correlation¶
Returns the weighted Matthews correlation (nominal classes).
- Returns:
the correlation
- Return type:
float
- property weighted_precision¶
Returns the weighted precision.
- Returns:
the precision
- Return type:
float
- property weighted_recall¶
Returns the weighted recall.
- Returns:
the recall
- Return type:
float
- property weighted_true_negative_rate¶
Returns the weighted true negative rate.
- Returns:
the rate
- Return type:
float
- property weighted_true_positive_rate¶
Returns the weighted true positive rate.
- Returns:
the rate
- Return type:
float
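The weighted_* properties above each combine per-class values into a single number. The sketch below shows one way such an average can be formed, assuming each per-class value is weighted by the number of instances of that class. The function name and the sample numbers are illustrative only and are not taken from the wrapper.

```python
def weighted_average(per_class_values, class_counts):
    """Average per-class values, weighting each class by its instance count."""
    total = sum(class_counts)
    if total == 0:
        return 0.0
    return sum(v * n for v, n in zip(per_class_values, class_counts)) / total


# hypothetical per-class recall values and class sizes
recalls = [0.8, 0.9, 0.5]
counts = [50, 30, 20]
weighted_recall = weighted_average(recalls, counts)
print(weighted_recall)
```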
- class weka.classifiers.FilteredClassifier(jobject=None, options=None)¶
Bases:
SingleClassifierEnhancer
Wrapper class for the filtered classifier.
- check_for_modified_class_attribute(check)¶
Sets whether to check for class attribute modifications.
- Parameters:
check (bool) – True if checking for modifications
- property filter¶
Returns the filter.
- Returns:
the filter in use
- Return type:
- class weka.classifiers.GridSearch(jobject=None, options=None)¶
Bases:
SingleClassifierEnhancer
Wrapper class for the GridSearch meta-classifier.
- property best¶
Returns the best classifier setup found during the search.
- Returns:
the best classifier setup
- Return type:
- property evaluation¶
Returns the currently set statistic used for evaluation.
- Returns:
the statistic
- Return type:
- property x¶
Returns a dictionary with all the current values for the X of the grid. Keys for the dictionary: property, min, max, step, base, expression. Types: property=str, min=float, max=float, step=float, base=float, expression=str.
- Returns:
the dictionary with the parameters
- Return type:
dict
- property y¶
Returns a dictionary with all the current values for the Y of the grid. Keys for the dictionary: property, min, max, step, base, expression. Types: property=str, min=float, max=float, step=float, base=float, expression=str.
- Returns:
the dictionary with the parameters
- Return type:
dict
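The x and y dictionaries describe the two axes of the grid. As a rough sketch of how such dictionaries could expand into grid points, the code below walks from min to max in increments of step and raises base to each value. The expression string is shown for context only and is not parsed; hard-coding base ** i is an assumption about what an expression such as pow(BASE,I) means, and the property names are hypothetical.

```python
from itertools import product


def axis_values(axis):
    """Expand one axis dictionary into the list of values it would cover."""
    values = []
    i = axis["min"]
    # small tolerance guards against floating point drift in the loop
    while i <= axis["max"] + 1e-9:
        values.append(axis["base"] ** i)  # assumed meaning of "pow(BASE,I)"
        i += axis["step"]
    return values


# dictionaries shaped like the ones returned by the x and y properties
x = {"property": "hypothetical.x", "min": -2.0, "max": 2.0, "step": 1.0,
     "base": 10.0, "expression": "pow(BASE,I)"}
y = {"property": "hypothetical.y", "min": 1.0, "max": 3.0, "step": 1.0,
     "base": 2.0, "expression": "pow(BASE,I)"}

grid = list(product(axis_values(x), axis_values(y)))
print(len(grid))
```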
- class weka.classifiers.Kernel(classname=None, jobject=None, options=None)¶
Bases:
OptionHandler
Wrapper class for kernels.
- build_kernel(data)¶
Builds the kernel with the data.
- Parameters:
data (Instances) – the data to build the kernel with
- capabilities()¶
Returns the capabilities of the kernel.
- Returns:
the capabilities
- Return type:
- property checks_turned_off¶
Returns whether checks are turned off.
- Returns:
True if checks turned off
- Return type:
bool
- clean()¶
Frees the memory used by the kernel.
- eval(id1, id2, inst1)¶
Computes the result of the kernel function for two instances. If id1 == -1, eval uses inst1 instead of an instance in the dataset.
- Parameters:
id1 (int) – the index of the first instance in the dataset
id2 (int) – the index of the second instance in the dataset
inst1 (Instance) – the instance corresponding to id1 (used if id1 == -1)
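The indexing convention of eval can be sketched without the wrapper. The class below is a hypothetical stand-in (a plain linear kernel over lists of numbers), not the Kernel wrapper itself; it only mirrors the documented rule that inst1 replaces the dataset lookup when id1 is -1.

```python
class LinearKernelSketch:
    """Hypothetical linear kernel over a list of numeric feature vectors."""

    def __init__(self, data):
        self.data = data

    def eval(self, id1, id2, inst1=None):
        # id1 == -1 means: use inst1 instead of a row from the dataset
        first = inst1 if id1 == -1 else self.data[id1]
        second = self.data[id2]
        return sum(a * b for a, b in zip(first, second))


kernel = LinearKernelSketch([[1.0, 2.0], [3.0, 4.0]])
inside = kernel.eval(0, 1)                # both instances from the dataset
outside = kernel.eval(-1, 1, [1.0, 0.0])  # first instance supplied directly
print(inside, outside)
```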
- class weka.classifiers.KernelClassifier(classname=None, jobject=None, options=None)¶
Bases:
Classifier
Wrapper class for classifiers that have a kernel property, like SMO.
- class weka.classifiers.MultiSearch(jobject=None, options=None)¶
Bases:
SingleClassifierEnhancer
Wrapper class for the MultiSearch meta-classifier. NB: ‘multi-search-weka-package’ must be installed (https://github.com/fracpete/multisearch-weka-package), version 2016.1.15 or later.
- property best¶
Returns the best classifier setup found during the search.
- Returns:
the best classifier setup
- Return type:
- property evaluation¶
Returns the currently set statistic used for evaluation.
- Returns:
the statistic
- Return type:
- property parameters¶
Returns the list of currently set search parameters.
- Returns:
the list of AbstractSearchParameter objects
- Return type:
list
- class weka.classifiers.MultipleClassifiersCombiner(classname=None, jobject=None, options=None)¶
Bases:
Classifier
Wrapper class for classifiers that use multiple base classifiers.
- append(classifier)¶
Appends the classifier to the current list of classifiers.
- Parameters:
classifier (Classifier) – the classifier to add
- property classifiers¶
Returns the list of base classifiers.
- Returns:
the classifier list
- Return type:
list
- clear()¶
Removes all classifiers.
- class weka.classifiers.NominalPrediction(jobject)¶
Bases:
Prediction
Wrapper class for a nominal prediction.
- property distribution¶
Returns the class distribution.
- Returns:
the class distribution list
- Return type:
ndarray
- property margin¶
Returns the margin.
- Returns:
the margin
- Return type:
float
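The relationship between distribution and margin can be sketched as follows, assuming the margin is the probability predicted for the actual class minus the highest probability predicted for any other class. The function name is illustrative and the code works on plain lists rather than on a NominalPrediction object.

```python
def prediction_margin(distribution, actual):
    """Probability of the actual class minus the best competing probability."""
    prob_actual = distribution[actual]
    prob_next = max(p for i, p in enumerate(distribution) if i != actual)
    return prob_actual - prob_next


dist = [0.7, 0.2, 0.1]
correct_margin = prediction_margin(dist, 0)  # actual class is the top class
wrong_margin = prediction_margin(dist, 1)    # actual class is not the top class
print(correct_margin, wrong_margin)
```

Under this definition a positive margin means the actual class received the highest probability, and a negative margin means another class was preferred.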
- class weka.classifiers.NumericPrediction(jobject)¶
Bases:
Prediction
Wrapper class for a numeric prediction.
- property error¶
Returns the error.
- Returns:
the error
- Return type:
float
- property prediction_intervals¶
Returns the prediction intervals.
- Returns:
the intervals
- Return type:
ndarray
- class weka.classifiers.Prediction(jobject)¶
Bases:
JavaObject
Wrapper class for a prediction.
- property actual¶
Returns the actual value.
- Returns:
the actual value (internal representation)
- Return type:
float
- property predicted¶
Returns the predicted value.
- Returns:
the predicted value (internal representation)
- Return type:
float
- property weight¶
Returns the weight.
- Returns:
the weight of the Instance that was used
- Return type:
float
- class weka.classifiers.PredictionOutput(classname='weka.classifiers.evaluation.output.prediction.PlainText', jobject=None, options=None)¶
Bases:
OptionHandler
For collecting predictions and generating output from them. Must be derived from weka.classifiers.evaluation.output.prediction.AbstractOutput
- buffer_content()¶
Returns the content of the buffer as string.
- Returns:
The buffer content
- Return type:
str
- print_all(cls, data)¶
Prints the header, classifications and footer to the buffer.
- Parameters:
cls (Classifier) – the classifier
data (Instances) – the test data
- print_classification(cls, inst, index)¶
Prints the classification to the buffer.
- Parameters:
cls (Classifier) – the classifier
inst (Instance) – the test instance
index (int) – the 0-based index of the test instance
- print_classifications(cls, data)¶
Prints the classifications to the buffer.
- Parameters:
cls (Classifier) – the classifier
data (Instances) – the test data
- print_footer()¶
Prints the footer to the buffer.
- print_header()¶
Prints the header to the buffer.
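A common way to use an output generator is to hand it to an evaluation and read the buffer afterwards. The helper below is a minimal sketch of that flow. It assumes the evaluation object exposes a test_model(classifier, data, output) method, and the Fake* classes are stand-ins so the sketch runs without a JVM; they are not part of the weka package.

```python
def predictions_as_text(evaluation, classifier, data, output):
    """Evaluate on test data and return what the output generator collected."""
    evaluation.test_model(classifier, data, output)  # assumed method name
    return output.buffer_content()


class FakeOutput:
    """Stand-in for PredictionOutput that collects lines in a list."""
    def __init__(self):
        self.lines = []

    def buffer_content(self):
        return "\n".join(self.lines)


class FakeEvaluation:
    """Stand-in evaluation that writes one line per test row."""
    def test_model(self, classifier, data, output):
        for index, row in enumerate(data):
            output.lines.append(f"{index},{classifier(row)}")


text = predictions_as_text(FakeEvaluation(), lambda row: row * 2, [1, 2], FakeOutput())
print(text)
```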
- class weka.classifiers.SingleClassifierEnhancer(classname=None, jobject=None, options=None)¶
Bases:
Classifier
Wrapper class for classifiers that use a single base classifier.
- property classifier¶
Returns the base classifier.
- Returns:
the base classifier
- Return type:
Classifier
- weka.classifiers.main(args=None)¶
Runs a classifier from the command-line. Calls JVM start/stop automatically. Use -h to see all options.
- Parameters:
args (list) – the command-line arguments to use, uses sys.argv if None
- weka.classifiers.predictions_to_instances(data, preds)¶
Turns the predictions into an Instances object.
- weka.classifiers.sys_main()¶
Runs the main function using the system cli arguments, and returns a system error code.
- Returns:
0 for success, 1 for failure.
- Return type:
int
weka.clusterers module¶
- class weka.clusterers.ClusterEvaluation¶
Bases:
JavaObject
Evaluation class for clusterers.
- property classes_to_clusters¶
Return the array (ordered by cluster number) of minimum error class to cluster mappings.
- Returns:
the mappings
- Return type:
ndarray
- property cluster_assignments¶
Return an array of cluster assignments corresponding to the most recent set of instances clustered.
- Returns:
the cluster assignments
- Return type:
ndarray
- property cluster_results¶
The cluster results as string.
- Returns:
the results string
- Return type:
str
- classmethod crossvalidate_model(clusterer, data, num_folds, rnd)¶
Cross-validates the clusterer and returns the loglikelihood.
- classmethod evaluate_clusterer(clusterer, args)¶
Evaluates the clusterer with the given options.
- Parameters:
clusterer (Clusterer) – the clusterer instance to evaluate
args (list) – the command-line arguments
- Returns:
the evaluation result
- Return type:
str
- property log_likelihood¶
Returns the log likelihood.
- Returns:
the log likelihood
- Return type:
float
- property num_clusters¶
Returns the number of clusters.
- Returns:
the number of clusters
- Return type:
int
- class weka.clusterers.Clusterer(classname='weka.clusterers.SimpleKMeans', jobject=None, options=None)¶
Bases:
OptionHandler
Wrapper class for clusterers.
- build_clusterer(data)¶
Builds the clusterer with the data.
- Parameters:
data (Instances) – the data to use for training the clusterer
- property capabilities¶
Returns the capabilities of the clusterer.
- Returns:
the capabilities
- Return type:
- cluster_instance(inst)¶
Performs a prediction.
- Parameters:
inst (Instance) – the instance to determine the cluster for
- Returns:
the clustering result
- Return type:
float
- classmethod deserialize(ser_file)¶
Deserializes a clusterer from a file.
- Parameters:
ser_file (str) – the model file to deserialize
- Returns:
model and, if available, the dataset header
- Return type:
tuple
- distribution_for_instance(inst)¶
Performs a prediction, returning the cluster distribution.
- Parameters:
inst (Instance) – the Instance to get the cluster distribution for
- Returns:
the cluster distribution
- Return type:
np.ndarray
- property graph¶
Returns the graph if the clusterer implements weka.core.Drawable, otherwise None.
- Returns:
the graph or None if not available
- Return type:
str
- property graph_type¶
Returns the graph type if the clusterer implements weka.core.Drawable, otherwise -1.
- Returns:
the type
- Return type:
int
- property header¶
Returns the header of the training data.
- Returns:
the structure of the training data, None if not available
- Return type:
- classmethod make_copy(clusterer)¶
Creates a copy of the clusterer.
- property number_of_clusters¶
Returns the number of clusters found.
- Returns:
the number of clusters
- Return type:
int
- serialize(ser_file, header=None)¶
Serializes the clusterer to the specified file.
- Parameters:
ser_file (str) – the file to save the model to
header (Instances) – the (optional) dataset header to store alongside; recommended
- update_clusterer(inst)¶
Updates the clusterer with the instance.
- Parameters:
inst (Instance) – the Instance to update the clusterer with
- update_finished()¶
Signals the clusterer that updating with new data has finished.
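The three methods build_clusterer, update_clusterer and update_finished suggest an incremental training loop. The sketch below shows one plausible call order, assuming the clusterer is first built on the dataset structure and then fed one Instance at a time. RecordingClusterer is a stand-in that only records calls so the example runs without a JVM; it is not part of the weka package.

```python
def train_incrementally(clusterer, structure, instances):
    """Feed instances one at a time to an updateable clusterer-like object."""
    clusterer.build_clusterer(structure)  # initialize, e.g. with an empty dataset
    for inst in instances:
        clusterer.update_clusterer(inst)  # one Instance per call
    clusterer.update_finished()           # signal that no more data follows
    return clusterer


class RecordingClusterer:
    """Stand-in that records the calls it receives."""
    def __init__(self):
        self.calls = []

    def build_clusterer(self, data):
        self.calls.append("build")

    def update_clusterer(self, inst):
        self.calls.append("update")

    def update_finished(self):
        self.calls.append("finished")


demo = train_incrementally(RecordingClusterer(), [], ["a", "b"])
print(demo.calls)
```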
- class weka.clusterers.FilteredClusterer(jobject=None, options=None)¶
Bases:
SingleClustererEnhancer
Wrapper class for the filtered clusterer.
- property filter¶
Returns the filter.
- Returns:
the filter
- Return type:
- class weka.clusterers.SingleClustererEnhancer(classname=None, jobject=None, options=None)¶
Bases:
Clusterer
Wrapper class for clusterers that use a single base clusterer.
- weka.clusterers.avg_silhouette_coefficient(clusterer, dist_func, data)¶
Computes the average silhouette coefficient for a clusterer. Based on Eibe Frank’s Groovy code: https://weka.8497.n7.nabble.com/Silhouette-Measures-and-Dunn-Index-DI-in-Weka-td44072.html
- Parameters:
clusterer (Clusterer) – the trained clusterer model to evaluate
dist_func (DistanceFunction) – the distance function to use; if Euclidean, make sure that normalization is turned off
data (Instances) – the standardized data
- Returns:
the average silhouette coefficient
- Return type:
float
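For readers unfamiliar with the measure, the following plain-Python sketch computes an average silhouette coefficient from points and their cluster labels. It is not the wrapper's implementation: the function names, the Euclidean distance and the convention that points in singleton clusters contribute 0 are assumptions of this example.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def avg_silhouette(points, labels, dist=euclidean):
    """Mean of (b - a) / max(a, b) over all points."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)

    total = 0.0
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own or len(clusters) < 2:
            continue  # singleton cluster or single cluster: contributes 0
        # a: mean distance to the other members of the same cluster
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items() if other != label
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


score = avg_silhouette([[0.0], [1.0], [10.0], [11.0]], [0, 0, 1, 1])
print(round(score, 4))
```

Values close to 1 indicate compact, well-separated clusters; values near 0 or below indicate overlapping clusters.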
- weka.clusterers.main(args=None)¶
Runs a clusterer from the command-line. Calls JVM start/stop automatically. Use -h to see all options.
- Parameters:
args (list) – the command-line arguments to use, uses sys.argv if None
- weka.clusterers.sys_main()¶
Runs the main function using the system cli arguments, and returns a system error code.
- Returns:
0 for success, 1 for failure.
- Return type:
int
weka.datagenerators module¶
- class weka.datagenerators.DataGenerator(classname='weka.datagenerators.classifiers.classification.Agrawal', jobject=None, options=None)¶
Bases:
OptionHandler
Wrapper class for datagenerators.
- generate_examples()¶
Returns complete dataset.
- Returns:
the generated dataset
- Return type:
- generate_finish()¶
Returns a “finish” string.
- Returns:
a finish comment
- Return type:
str
- generate_start()¶
Returns a “start” string.
- Returns:
the start comment
- Return type:
str
- classmethod make_copy(generator)¶
Creates a copy of the generator.
- Parameters:
generator (DataGenerator) – the generator to copy
- Returns:
the copy of the generator
- Return type:
- classmethod make_data(generator, args)¶
Generates data using the generator and commandline arguments.
- Parameters:
generator (DataGenerator) – the generator instance to use
args (list) – the command-line arguments
- property num_examples_act¶
Returns the actual number of examples to generate.
- Returns:
the number of examples
- Return type:
int
- property single_mode_flag¶
Returns whether data is generated row by row (True) or in one go (False).
- Returns:
whether incremental
- Return type:
bool
- weka.datagenerators.main(args=None)¶
Runs a datagenerator from the command-line. Calls JVM start/stop automatically. Use -h to see all options.
- Parameters:
args (list) – the command-line arguments to use, uses sys.argv if None
- weka.datagenerators.sys_main()¶
Runs the main function using the system cli arguments, and returns a system error code.
- Returns:
0 for success, 1 for failure.
- Return type:
int
weka.experiments module¶
- class weka.experiments.Experiment(classname='weka.experiment.Experiment', jobject=None, options=None)¶
Bases:
OptionHandler
Wrapper class for an experiment.
- class weka.experiments.ResultMatrix(classname='weka.experiment.ResultMatrixPlainText', jobject=None, options=None)¶
Bases:
OptionHandler
For generating results from an Experiment run.
- average(col)¶
Returns the average mean at this location (if valid location).
- Parameters:
col (int) – the 0-based column index
- Returns:
the mean
- Return type:
float
- property columns¶
Returns the column count.
- Returns:
the count
- Return type:
int
- get_col_name(index)¶
Returns the column name.
- Parameters:
index (int) – the 0-based column index
- Returns:
the column name, None if invalid index
- Return type:
str
- get_mean(col, row)¶
Returns the mean at this location (if valid location).
- Parameters:
col (int) – the 0-based column index
row (int) – the 0-based row index
- Returns:
the mean
- Return type:
float
- get_row_name(index)¶
Returns the row name.
- Parameters:
index (int) – the 0-based row index
- Returns:
the row name, None if invalid index
- Return type:
str
- get_stdev(col, row)¶
Returns the standard deviation at this location (if valid location).
- Parameters:
col (int) – the 0-based column index
row (int) – the 0-based row index
- Returns:
the standard deviation
- Return type:
float
- hide_col(index)¶
Hides the column.
- Parameters:
index (int) – the 0-based column index
- hide_row(index)¶
Hides the row.
- Parameters:
index (int) – the 0-based row index
- is_col_hidden(index)¶
Returns whether the column is hidden.
- Parameters:
index (int) – the 0-based column index
- Returns:
true if hidden
- Return type:
bool
- is_row_hidden(index)¶
Returns whether the row is hidden.
- Parameters:
index (int) – the 0-based row index
- Returns:
true if hidden
- Return type:
bool
- property rows¶
Returns the row count.
- Returns:
the count
- Return type:
int
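The accessors above can be combined to walk the whole matrix. The helper below is a small sketch that relies only on the documented members rows, columns, get_row_name, get_col_name, get_mean and get_stdev; note that get_mean and get_stdev take the column index first. FakeMatrix and its values are stand-ins so the example runs without a JVM.

```python
def matrix_to_records(matrix):
    """Flatten a ResultMatrix-like object into a list of dictionaries."""
    records = []
    for row in range(matrix.rows):
        for col in range(matrix.columns):
            records.append({
                "row": matrix.get_row_name(row),
                "col": matrix.get_col_name(col),
                "mean": matrix.get_mean(col, row),   # column index comes first
                "stdev": matrix.get_stdev(col, row),
            })
    return records


class FakeMatrix:
    """Stand-in with two rows (datasets) and one column (classifier)."""
    rows = 2
    columns = 1

    def get_row_name(self, index):
        return ["iris", "glass"][index]

    def get_col_name(self, index):
        return ["J48"][index]

    def get_mean(self, col, row):
        return [[95.0], [67.5]][row][col]

    def get_stdev(self, col, row):
        return [[1.5], [3.0]][row][col]


records = matrix_to_records(FakeMatrix())
print(records[1])
```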
- set_col_name(index, name)¶
Sets the column name.
- Parameters:
index (int) – the 0-based column index
name (str) – the name of the column
- set_mean(col, row, mean)¶
Sets the mean at this location (if valid location).
- Parameters:
col (int) – the 0-based column index
row (int) – the 0-based row index
mean (float) – the mean to set
- set_row_name(index, name)¶
Sets the row name.
- Parameters:
index (int) – the 0-based row index
name (str) – the name of the row
- set_stdev(col, row, stdev)¶
Sets the standard deviation at this location (if valid location).
- Parameters:
col (int) – the 0-based column index
row (int) – the 0-based row index
stdev (float) – the standard deviation to set
- show_col(index)¶
Shows the column.
- Parameters:
index (int) – the 0-based column index
- show_row(index)¶
Shows the row.
- Parameters:
index (int) – the 0-based row index
- to_string_header()¶
Returns the header of the matrix as a string.
- Returns:
the header
- Return type:
str
- to_string_key()¶
Returns a key for all the col names, for better readability if the names got cut off.
- Returns:
the key
- Return type:
str
- to_string_matrix()¶
Returns the matrix as a string.
- Returns:
the generated output
- Return type:
str
- to_string_ranking()¶
Returns the ranking in a string representation.
- Returns:
the ranking
- Return type:
str
- to_string_summary()¶
Returns the summary as string.
- Returns:
the summary
- Return type:
str
- class weka.experiments.SimpleCrossValidationExperiment(datasets, classifiers, classification=True, runs=10, folds=10, result=None, class_for_ir_statistics=0, attribute_id=-1, pred_target_column=False)¶
Bases:
SimpleExperiment
Performs a simple cross-validation experiment. Can output the results either in ARFF or CSV.
- configure_resultproducer()¶
Configures and returns the ResultProducer and PropertyPath as tuple.
- Returns:
producer and property path
- Return type:
tuple
- class weka.experiments.SimpleExperiment(datasets, classifiers, jobject=None, classification=True, runs=10, result=None, class_for_ir_statistics=0, attribute_id=-1, pred_target_column=False)¶
Bases:
OptionHandler
Ancestor for simple experiments.
See following URL for how to use the Experiment API: http://weka.wikispaces.com/Using+the+Experiment+API
- configure_resultproducer()¶
Configures and returns the ResultProducer and PropertyPath as tuple.
- Returns:
producer and property path
- Return type:
tuple
- configure_splitevaluator()¶
Configures and returns the SplitEvaluator and Classifier instance as tuple.
- Returns:
evaluator and classifier
- Return type:
tuple
- experiment()¶
Returns the internal experiment, if set up, otherwise None.
- Returns:
the internal experiment
- Return type:
- classmethod load(filename)¶
Loads the experiment from disk.
- Parameters:
filename (str) – the filename of the experiment to load
- Returns:
the experiment
- Return type:
- run()¶
Executes the experiment.
- classmethod save(filename, experiment)¶
Saves the experiment to disk.
- Parameters:
filename (str) – the filename to save the experiment to
experiment (Experiment) – the Experiment to save
- setup()¶
Initializes the experiment.
- class weka.experiments.SimpleRandomSplitExperiment(datasets, classifiers, classification=True, runs=10, percentage=66.6, preserve_order=False, result=None, class_for_ir_statistics=0, attribute_id=-1, pred_target_column=False)¶
Bases:
SimpleExperiment
Performs a simple random split experiment. Can output the results either in ARFF or CSV.
- configure_resultproducer()¶
Configures and returns the ResultProducer and PropertyPath as tuple.
- Returns:
producer and property path
- Return type:
tuple
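The percentage and preserve_order arguments of the random split experiment can be pictured with a plain-Python split. This is only a conceptual sketch: the function name, the seed handling and the rounding of the training size are assumptions, and the actual splitting is performed by Weka.

```python
import random


def random_split(instances, percentage=66.6, preserve_order=False, seed=1):
    """Split rows into a training part and a test part."""
    rows = list(instances)
    if not preserve_order:
        random.Random(seed).shuffle(rows)  # randomize before splitting
    train_size = int(round(len(rows) * percentage / 100.0))  # assumed rounding
    return rows[:train_size], rows[train_size:]


train, test = random_split(range(150))
ordered_train, ordered_test = random_split(range(150), preserve_order=True)
print(len(train), len(test))
```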
- class weka.experiments.Tester(classname='weka.experiment.PairedCorrectedTTester', jobject=None, options=None, swap_rows_and_cols=False)¶
Bases:
OptionHandler
For generating statistical results from an experiment.
- property dataset_columns¶
Returns the list of column names that uniquely identify a dataset.
- Returns:
the list of attribute names
- Return type:
list
- property fold_column¶
Returns the column name that holds the Fold number.
- Returns:
the attribute name
- Return type:
str
- header(comparison_column)¶
Creates a “header” string describing the current resultsets.
- Parameters:
comparison_column (int) – the index of the column to compare against
- Returns:
the header
- Return type:
str
- init_columns()¶
Sets the column indices based on the supplied names if necessary.
- property instances¶
Returns the data used in the analysis.
- Returns:
the data in use
- Return type:
- multi_resultset_full(base_resultset, comparison_column)¶
Creates a comparison table where a base resultset is compared to the other resultsets.
- Parameters:
base_resultset (int) – the 0-based index of the base resultset (eg classifier to compare against)
comparison_column (int) – the 0-based index of the column to compare against
- Returns:
the comparison
- Return type:
str
- multi_resultset_ranking(comparison_column)¶
Creates a ranking.
- Parameters:
comparison_column (int) – the 0-based index of the column to compare against
- Returns:
the ranking
- Return type:
str
- multi_resultset_summary(comparison_column)¶
Carries out a comparison between all resultsets, counting the number of datasets where one resultset outperforms the other.
- Parameters:
comparison_column (int) – the 0-based index of the column to compare against
- Returns:
the summary
- Return type:
str
- property result_columns¶
Returns the list of column names that uniquely identify a result (eg classifier + options + ID).
- Returns:
the list of attribute names
- Return type:
list
- property resultmatrix¶
Returns the ResultMatrix instance in use.
- Returns:
the matrix in use
- Return type:
- property run_column¶
Returns the column name that holds the Run number.
- Returns:
the attribute name
- Return type:
str
- property swap_rows_and_cols¶
Returns whether to swap rows/cols.
- Returns:
whether to swap
- Return type:
bool
weka.filters module¶
- class weka.filters.AttributeSelection(jobject=None, options=None)¶
Bases:
Filter
Wrapper class for weka.filters.supervised.attribute.AttributeSelection.
- property evaluator¶
Returns the evaluator.
- Returns:
the evaluator in use
- Return type:
- class weka.filters.Filter(classname='weka.filters.AllFilter', jobject=None, options=None)¶
Bases:
OptionHandler
Wrapper class for filters.
- batch_finished()¶
Signals the filter that the batch of data has finished.
- Returns:
True if instances can be collected from the output
- Return type:
bool
- capabilities()¶
Returns the capabilities of the filter.
- Returns:
the capabilities
- Return type:
- classmethod deserialize(ser_file)¶
Deserializes a filter from a file.
- Parameters:
ser_file (str) – the file to deserialize from
- Returns:
model
- Return type:
- filter(data)¶
Filters the dataset(s). When providing a list, this can be used to create compatible train/test sets, since the filter only gets initialized with the first dataset and all subsequent datasets get transformed using the same setup.
NB: inputformat(Instances) must have been called beforehand.
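The train/test pattern described for filter can be sketched as follows. The helper assumes that filter returns one filtered dataset per input dataset when given a list, and it calls inputformat first as the note requires. FakeScaler is a stand-in min-max scaler over lists of numbers, used only so the example runs without a JVM; it is not part of the weka package.

```python
def filter_train_and_test(flt, train, test):
    """Initialize the filter on the training data and apply it to both sets."""
    flt.inputformat(train)                # must be called before filter()
    filtered = flt.filter([train, test])  # setup comes from the first dataset
    return filtered[0], filtered[1]


class FakeScaler:
    """Stand-in that scales values using the range of the first dataset."""
    def inputformat(self, data):
        self.ready = True

    def filter(self, datasets):
        lo, hi = min(datasets[0]), max(datasets[0])
        return [[(v - lo) / (hi - lo) for v in d] for d in datasets]


train_f, test_f = filter_train_and_test(FakeScaler(), [0.0, 5.0, 10.0], [20.0])
print(train_f, test_f)
```

The test value lands outside the 0 to 1 range because it is transformed with the range learned from the training data, which is the behavior that keeps both sets compatible.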
- input(inst)¶
Inputs the Instance.
- Parameters:
inst (Instance) – the instance to filter
- Returns:
True if the filtered instance can be collected from the output
- Return type:
bool
- classmethod make_copy(flter)¶
Creates a copy of the filter.
- output()¶
Outputs the filtered Instance.
- Returns:
the filtered instance
- Return type:
an Instance object
- serialize(ser_file)¶
Serializes the filter to the specified file.
- Parameters:
ser_file (str) – the file to save the filter to
- to_source(classname, data)¶
Returns the model as Java source code if the filter implements weka.filters.Sourcable.
- Parameters:
classname (str) – the classname for the generated Java code
data (Instances) – the dataset used for initializing the filter
- Returns:
the model as source code string
- Return type:
str
- class weka.filters.MultiFilter(jobject=None, options=None)¶
Bases:
Filter
Wrapper class for weka.filters.MultiFilter.
- append(filter)¶
Appends the filter to the current list of filters.
- Parameters:
filter (Filter) – the filter to add
- clear()¶
Removes all filters.
- property filters¶
Returns the list of base filters.
- Returns:
the filter list
- Return type:
list
- class weka.filters.StringToWordVector(jobject=None, options=None)¶
Bases:
Filter
Wrapper class for weka.filters.unsupervised.attribute.StringToWordVector.
- property stopwords¶
Returns the stopwords handler.
- Returns:
the stopwords handler
- Return type:
- weka.filters.main(args=None)¶
Runs a filter from the command-line. Calls JVM start/stop automatically. Use -h to see all options.
- Parameters:
args (list) – the command-line arguments to use, uses sys.argv if None
- weka.filters.sys_main()¶
Runs the main function using the system cli arguments, and returns a system error code.
- Returns:
0 for success, 1 for failure.
- Return type:
int
weka.timeseries module¶
- class weka.timeseries.ConfidenceIntervalForecaster(jobject)¶
Bases:
JavaObject
Wrapper class for ConfidenceIntervalForecaster objects.
- property calculate_conf_intervals_for_forecasts¶
Returns the number of steps for which confidence intervals will be computed.
- Returns:
the steps
- Return type:
int
- property confidence_level¶
Returns the confidence level in use for computing confidence intervals.
- Returns:
the level
- Return type:
float
- property is_producing_confidence_intervals¶
Returns true if this forecaster is computing confidence limits for some or all of its future forecasts (i.e. getCalculateConfIntervalsForForecasts() > 0).
- Returns:
true if confidence intervals are produced
- Return type:
bool
- class weka.timeseries.CustomPeriodicTest(jobject=None, test=None)¶
Bases:
JavaObject
Class that evaluates a supplied date against user-specified date constant fields. Fields that can be tested against include year, month, week of year, week of month, day of year, day of month, day of week, hour of day, minute of hour and second. Wildcard “*” matches any value for a particular field. Each CustomPeriodicTest is made up of one or two test parts. If the first test part’s operator is “=”, then no second part is necessary. Otherwise the first test part may use > or >= operators and the second test part < or <= operators. Taken together, the two parts define an interval. An optional label may be associated with the interval.
- evaluate(date)¶
Evaluate the supplied date with respect to this custom periodic test interval.
- Parameters:
date (Date) – the date to test
- Returns:
true if the date lies within the interval.
- Return type:
bool
- property label¶
Returns the label.
- Returns:
the label
- Return type:
str
- test(test)¶
Sets the test as string.
- Parameters:
test (str) – the test to use
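The one-or-two-part interval logic described for CustomPeriodicTest can be illustrated on a single date field. The sketch below does not parse the wrapper's test string syntax, which is not reproduced here; the function names and the tuple representation of a test part are assumptions made for the example.

```python
import operator
from datetime import datetime

LOWER_OPS = {"=": operator.eq, ">": operator.gt, ">=": operator.ge}
UPPER_OPS = {"<": operator.lt, "<=": operator.le}


def part_matches(value, op, bound, ops):
    """Check one test part; the wildcard "*" matches any value."""
    if bound == "*":
        return True
    return ops[op](value, bound)


def in_interval(date, field, lower, upper=None):
    """Evaluate a date field against one or two (operator, bound) parts."""
    value = getattr(date, field)
    lower_op, lower_bound = lower
    if not part_matches(value, lower_op, lower_bound, LOWER_OPS):
        return False
    if lower_op == "=" or upper is None:
        return True  # an "=" first part needs no second part
    upper_op, upper_bound = upper
    return part_matches(value, upper_op, upper_bound, UPPER_OPS)


morning = datetime(2024, 1, 1, 9, 30)
evening = datetime(2024, 1, 1, 17, 0)
print(in_interval(morning, "hour", (">=", 9), ("<", 17)))
print(in_interval(evening, "hour", (">=", 9), ("<", 17)))
```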
- class weka.timeseries.ErrorModule(jobject)¶
Bases:
TSEvalModule
Wrapper for ErrorModule objects.
- counts_for_targets()¶
Returns the number of predicted, actual pairs for each target. Only entries that are non-missing for both actual and predicted contribute to the overall count.
- Returns:
the number of predicted, actual pairs for each target.
- Return type:
ndarray
- errors_for_target(target)¶
Returns the list of the errors for the supplied target.
- Parameters:
target (str) – the target to get the errors for
- Returns:
the errors
- Return type:
list
- predictions_for_all_targets()¶
Returns the list of predictions for all targets.
- Returns:
list of list of NumericPrediction
- Return type:
list
- predictions_for_target(target)¶
Returns the list of predictions for the target.
- Parameters:
target (str) – the target to get the predictions for
- Returns:
list of NumericPrediction
- Return type:
list
- class weka.timeseries.IncrementallyPrimeable(jobject)¶
Bases:
JavaObject
Wrapper class for IncrementallyPrimeable objects.
- class weka.timeseries.OverlayForecaster(jobject)¶
Bases:
JavaObject
Wrapper class for OverlayForecaster objects.
- forecast_with_overlays(steps, overlays)¶
Produce a forecast for the target field(s). Assumes that the model has been built and/or primed so that a forecast can be generated. Also assumes that the forecaster has been told which attributes are to be considered “overlay” attributes in the data. Overlay data is data that the forecaster will be provided with when making a forecast into the future - i.e. it will be given the values of these attributes for future instances. The overlay data provided to this method should have the same structure as the original data used to train the forecaster - i.e. all original fields should be present, including the targets and time stamp field (if supplied). The values of targets will of course be missing (‘?’) since we want to forecast those. The time stamp values (if a time stamp is in use) may be provided, in which case the forecaster will use the time stamp values in the overlay instances. If the time stamp values are missing, then date arithmetic (for date time stamps) will be used to advance the time value beyond the last seen training value; similarly, for artificial time stamps or non-date time stamps, the computed time delta will be used to increment beyond the last seen training value.
The number of instances in the overlay data should typically match the number of steps that have been requested for forecasting. If these differ, then overlay.numInstances() will be the number of steps forecasted.
- Parameters:
steps (int) – number of forecasted values to produce for each target. E.g. a value of 5 would produce a prediction for t+1, t+2, …, t+5.
overlays (Instances) – the overlay data to use
- Returns:
a List of Lists (one for each step) of forecasted values for each target (NumericPrediction objects)
- Return type:
list
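The returned structure (one inner list per step, one prediction per target) can be flattened for display. The helper below is a sketch that reads only the predicted property of each prediction; FakePrediction and the target names are stand-ins so the example runs without a JVM.

```python
from collections import namedtuple


def forecast_table(forecast, target_names):
    """Turn a list of per-step prediction lists into a list of row dicts."""
    rows = []
    for step_number, step in enumerate(forecast, start=1):
        row = {"step": step_number}  # step 1 corresponds to t+1
        for name, pred in zip(target_names, step):
            row[name] = pred.predicted
        rows.append(row)
    return rows


# stand-in for NumericPrediction objects
FakePrediction = namedtuple("FakePrediction", ["predicted"])
forecast = [
    [FakePrediction(10.0), FakePrediction(1.5)],  # t+1
    [FakePrediction(11.0), FakePrediction(1.6)],  # t+2
]
table = forecast_table(forecast, ["sales", "price"])
print(table)
```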
- property is_using_overlay_data¶
Returns true if overlay data has been used to train this forecaster, and thus is expected to be supplied for future time steps when making a forecast.
- Returns:
- property overlay_fields¶
Returns the overlay fields as string.
- Returns:
the overlay fields
- Return type:
str
- class weka.timeseries.PeriodicityHandler(jobject)¶
Bases:
JavaObject
Helper class to manage time stamp manipulation with respect to various periodicities. Has a routine to remap the time stamp, which is useful for date time stamps. Since dates are just manipulated internally as the number of milliseconds elapsed since the epoch, any global trend modelling in regression functions results in enormous coefficients for this variable; remapping to a more reasonable scale prevents this. It also makes it easier to handle the case where there are time periods that shouldn’t be considered as a time unit increment, e.g. weekends and public holidays for financial trading data. These “holes” in the data can be accommodated by accumulating a negative offset for the remapped date when a particular date/time occurs in a user-specified “skip” list.
- property delta_time¶
Returns the delta time.
- Returns:
the delta time
- Return type:
float
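The remapping idea can be illustrated in plain Python. The sketch below maps each date to a step count from the first date and does not advance the count for skipped periods, which gives the same result as subtracting an accumulated offset. It is not the handler's actual algorithm; the function name, the one-day delta and the weekend rule are assumptions for the example.

```python
from datetime import date, timedelta


def remap_skipping(dates, is_skipped, delta=timedelta(days=1)):
    """Map sorted dates to step counts that ignore skipped periods."""
    start = dates[0]
    remapped = []
    for d in dates:
        steps = 0
        current = start
        while current < d:
            current += delta
            if not is_skipped(current):
                steps += 1  # only non-skipped periods advance the time index
        remapped.append(steps)
    return remapped


def is_weekend(day):
    return day.weekday() >= 5  # Saturday or Sunday


trading_days = [date(2024, 1, 5), date(2024, 1, 8), date(2024, 1, 9)]  # Fri, Mon, Tue
remapped = remap_skipping(trading_days, is_weekend)
print(remapped)
```

Friday and the following Monday end up one step apart, so the weekend no longer looks like a gap in the series.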
- class weka.timeseries.TSEvalModule(jobject)¶
Bases:
JavaObject
Wrapper for TSEvalModule objects.
- calculate_measure()¶
Calculate the measure that this module represents.
- Returns:
the value of the measure for this module for each of the target(s).
- Return type:
ndarray
- property definition¶
Returns the definition.
- property description¶
Returns the description.
- property eval_name¶
Returns the name.
- evaluate_for_instance(pred, inst)¶
Evaluate the given forecast(s) with respect to the given test instance. Targets with missing values are ignored.
- Parameters:
pred (NumericPrediction) – the numeric prediction
inst (Instance) – the instance
- classmethod module(name)¶
Returns the module with the specified name.
- Parameters:
name (str) – the name of the module to return
- Returns:
the TSEvalModule object
- Return type:
- classmethod module_list()¶
Returns list of available modules.
- Returns:
the list of modules (TSEvalModule objects)
- Return type:
list
- reset()¶
Resets the module.
- property summary¶
Returns the summary.
- property target_fields¶
Returns the list of target fields.
- Returns:
the list of target fields
- Return type:
list
- class weka.timeseries.TSEvaluation(train, test_split_size=0.3, test=None)¶
Bases:
JavaObject
Evaluation class for timeseries forecasters.
- evaluate(forecaster, build_model=True)¶
Evaluates the forecaster.
- Parameters:
forecaster (TSForecaster) – the forecaster to evaluate
build_model (bool) – whether to build the model as well
- classmethod evaluate_forecaster(forecaster, args)¶
Evaluates the forecaster with the given options.
- Parameters:
forecaster (TSForecaster) – the forecaster instance to use
args (list) – the command-line arguments to use
- property evaluate_on_test_data¶
Returns whether to evaluate on the test data.
- Returns:
whether to evaluate
- Return type:
bool
- property evaluate_on_training_data¶
Returns whether to evaluate on the training data.
- Returns:
whether to evaluate
- Return type:
bool
- property evaluation_modules¶
Returns the list of evaluation modules in use.
- Returns:
list of TSEvalModule objects
- Return type:
list
- property forecast_future¶
Returns whether we should generate a future forecast beyond the end of the training and/or test data.
- Returns:
whether to forecast the future
- Return type:
bool
- property horizon¶
Returns the number of steps to predict into the future.
- Returns:
the number of steps
- Return type:
int
- predictions_for_test_data(step_number)¶
Predictions for all targets for the specified step number on the test data.
- Parameters:
step_number (int) – number of the step into the future to return predictions for
- predictions_for_training_data(step_number)¶
Predictions for all targets for the specified step number on the training data.
- Parameters:
step_number (int) – number of the step into the future to return predictions for
- property prime_for_test_data_with_test_data¶
Returns whether evaluation for test data should begin by priming with the first x test data instances and then forecasting from step x + 1. This is the only option if there is no training data and a model has been deserialized from disk. If we have training data, and it occurs immediately before the test data in time, then we can prime with the last x instances from the training data.
- Returns:
whether to prime
- Return type:
bool
- property prime_window_size¶
Returns the size of the priming window, i.e. the number of historical instances to present before making a forecast.
- Returns:
the size
- Return type:
int
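The interplay between priming with test data and the priming window size can be sketched in plain Python. This is an illustration under simplifying assumptions (a univariate series, one-step-ahead forecasts, and a toy forecaster); the names `primed_one_step_forecasts` and `mean_of_window` are hypothetical and do not exist in the wrapper.

```python
def primed_one_step_forecasts(test_values, prime_window_size, forecast_fn):
    """Prime with the first `prime_window_size` test values, then forecast each
    later value from the window of values that precede it."""
    results = []
    for t in range(prime_window_size, len(test_values)):
        window = test_values[t - prime_window_size:t]
        # pair each forecast with the actual value it is compared against
        results.append((forecast_fn(window), test_values[t]))
    return results


def mean_of_window(window):
    """Toy forecaster (an assumption for this sketch): predict the window mean."""
    return sum(window) / len(window)


pairs = primed_one_step_forecasts([1.0, 2.0, 3.0, 4.0, 5.0], 2, mean_of_window)
print(pairs)
```

The first two test values are consumed for priming, so only three forecasts are produced for five test instances.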
- print_future_forecast_on_test_data(forecaster)¶
Print the forecasted values (for all targets) beyond the end of the test data.
- Parameters:
forecaster (TSForecaster) – the forecaster to use
- Returns:
the forecasted values
- Return type:
str
- print_future_forecast_on_training_data(forecaster)¶
Print the forecasted values (for all targets) beyond the end of the training data.
- Parameters:
forecaster (TSForecaster) – the forecaster to use
- Returns:
the forecasted values
- Return type:
str
- print_predictions_for_test_data(title, target_name, step_ahead, instance_number_offset=0)¶
Print the predictions for a given target at a given step-ahead level on the test data.
- Parameters:
title (str) – the title for the output
target_name (str) – the name of the target to print predictions for
step_ahead (int) – the step-ahead level - e.g. 3 would print the 3-step-ahead predictions
instance_number_offset (int) – the offset from the start of the test data from which to print actual and predicted values
- Returns:
the predicted/actual values
- Return type:
str
- print_predictions_for_training_data(title, target_name, step_ahead, instance_number_offset=0)¶
Print the predictions for a given target at a given step-ahead level on the training data.
- Parameters:
title (str) – the title for the output
target_name (str) – the name of the target to print predictions for
step_ahead (int) – the step-ahead level - e.g. 3 would print the 3-step-ahead predictions
instance_number_offset (int) – the offset from the start of the training data from which to print actual and predicted values
- Returns:
the predicted/actual values
- Return type:
str
- property rebuild_model_after_each_test_forecast_step¶
Returns whether the forecasting model should be rebuilt after each forecasting step on the test data using both the training data and test data up to the current instance.
- Returns:
whether to rebuild
- Return type:
bool
- summary()¶
Generates a summary.
- Returns:
the summary
- Return type:
str
- property test_data¶
Returns the test data.
- Returns:
the test data, None if none available
- Return type:
Instances
- class weka.timeseries.TSForecaster(classname='weka.classifiers.timeseries.WekaForecaster', jobject=None, options=None)¶
Bases:
OptionHandler
Wrapper class for timeseries forecasters.
- property algorithm_name¶
Returns the name of the algorithm.
- Returns:
the name
- Return type:
str
- property base_model_has_serializer¶
Check whether the base learner requires special serialization.
- Returns:
True if base learner requires special serialization, False otherwise
- Return type:
bool
- build_forecaster(data)¶
Builds the forecaster using the provided data.
- Parameters:
data (Instances) – the data to train with
- clear_previous_state()¶
Reset model state.
- property fields_to_forecast¶
Returns the fields to forecast.
- Returns:
the fields
- Return type:
str
- forecast(steps)¶
Produce a forecast for the target field(s). Assumes that the model has been built and/or primed so that a forecast can be generated.
- Parameters:
steps (int) – number of forecasted values to produce for each target. E.g. a value of 5 would produce a prediction for t+1, t+2, …, t+5.
- Returns:
a list of lists (one for each step) of forecasted values for each target (NumericPrediction objects)
- Return type:
list
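A sketch of the build, prime, forecast sequence and of walking the nested result follows. It only calls `build_forecaster`, `prime_forecaster` and `forecast` as documented here; the helper `forecast_rows`, the `target_names` argument and the stand-in classes are hypothetical. Two assumptions are made: that each prediction exposes its value as a `predicted` attribute, and that predictions within a step are ordered like `target_names`. The stand-ins exist only so the example runs without a JVM.

```python
from collections import namedtuple


def forecast_rows(forecaster, data, steps, target_names):
    """Build, prime and forecast, then flatten the result into
    (step, target, predicted value) rows."""
    forecaster.build_forecaster(data)
    forecaster.prime_forecaster(data)
    rows = []
    # outer list: one entry per step; inner list: one prediction per target
    for step_index, step_preds in enumerate(forecaster.forecast(steps), start=1):
        for name, pred in zip(target_names, step_preds):
            rows.append((step_index, name, pred.predicted))
    return rows


# Stand-ins so the sketch runs without a JVM (not part of the wrapper).
FakePrediction = namedtuple("FakePrediction", ["predicted"])


class FakeForecaster:
    def build_forecaster(self, data):
        self.last = data[-1]

    def prime_forecaster(self, data):
        self.last = data[-1]

    def forecast(self, steps):
        # toy model: keep adding 1 to the last value seen
        return [[FakePrediction(self.last + s)] for s in range(1, steps + 1)]


rows = forecast_rows(FakeForecaster(), [1.0, 2.0, 3.0], 2, ["sales"])
print(rows)
```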
- property header¶
Returns the header of the training data.
- Returns:
the structure of the training data, None if not available
- Return type:
Instances
- load_base_model(fname)¶
Loads the base model from the given filename.
- Parameters:
fname (str) – the file to load the base model from
- load_serialized_state(fname)¶
Loads the serialized state from the given filename.
- Parameters:
fname (str) – the file to deserialize the state from
- property previous_state¶
Returns the previous state.
- Returns:
the state as a list of JPype objects
- Return type:
list
- prime_forecaster(data)¶
Primes the forecaster using the provided data.
- Parameters:
data (Instances) – the data to prime with
- reset()¶
Resets the algorithm.
- run_forecaster(forecaster, options)¶
Runs the forecaster with the given options.
- save_base_model(fname)¶
Saves the base model under the given filename.
- Parameters:
fname (str) – the file to save the base model under
- serialize_state(fname)¶
Serializes the state under the given filename.
- Parameters:
fname (str) – the file to serialize the state under
- property uses_state¶
Check whether the base learner requires operations regarding state.
- Returns:
True if base learner uses state-based predictions, False otherwise
- Return type:
bool
- class weka.timeseries.TSLagMaker(jobject=None, options=None)¶
Bases:
Filter
A class for creating lagged versions of target variable(s) for use in time series forecasting. Uses the TimeseriesTranslate filter. Has options for creating averages of consecutive lagged variables (which can be useful for long lagged variables). Some polynomials of time are also created (if there is a time stamp), such as time^2 and time^3. Also creates cross products between time and the lagged and averaged lagged variables. If there is no date time stamp in the data then the user has the option of having an artificial time stamp created. Time stamps, real or otherwise, are used for modeling trends rather than using a differencing-based approach.
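To make the notion of lagged variables concrete, here is a minimal pure-Python sketch. It does not reproduce the TimeseriesTranslate filter; the function `make_lags` and its parameters are hypothetical, chosen to echo the `min_lag`, `max_lag` and "remove leading instances with unknown lag values" options documented for this class.

```python
def make_lags(series, min_lag=1, max_lag=3, drop_unknown=True):
    """Return rows of [lag_min, ..., lag_max, target] for a univariate series.

    Lag values that fall before the start of the series are None; with
    drop_unknown=True the leading rows containing them are removed.
    """
    rows = []
    for t, target in enumerate(series):
        lags = [series[t - k] if t - k >= 0 else None
                for k in range(min_lag, max_lag + 1)]
        if drop_unknown and None in lags:
            continue
        rows.append(lags + [target])
    return rows


lagged = make_lags([10, 11, 13, 16, 20], min_lag=1, max_lag=2)
print(lagged)
```

Each row holds the value one step back, the value two steps back, and the current target, so a regression learner can predict the target from its own history.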
Also has routines for dealing with a date time stamp, i.e. it can detect a monthly time period (because months are different lengths) and maps date time stamps to equally spaced time intervals. In general, a date time stamp is remapped by subtracting the first observed value and adding this value divided by the constant delta (difference between consecutive steps) to the result. In the case of a detected monthly time period, the remapping involves subtracting the base year and then adding to this the number of the month within the current year plus twelve times the number of intervening years since the base year.
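One plausible reading of the monthly remapping is sketched below: a date becomes the month number within its year plus twelve times the number of years since the base year, which yields equally spaced integers regardless of month length. The function `remap_monthly` is hypothetical, and the exact offsets used by the underlying Java code may differ.

```python
from datetime import date


def remap_monthly(d, base_year):
    """Month index relative to base_year: month number within the year plus
    twelve times the number of years since base_year."""
    return 12 * (d.year - base_year) + d.month


stamps = [date(2010, 11, 1), date(2010, 12, 1), date(2011, 1, 1), date(2011, 2, 1)]
indices = [remap_monthly(d, base_year=2010) for d in stamps]
print(indices)
```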
Also has routines for adding new attributes derived from a date time stamp to the data, e.g. AM indicator, day of the week, month, quarter etc. In the case where there is no real date time stamp, the user may specify a nominal periodic variable (if one exists in the data). For example, month might be coded as a nominal value. In this case it can be specified as the primary periodic variable. The point is that in all these cases (nominal periodic and date-derived periodics), we are able to determine what the value of these variables will be in future instances (as computed from the last known historic instance).
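The date-derived attributes can be pictured with the standard library alone. This sketch is not the wrapper's implementation: the function `derive_periodics` and its dictionary keys are hypothetical, and the encodings (for example Monday=0 for the day of the week) follow Python conventions rather than Weka's.

```python
import calendar
from datetime import datetime


def derive_periodics(ts):
    """Derive periodic fields that are fully determined by the time stamp."""
    return {
        "am": ts.hour < 12,
        "day_of_week": ts.weekday(),  # Monday=0 ... Sunday=6
        "day_of_month": ts.day,
        "num_days_in_month": calendar.monthrange(ts.year, ts.month)[1],
        "weekend": ts.weekday() >= 5,
        "month_of_year": ts.month,
        "quarter_of_year": (ts.month - 1) // 3 + 1,
    }


fields = derive_periodics(datetime(2011, 7, 4, 9, 30))
print(fields)
```

Because every field depends only on the time stamp, the same function can be applied to future time stamps, which is what makes these attributes usable when forecasting.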
- property add_am_indicator¶
Returns whether to add an AM indicator.
- Returns:
true if to add
- Return type:
bool
- add_custom_periodic(periodic)¶
Adds the custom periodic.
- Parameters:
periodic (str) – the periodic to add
- property add_day_of_month¶
Returns whether to add day of month attribute.
- Returns:
true if to add
- Return type:
bool
- property add_day_of_week¶
Returns whether to add day of week attribute.
- Returns:
true if to add
- Return type:
bool
- property add_month_of_year¶
Returns whether to add month of year attribute.
- Returns:
true if to add
- Return type:
bool
- property add_num_days_in_month¶
Returns whether to add number of days in month attribute.
- Returns:
true if to add
- Return type:
bool
- property add_quarter_of_year¶
Returns whether to add quarter of year attribute.
- Returns:
true if to add
- Return type:
bool
- property add_weekend_indicator¶
Returns whether to add a weekend indicator.
- Returns:
true if to add
- Return type:
bool
- property adjust_for_trends¶
Returns true if we are adjusting for trends via a real or artificial time stamp.
- Returns:
true if to adjust
- Return type:
bool
- property adjust_for_variance¶
Returns true if we are adjusting for variance by taking the log of the target(s).
- Returns:
true if to adjust
- Return type:
bool
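Adjusting for variance by taking the log of a target can be sketched as a transform and its inverse. The helper names are hypothetical, strictly positive target values are assumed, and nothing here indicates how or when the wrapped forecaster maps values back to the original scale.

```python
import math


def log_transform(values):
    """Natural log of each (strictly positive) target value."""
    return [math.log(v) for v in values]


def back_transform(values):
    """Inverse of log_transform: map values back to the original scale."""
    return [math.exp(v) for v in values]


original = [1.0, 10.0, 100.0]
logged = log_transform(original)
restored = back_transform(logged)
print(logged)
print(restored)
```

On the log scale, the multiplicative jumps from 1 to 10 and from 10 to 100 become equal additive steps, which is why the transform helps when variance grows with the level of the series.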
- property artificial_time_start_value¶
Returns the current value of the artificial time stamp. After training, after priming, and prior to forecasting, this will be equal to the number of training instances seen.
- Returns:
the start
- Return type:
float
- property average_consecutive_long_lags¶
Returns true if consecutive long lagged variables are to be averaged.
- Returns:
true if to average
- Return type:
bool
- property average_lags_after¶
Returns the point after which long lagged variables will be averaged.
- Returns:
the lag
- Return type:
int
- clear_custom_periodics()¶
Clears the custom periodics.
- clear_lag_histories()¶
Clears any history accumulated in the lag creating filters.
- create_time_lag_cross_products(data)¶
Creates the cross-products.
- property current_timestamp_value¶
Returns the current (i.e. most recent) time stamp value. Unlike an artificial time stamp, the value after training, after priming and before forecasting, will be equal to the time stamp of the most recent priming instance.
- Returns:
the timestamp value
- Return type:
float
- property delta_time¶
Returns the difference between time values. This may be only approximate for periods based on dates. It is best to use date-based arithmetic in this case for incrementing/decrementing time stamps.
- Returns:
the delta
- Return type:
float
- property fields_to_lag¶
Returns the fields to lag as list.
- Returns:
the fields to lag
- Return type:
list
- property fields_to_lag_as_string¶
Returns the fields to lag as string.
- Returns:
the fields to lag
- Return type:
str
- property include_powers_of_time¶
Returns whether to include powers of time in the transformed data.
- Returns:
true if to include
- Return type:
bool
- property include_timelag_products¶
Returns whether to include products between time and the lagged variables.
- Returns:
true if to include
- Return type:
bool
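Powers of time and products between time and the lagged variables can be illustrated as follows. The function `add_time_terms` and the field naming scheme (`time^2`, `time*lag1`) are hypothetical; the attribute names actually generated by the lag maker may differ.

```python
def add_time_terms(rows, powers=(2, 3)):
    """Add powers of 'time' and products of 'time' with every 'lag*' field.

    rows is a list of dicts; new dicts are returned, inputs are not modified.
    """
    out = []
    for row in rows:
        new_row = dict(row)
        t = row["time"]
        for p in powers:
            new_row[f"time^{p}"] = t ** p
        for name, value in row.items():
            if name.startswith("lag"):
                new_row[f"time*{name}"] = t * value
        out.append(new_row)
    return out


expanded = add_time_terms([{"time": 2, "lag1": 5.0}])
print(expanded)
```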
- increment_artificial_time_value(increment)¶
Increment the artificial time value with the supplied increment value.
- Parameters:
increment (int) – the increment
- property is_using_artificial_time_index¶
Returns whether an artificial time index is used.
- Returns:
true if an artificial time index is used
- Return type:
bool
- property lag_range¶
Returns the lag range to create.
- Returns:
the lag range
- Return type:
str
- property max_lag¶
Returns the maximum lag to create.
- Returns:
the lag
- Return type:
int
- property min_lag¶
Returns the minimum lag to create.
- Returns:
the lag
- Return type:
int
- property num_consecutive_long_lags_to_average¶
Returns the number of consecutive long lagged variables to average.
- Returns:
the number of lags
- Return type:
int
- property overlay_fields¶
Returns the overlay fields as list.
- Returns:
the overlay fields
- Return type:
list
- property periodicity¶
Returns the Periodicity representing the time stamp in use for this lag maker. If the lag maker is not adjusting for trends, or an artificial time stamp is being used, then None is returned.
- Returns:
the periodicity
- Return type:
Periodicity
- property primary_periodic_field_name¶
Returns the name of the primary periodic attribute or None if one hasn’t been specified.
- Returns:
the name
- Return type:
str
- property remove_leading_instances_with_unknown_lag_values¶
Returns whether to remove instances with unknown lag values.
- Returns:
true if to remove
- Return type:
bool
- property skip_entries¶
Returns a list of time units to be ‘skipped’, i.e. not considered as an increment. E.g. financial markets don’t trade on the weekend, so the difference between Friday closing and the following Monday closing is one time unit (and not three). Can accept strings such as “sat”, “sunday”, “jan”, “august”, or explicit dates (with optional formatting string) such as “2011-07-04@yyyy-MM-dd”, or integers. Integers are interpreted with respect to the periodicity, e.g. for daily data they are interpreted as day of the year; for hourly data, hour of the day; for weekly data, week of the year.
- Returns:
the skip entries
- Return type:
str
- property timestamp_field¶
Returns the name of the time stamp field.
- Returns:
the name of the time stamp field
- Return type:
str
- class weka.timeseries.TSLagUser(jobject)¶
Bases:
JavaObject
Wrapper class for TSLagUser objects.
- property tslag_maker¶
Returns the lag maker.
- Returns:
the lag maker
- Return type:
TSLagMaker
- class weka.timeseries.TestPart(jobject)¶
Bases:
JavaObject
Inner class defining one boundary of an interval.
- day()¶
Returns the day string.
- Returns:
the day string
- Return type:
str
- day_of_month(s)¶
Sets the day of the month.
- Parameters:
s (str) – the dom to use
- day_of_week(s)¶
Sets the day of the week.
- Parameters:
s (str) – the dow to use
- day_of_year(s)¶
Sets the day of year.
- Parameters:
s (str) – the doy to use
- eval(date, other)¶
Evaluate the supplied date against this bound. Handles date fields that are cyclic (such as month, day of week etc.) so that intervals such as oct < date < mar evaluate correctly.
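The handling of cyclic date fields can be illustrated with a small helper. This is a sketch of the general wrap-around technique, not the wrapped `eval` method: the function `in_cyclic_interval` is hypothetical and works on plain month numbers (1 to 12) with strict bounds.

```python
def in_cyclic_interval(value, lower, upper):
    """True if lower < value < upper on a cycle.

    When lower > upper the interval wraps around the end of the cycle,
    e.g. October to March for months.
    """
    if lower <= upper:
        return lower < value < upper
    return value > lower or value < upper


OCT, MAR = 10, 3
december_inside = in_cyclic_interval(12, OCT, MAR)
january_inside = in_cyclic_interval(1, OCT, MAR)
june_inside = in_cyclic_interval(6, OCT, MAR)
print(december_inside, january_inside, june_inside)
```

A plain `lower < value < upper` comparison would reject December and January here, because 10 < 12 < 3 and 10 < 1 < 3 are both false; the wrap-around branch is what makes the interval evaluate as intended.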
- hour_of_day(s)¶
Sets the hour of the day.
- Parameters:
s (str) – the hod to use
- property is_upper¶
Returns true if this is the upper bound.
- Returns:
true if upper bound
- Return type:
bool
- minute_of_hour(s)¶
Sets the minute of the hour.
- Parameters:
s (str) – the moh to use
- property month¶
Returns the month string.
- Returns:
the month string
- Return type:
str
- operator(s)¶
Sets the operator.
- Parameters:
s (str) – the operator to use
- second(s)¶
Sets the second.
- Parameters:
s (str) – the second to use
- week_of_month(s)¶
Sets the week of the month.
- Parameters:
s (str) – the wom to use
- week_of_year(s)¶
Sets the week of the year.
- Parameters:
s (str) – the woy to use
- year(s)¶
Sets the year.
- Parameters:
s (str) – the year to use
- class weka.timeseries.WekaForecaster(jobject=None, options=None)¶
Bases:
TSForecaster, TSLagUser, ConfidenceIntervalForecaster, OverlayForecaster, IncrementallyPrimeable
Wrapper class for Weka timeseries forecasters.
- add_custom_periodic(periodic)¶
Adds the custom periodic.
- Parameters:
periodic (str) – the periodic to add
- property base_forecaster¶
Returns the base forecaster.
- Returns:
the base forecaster
- Return type:
- clear_custom_periodics()¶
Clears the custom periodics.