weka package

Subpackages

weka.associations module

class weka.associations.AssociationRule(jobject)

Bases: JavaObject

Wrapper for weka.associations.AssociationRule class.

property consequence

Get the consequence.

Returns:

the consequence, list of Item objects

Return type:

list

property consequence_support

Get the support for the consequence.

Returns:

the support

Return type:

int

property metric_names

Returns the metric names for the rule.

Returns:

the metric names

Return type:

list

metric_value(name)

Returns the named metric value for the rule.

Parameters:

name (str) – the name of the metric

Returns:

the metric value

Return type:

float

property metric_values

Returns the metric values for the rule.

Returns:

the metric values

Return type:

ndarray

property premise

Get the premise.

Returns:

the premise, list of Item objects

Return type:

list

property premise_support

Get the support for the premise.

Returns:

the support

Return type:

int

property primary_metric_name

Returns the primary metric name for the rule.

Returns:

the metric name

Return type:

str

property primary_metric_value

Returns the primary metric value for the rule.

Returns:

the metric value

Return type:

float

to_dict()

Builds a dictionary with the properties of the AssociationRule object.

Returns:

the AssociationRule dictionary

Return type:

dict

property total_support

Get the total support.

Returns:

the support

Return type:

int

property total_transactions

Get the total transactions.

Returns:

the transactions

Return type:

int
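
The properties above can be combined into a one-line description of a rule. The following is a minimal sketch, not part of the library: `describe_rule` and `demo_rule` are hypothetical names, and the stand-in object carries made-up values so that the helper can be tried without a running JVM. It assumes that `str()` on an Item yields a readable label.

```python
from types import SimpleNamespace


def describe_rule(rule):
    """Format one rule using only the properties documented above."""
    premise = ", ".join(str(item) for item in rule.premise)
    consequence = ", ".join(str(item) for item in rule.consequence)
    return "{} => {} ({}={:.2f}, support {}/{})".format(
        premise, consequence,
        rule.primary_metric_name, rule.primary_metric_value,
        rule.total_support, rule.total_transactions)


# Stand-in with made-up values; a real AssociationRule comes from an Associator.
demo_rule = SimpleNamespace(
    premise=["bread=t"], consequence=["butter=t"],
    primary_metric_name="Confidence", primary_metric_value=0.9,
    total_support=45, total_transactions=100)

print(describe_rule(demo_rule))
```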

class weka.associations.AssociationRules(jobject)

Bases: JavaObject

Wrapper for weka.associations.AssociationRules class.

property producer

Returns a string describing the producer that generated these rules.

Returns:

the producer

Return type:

str

to_dict()

Returns the association rules as a list of dictionaries.

Returns:

the association rules

Return type:

list

class weka.associations.AssociationRulesIterator(rules)

Bases: object

Iterator for weka.associations.AssociationRules class.

class weka.associations.Associator(classname=None, jobject=None, options=None)

Bases: OptionHandler

Wrapper class for associators.

association_rules()

Returns the association rules that were generated. Only available if the underlying scheme implements AssociationRulesProducer.

Returns:

the association rules that were generated

Return type:

AssociationRules

build_associations(data)

Builds the associator with the data.

Parameters:

data (Instances) – the data to train the associator with

can_produce_rules()

Checks whether association rules can be generated.

Returns:

whether scheme implements AssociationRulesProducer interface and association rules can be generated

Return type:

bool

property capabilities

Returns the capabilities of the associator.

Returns:

the capabilities

Return type:

Capabilities

property header

Returns the header of the training data.

Returns:

the structure of the training data, None if not available

Return type:

Instances

classmethod make_copy(associator)

Creates a copy of the associator.

Parameters:

associator (Associator) – the associator to copy

Returns:

the copy of the associator

Return type:

Associator

property rule_metric_names

Returns the rule metric names of the association rules. Only available if the underlying scheme implements AssociationRulesProducer.

Returns:

the metric names

Return type:

list

class weka.associations.Item(jobject)

Bases: JavaObject

Wrapper for weka.associations.Item class.

property attribute

Returns the attribute.

Returns:

the attribute

Return type:

Attribute

property comparison

Returns the comparison operator as string.

Returns:

the comparison operator

Return type:

str

decrease_frequency(frequency=None)

Decreases the frequency.

Parameters:

frequency (int) – the frequency to decrease by, 1 if None

property frequency

Returns the frequency.

Returns:

the frequency

Return type:

int

increase_frequency(frequency=None)

Increases the frequency.

Parameters:

frequency (int) – the frequency to increase by, 1 if None

property item_value

Returns the item value as string.

Returns:

the item value

Return type:

str

weka.associations.main(args=None)

Runs an associator from the command-line. Calls JVM start/stop automatically. Use -h to see all options.

Parameters:

args (list) – the command-line arguments to use, uses sys.argv if None

weka.associations.sys_main()

Runs the main function using the system cli arguments, and returns a system error code.

Returns:

0 for success, 1 for failure.

Return type:

int

weka.attribute_selection module

class weka.attribute_selection.ASEvaluation(classname='weka.attributeSelection.CfsSubsetEval', jobject=None, options=None)

Bases: OptionHandler

Wrapper class for attribute selection evaluation algorithm.

build_evaluator(data)

Builds the evaluator with the data.

Parameters:

data (Instances) – the data to use

property capabilities

Returns the capabilities of the evaluator.

Returns:

the capabilities

Return type:

Capabilities

convert_instance(inst)

Transforms an instance in the format of the original data to the transformed space.

Parameters:

inst (Instance) – the Instance to transform

Returns:

the transformed instance

Return type:

Instance

property header

Returns the header of the training data.

Returns:

the structure of the training data, None if not available

Return type:

Instances

post_process(indices)

Post-processes the evaluator with the selected attribute indices.

Parameters:

indices (ndarray) – the attribute indices list to use

Returns:

the processed indices

Return type:

ndarray

transformed_data(data)

Transforms the supplied dataset (assumed to be in the same format as the training data).

Parameters:

data (Instances) – the data to transform

Returns:

the transformed data

Return type:

Instances

transformed_header()

Returns just the header for the transformed data (i.e., an empty set of instances). This is so that AttributeSelection can determine the structure of the transformed data without actually having to get all the transformed data through transformed_data(). Returns None if the evaluator is not a weka.attributeSelection.AttributeTransformer.

Returns:

the header

Return type:

Instances

class weka.attribute_selection.ASSearch(classname='weka.attributeSelection.BestFirst', jobject=None, options=None)

Bases: OptionHandler

Wrapper class for attribute selection search algorithm.

property header

Returns the header of the training data.

Returns:

the structure of the training data, None if not available

Return type:

Instances

search(evaluation, data)

Performs the search and returns the indices of the selected attributes.

Parameters:
  • evaluation (ASEvaluation) – the evaluation algorithm to use

  • data (Instances) – the data to use

Returns:

the selected attributes (0-based indices)

Return type:

ndarray

class weka.attribute_selection.AttributeSelection

Bases: JavaObject

Performs attribute selection using search and evaluation algorithms.

classmethod attribute_selection(evaluator, args)

Performs attribute selection using the given attribute evaluator and options.

Parameters:
  • evaluator (ASEvaluation) – the evaluator to use

  • args (list) – the command-line args for the attribute selection

Returns:

the results string

Return type:

str

crossvalidation(crossvalidation)

Sets whether to perform cross-validation.

Parameters:

crossvalidation (bool) – whether to perform cross-validation

property cv_results

Generates a results string from the last cross-validation attribute selection.

Returns:

the results string

Return type:

str

evaluator(evaluator)

Sets the evaluator to use.

Parameters:

evaluator (ASEvaluation) – the evaluator to use.

folds(folds)

Sets the number of folds to use for cross-validation.

Parameters:

folds (int) – the number of folds

property number_attributes_selected

Returns the number of attributes that were selected.

Returns:

the number of attributes

Return type:

int

property rank_results

Returns the results from the cross-validation for rankers.

Unfortunately, the Weka API does not give direct access to underlying data structures, hence we have to parse the textual output.

Returns:

the dictionary of results (mean and stdev for rank and merit)

Return type:

dict

property ranked_attributes

Returns the matrix of ranked attributes from the last run.

Returns:

the Numpy matrix

Return type:

ndarray
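
To read the ranking, each row of the matrix can be mapped back to an attribute name. This is a sketch under an assumption taken from Weka's own ranking output: column 0 holds the 0-based attribute index and column 1 the merit. `top_ranked`, the attribute names and the numbers are hypothetical.

```python
def top_ranked(ranked, names, k=3):
    """Return (name, merit) pairs for the first k rows of a ranking matrix."""
    # Assumption: column 0 is the 0-based attribute index, column 1 the merit.
    return [(names[int(row[0])], float(row[1])) for row in list(ranked)[:k]]


# Made-up ranking for four hypothetical attributes.
demo_ranked = [[2, 0.81], [0, 0.42], [3, 0.17], [1, 0.05]]
demo_names = ["outlook", "temperature", "humidity", "windy"]

print(top_ranked(demo_ranked, demo_names, k=2))
```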

ranking(ranking)

Sets whether to perform a ranking, if possible.

Parameters:

ranking (bool) – whether to perform a ranking

reduce_dimensionality(data)

Reduces the dimensionality of the provided Instance or Instances object.

Parameters:

data (Instances) – the data to process

Returns:

the reduced dataset

Return type:

Instances

property results_string

Generates a results string from the last attribute selection.

Returns:

the results string

Return type:

str

search(search)

Sets the search algorithm to use.

Parameters:

search (ASSearch) – the search algorithm

seed(seed)

Sets the seed for cross-validation.

Parameters:

seed (int) – the seed value

select_attributes(instances)

Performs attribute selection on the given dataset.

Parameters:

instances (Instances) – the data to process

select_attributes_cv_split(instances)

Performs attribute selection on the given cross-validation split.

Parameters:

instances (Instances) – the data to process

property selected_attributes

Returns the selected attributes from the last run.

Returns:

the Numpy array of 0-based indices

Return type:

ndarray

property subset_results

Returns the results from the cross-validation subsets, i.e., how often an attribute was selected.

Unfortunately, the Weka API does not give direct access to underlying data structures, hence we have to parse the textual output.

Returns:

the list of results (double)

Return type:

list

weka.attribute_selection.main(args=None)

Runs attribute selection from the command-line. Calls JVM start/stop automatically. Use -h to see all options.

Parameters:

args (list) – the command-line arguments to use, uses sys.argv if None

weka.attribute_selection.sys_main()

Runs the main function using the system cli arguments, and returns a system error code.

Returns:

0 for success, 1 for failure.

Return type:

int

weka.classifiers module

class weka.classifiers.AttributeSelectedClassifier(jobject=None, options=None)

Bases: SingleClassifierEnhancer

Wrapper class for the AttributeSelectedClassifier.

property evaluator

Returns the evaluator.

Returns:

the evaluator in use

Return type:

ASEvaluation

property search

Returns the search.

Returns:

the search in use

Return type:

ASSearch

class weka.classifiers.Classifier(classname='weka.classifiers.rules.ZeroR', jobject=None, options=None)

Bases: OptionHandler

Wrapper class for classifiers.

additional_measure(measure)

Returns the specified additional measure if implementing weka.core.AdditionalMeasureProducer, otherwise None.

Parameters:

measure (str) – the measure to retrieve

Returns:

the additional measure

Return type:

str

property additional_measures

Returns the list of additional measures if implementing weka.core.AdditionalMeasureProducer, otherwise None.

Returns:

the additional measures

Return type:

str

property batch_size

Returns the batch size, in case this classifier is a batch predictor.

Returns:

the batch size, None if not a batch predictor

Return type:

str

build_classifier(data)

Builds the classifier with the data.

Parameters:

data (Instances) – the data to train the classifier with

property capabilities

Returns the capabilities of the classifier.

Returns:

the capabilities

Return type:

Capabilities

classify_instance(inst)

Performs a prediction.

Parameters:

inst (Instance) – the Instance to get a prediction for

Returns:

the classification (either regression value or 0-based label index)

Return type:

float

classmethod deserialize(ser_file)

Deserializes a classifier from a file.

Parameters:

ser_file (str) – the model file to deserialize

Returns:

model and, if available, the dataset header

Return type:

tuple

distribution_for_instance(inst)

Performs a prediction, returning the class distribution.

Parameters:

inst (Instance) – the Instance to get the class distribution for

Returns:

the class distribution array

Return type:

ndarray
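
The class distribution can be turned into a label by taking the index with the highest probability. The sketch below works on any sequence of floats; `predicted_label` and the label names are hypothetical, and in practice the labels would come from the class attribute of the dataset.

```python
def predicted_label(distribution, labels):
    """Return (index, label, probability) of the most probable class."""
    dist = [float(p) for p in distribution]
    # Ties resolve to the lowest index, because max() keeps the first maximum.
    best = max(range(len(dist)), key=dist.__getitem__)
    return best, labels[best], dist[best]


print(predicted_label([0.1, 0.7, 0.2], ["setosa", "versicolor", "virginica"]))
```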

distributions_for_instances(data)

Performs predictions, returning the class distributions.

Parameters:

data (Instances) – the Instances to get the class distributions for

Returns:

the class distribution matrix, None if not a batch predictor

Return type:

ndarray

property graph

Returns the graph if classifier implements weka.core.Drawable, otherwise None.

Returns:

the generated graph string

Return type:

str

property graph_type

Returns the graph type if classifier implements weka.core.Drawable, otherwise -1.

Returns:

the type

Return type:

int

has_efficient_batch_prediction()

Returns whether the classifier implements a more efficient batch prediction.

Returns:

True if a more efficient batch prediction is implemented, always False if not batch predictor

Return type:

bool

property header

Returns the header of the training data.

Returns:

the structure of the training data, None if not available

Return type:

Instances

classmethod make_copy(classifier)

Creates a copy of the classifier.

Parameters:

classifier (Classifier) – the classifier to copy

Returns:

the copy of the classifier

Return type:

Classifier

serialize(ser_file, header=None)

Serializes the classifier to the specified file.

Parameters:
  • ser_file (str) – the file to save the model to

  • header (Instances) – the (optional) dataset header to store alongside; recommended

to_source(classname)

Returns the model as Java source code if the classifier implements weka.classifiers.Sourcable.

Parameters:

classname (str) – the classname for the generated Java code

Returns:

the model as source code string

Return type:

str

update_classifier(inst)

Updates the classifier with the instance.

Parameters:

inst (Instance) – the Instance to update the classifier with

class weka.classifiers.CostMatrix(matrx=None, num_classes=None)

Bases: JavaObject

Class for storing and manipulating a misclassification cost matrix. The element at position i,j in the matrix is the penalty for classifying an instance of class j as class i. Cost values can be fixed or computed on a per-instance basis (cost sensitive evaluation only) from the value of an attribute or an expression involving attribute(s).

apply_cost_matrix(data, rnd)

Applies the cost matrix to the data.

Parameters:
  • data (Instances) – the data to apply to

  • rnd (Random) – the random number generator

expected_costs(class_probs, inst=None)

Calculates the expected misclassification cost for each possible class value, given class probability estimates.

Parameters:

class_probs (ndarray) – the class probabilities

Returns:

the calculated costs

Return type:

ndarray

get_cell(row, col)

Returns the JPype object at the specified location.

Parameters:
  • row (int) – the 0-based index of the row

  • col (int) – the 0-based index of the column

Returns:

the object in that cell

Return type:

JPype object

get_element(row, col, inst=None)

Returns the value at the specified location.

Parameters:
  • row (int) – the 0-based index of the row

  • col (int) – the 0-based index of the column

  • inst (Instance) – the Instance

Returns:

the value in that cell

Return type:

float

get_max_cost(class_value, inst=None)

Gets the maximum cost for a particular class value.

Parameters:
  • class_value (int) – the class value to get the maximum cost for

  • inst (Instance) – the Instance

Returns:

the cost

Return type:

float

initialize()

Initializes the matrix.

normalize()

Normalizes the matrix.

property num_columns

Returns the number of columns.

Returns:

the number of columns

Return type:

int

property num_rows

Returns the number of rows.

Returns:

the number of rows

Return type:

int

classmethod parse_matlab(matlab)

Parses the cost matrix definition in Matlab format and returns a matrix.

Parameters:

matlab (str) – the Matlab matrix string, e.g., [1 2; 3 4]

Returns:

the generated matrix

Return type:

CostMatrix
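
The Matlab-style string separates values by whitespace and rows by semicolons. The pure-Python sketch below illustrates that format only; it is not the parser used by the library, and `parse_matlab_rows` and `format_matlab_rows` are hypothetical helpers.

```python
def parse_matlab_rows(text):
    """Turn a string such as '[1 2; 3 4]' into a list of float rows."""
    inner = text.strip()
    if not (inner.startswith("[") and inner.endswith("]")):
        raise ValueError("expected a string like '[1 2; 3 4]'")
    rows = [[float(v) for v in row.split()] for row in inner[1:-1].split(";")]
    if any(len(row) != len(rows[0]) for row in rows):
        raise ValueError("rows differ in length")
    return rows


def format_matlab_rows(rows):
    """Inverse of parse_matlab_rows: rows joined by '; ', values by spaces."""
    return "[" + "; ".join(" ".join(repr(float(v)) for v in row) for row in rows) + "]"


rows = parse_matlab_rows("[1 2; 3 4]")
print(rows)
print(format_matlab_rows(rows))
```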

set_cell(row, col, obj)

Sets the JPype object at the specified location. Automatically unwraps JavaObject.

Parameters:
  • row (int) – the 0-based index of the row

  • col (int) – the 0-based index of the column

  • obj (object) – the object for that cell

set_element(row, col, value)

Sets the float value at the specified location.

Parameters:
  • row (int) – the 0-based index of the row

  • col (int) – the 0-based index of the column

  • value (float) – the float value for that cell

property size

Returns the number of rows/columns.

Returns:

the number of rows/columns

Return type:

int

to_matlab()

Returns the matrix in Matlab format.

Returns:

the matrix as Matlab formatted string

Return type:

str

class weka.classifiers.Evaluation(data, cost_matrix=None)

Bases: JavaObject

Evaluation class for classifiers.

area_under_prc(class_index)

Returns the area under the precision-recall curve.

Parameters:

class_index (int) – the 0-based index of the class label

Returns:

the area

Return type:

float

area_under_roc(class_index)

Returns the area under the receiver operating characteristic (ROC) curve.

Parameters:

class_index (int) – the 0-based index of the class label

Returns:

the area

Return type:

float

property avg_cost

Returns the average cost.

Returns:

the cost

Return type:

float

class_details(title=None)

Generates the class details.

Parameters:

title (str) – optional title

Returns:

the details

Return type:

str

property class_priors

Returns the class priors.

Returns:

the priors

Return type:

ndarray

property confusion_matrix

Returns the confusion matrix.

Returns:

the matrix

Return type:

ndarray

property correct

Returns the correct count (nominal classes).

Returns:

the count

Return type:

float

property correlation_coefficient

Returns the correlation coefficient (numeric classes).

Returns:

the coefficient

Return type:

float

property coverage_of_test_cases_by_predicted_regions

Returns the coverage of the test cases by the predicted regions at the confidence level specified when evaluation was performed.

Returns:

the coverage

Return type:

float

crossvalidate_model(classifier, data, num_folds, rnd, output=None)

Cross-validates the model using the specified data, number of folds and random number generator wrapper.

Parameters:
  • classifier (Classifier) – the classifier to cross-validate

  • data (Instances) – the data to evaluate on

  • num_folds (int) – the number of folds

  • rnd (Random) – the random number generator to use

  • output (PredictionOutput) – the output generator to use
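
A cross-validation run might look like the sketch below. It relies on modules outside this section (`weka.core.jvm`, `weka.core.converters.Loader`, `weka.core.classes.Random`); the function name, the J48 options and the ARFF path passed by the caller are illustrative assumptions. The Weka imports sit inside the function so that merely defining it does not require a JVM.

```python
def crossvalidate_j48(arff_path, folds=10, seed=1):
    """Cross-validate a J48 tree on an ARFF file and return the summary text."""
    # Imported lazily so that defining this function needs no Weka installation.
    import weka.core.jvm as jvm
    from weka.core.classes import Random
    from weka.core.converters import Loader
    from weka.classifiers import Classifier, Evaluation

    jvm.start()
    try:
        loader = Loader(classname="weka.core.converters.ArffLoader")
        data = loader.load_file(arff_path)
        data.class_is_last()  # assumption: the class is the last attribute

        classifier = Classifier(
            classname="weka.classifiers.trees.J48", options=["-C", "0.3"])
        evaluation = Evaluation(data)
        evaluation.crossvalidate_model(classifier, data, folds, Random(seed))
        return evaluation.summary()
    finally:
        jvm.stop()
```

Because the JVM generally cannot be started again in the same process once it has been stopped, a longer script would usually start it once, do all of its work, and stop it at the very end.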

cumulative_margin_distribution()

Outputs the cumulative margin distribution as a string suitable as input for gnuplot or a similar package.

Returns:

the cumulative margin distribution

Return type:

str

property discard_predictions

Returns whether to discard predictions (saves memory).

Returns:

True if to discard

Return type:

bool

property error_rate

Returns the error rate (numeric classes).

Returns:

the rate

Return type:

float

classmethod evaluate_model(classifier, args)

Evaluates the classifier with the given options.

Parameters:
  • classifier (Classifier) – the classifier instance to use

  • args (list) – the command-line arguments to use

Returns:

the evaluation string

Return type:

str

evaluate_train_test_split(classifier, data, percentage, rnd=None, output=None)

Splits the data into train and test, builds the classifier with the training data and evaluates it against the test set.

Parameters:
  • classifier (Classifier) – the classifier to build and evaluate

  • data (Instances) – the data to evaluate on

  • percentage (double) – the percentage split to use (amount to use for training)

  • rnd (Random) – the random number generator to use, if None the order gets preserved

  • output (PredictionOutput) – the output generator to use
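
The effect of percentage and rnd can be pictured with plain Python lists. This is a sketch of the idea only, not the library's implementation: it assumes the training size is the rounded share of the rows, and `percentage_split` is a hypothetical helper.

```python
import random


def percentage_split(rows, percentage, rnd=None):
    """Split rows into (train, test); the order is kept when rnd is None."""
    rows = list(rows)
    if rnd is not None:
        rnd.shuffle(rows)  # randomize only when a generator is supplied
    # Assumption: the training size is the rounded share of the rows.
    train_size = int(round(len(rows) * percentage / 100.0))
    return rows[:train_size], rows[train_size:]


train, test = percentage_split(range(10), 66.0)
print(train, test)  # order preserved: the first 7 rows train, the last 3 test

shuffled_train, shuffled_test = percentage_split(
    range(10), 66.0, rnd=random.Random(42))
print(len(shuffled_train), len(shuffled_test))
```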

f_measure(class_index)

Returns the f measure.

Parameters:

class_index (int) – the 0-based index of the class label

Returns:

the measure

Return type:

float

false_negative_rate(class_index)

Returns the false negative rate.

Parameters:

class_index (int) – the 0-based index of the class label

Returns:

the rate

Return type:

float

false_positive_rate(class_index)

Returns the false positive rate.

Parameters:

class_index (int) – the 0-based index of the class label

Returns:

the rate

Return type:

float

property header

Returns the header format.

Returns:

the header format

Return type:

Instances

property incorrect

Returns the incorrect count (nominal classes).

Returns:

the count

Return type:

float

property kappa

Returns kappa.

Returns:

kappa

Return type:

float
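
For reference, Cohen's kappa can be computed from a confusion matrix as (observed agreement - chance agreement) / (1 - chance agreement). The sketch below uses the textbook definition on a made-up matrix; `cohen_kappa` is a hypothetical helper, and returning 1.0 in the degenerate case is a convention chosen for this sketch.

```python
def cohen_kappa(matrix):
    """Cohen's kappa for a square confusion matrix given as nested lists."""
    size = len(matrix)
    total = float(sum(sum(row) for row in matrix))
    if total == 0:
        raise ValueError("empty confusion matrix")
    observed = sum(matrix[i][i] for i in range(size)) / total
    # Chance agreement: sum over classes of (row total * column total) / N^2.
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix) for i in range(size)
    ) / (total * total)
    if expected >= 1.0:
        return 1.0  # degenerate case, convention chosen for this sketch
    return (observed - expected) / (1.0 - expected)


# Made-up matrix: observed agreement 0.7, chance agreement 0.5.
print(round(cohen_kappa([[20, 5], [10, 15]]), 4))
```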

property kb_information

Returns KB information.

Returns:

the information

Return type:

float

property kb_mean_information

Returns KB mean information.

Returns:

the information

Return type:

float

property kb_relative_information

Returns KB relative information.

Returns:

the information

Return type:

float

matrix(title=None)

Generates the confusion matrix.

Parameters:

title (str) – optional title

Returns:

the matrix

Return type:

str

matthews_correlation_coefficient(class_index)

Returns the Matthews correlation coefficient (nominal classes).

Parameters:

class_index (int) – the 0-based index of the class label

Returns:

the coefficient

Return type:

float

property mean_absolute_error

Returns the mean absolute error.

Returns:

the error

Return type:

float

property mean_prior_absolute_error

Returns the mean prior absolute error.

Returns:

the error

Return type:

float

num_false_negatives(class_index)

Returns the number of false negatives.

Parameters:

class_index (int) – the 0-based index of the class label

Returns:

the count

Return type:

float

num_false_positives(class_index)

Returns the number of false positives.

Parameters:

class_index (int) – the 0-based index of the class label

Returns:

the count

Return type:

float

property num_instances

Returns the number of instances that had a known class value.

Returns:

the number of instances

Return type:

float

num_true_negatives(class_index)

Returns the number of true negatives.

Parameters:

class_index (int) – the 0-based index of the class label

Returns:

the count

Return type:

float

num_true_positives(class_index)

Returns the number of true positives.

Parameters:

class_index (int) – the 0-based index of the class label

Returns:

the count

Return type:

float

property percent_correct

Returns the percent correct (nominal classes).

Returns:

the percentage

Return type:

float

property percent_incorrect

Returns the percent incorrect (nominal classes).

Returns:

the percentage

Return type:

float

property percent_unclassified

Returns the percent unclassified.

Returns:

the percentage

Return type:

float

precision(class_index)

Returns the precision.

Parameters:

class_index (int) – the 0-based index of the class label

Returns:

the precision

Return type:

float

property predictions

Returns the predictions.

Returns:

the predictions. None if not available

Return type:

list

recall(class_index)

Returns the recall.

Parameters:

class_index (int) – the 0-based index of the class label

Returns:

the recall

Return type:

float
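
Precision, recall and the F-measure for one class all derive from the confusion matrix. The sketch below assumes rows are actual classes and columns are predicted classes; `class_metrics` is a hypothetical helper, and returning 0.0 for an undefined ratio is a choice made for this sketch that may differ from what Weka reports.

```python
def class_metrics(matrix, class_index):
    """Return (precision, recall, f_measure) for one class label."""
    # Assumption: rows are actual classes, columns are predicted classes.
    tp = matrix[class_index][class_index]
    fn = sum(matrix[class_index]) - tp
    fp = sum(row[class_index] for row in matrix) - tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return precision, recall, f_measure


# Made-up matrix: class 0 has 20 true positives, 5 misses, 10 false alarms.
precision, recall, f_measure = class_metrics([[20, 5], [10, 15]], 0)
print(round(precision, 4), round(recall, 4), round(f_measure, 4))
```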

property relative_absolute_error

Returns the relative absolute error.

Returns:

the error

Return type:

float

property root_mean_prior_squared_error

Returns the root mean prior squared error.

Returns:

the error

Return type:

float

property root_mean_squared_error

Returns the root mean squared error.

Returns:

the error

Return type:

float
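
For numeric targets, the two basic error measures can be reproduced from actual and predicted values. This sketch covers the unweighted, numeric case only, and the helper names are hypothetical.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average of the absolute differences between predictions and targets."""
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the average squared difference."""
    return math.sqrt(
        sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual))


actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.0, 5.0]
print(mean_absolute_error(actual, predicted))      # 0.625
print(root_mean_squared_error(actual, predicted))  # 0.75
```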

property root_relative_squared_error

Returns the root relative squared error.

Returns:

the error

Return type:

float

property sf_entropy_gain

Returns the total SF, which is the null model entropy minus the scheme entropy.

Returns:

the gain

Return type:

float

property sf_mean_entropy_gain

Returns the SF per instance, which is the null model entropy minus the scheme entropy, per instance.

Returns:

the gain

Return type:

float

property sf_mean_prior_entropy

Returns the entropy per instance for the null model.

Returns:

the entropy

Return type:

float

property sf_mean_scheme_entropy

Returns the entropy per instance for the scheme.

Returns:

the entropy

Return type:

float

property sf_prior_entropy

Returns the total entropy for the null model.

Returns:

the entropy

Return type:

float

property sf_scheme_entropy

Returns the total entropy for the scheme.

Returns:

the entropy

Return type:

float

property size_of_predicted_regions

Returns the average size of the predicted regions, relative to the range of the target in the training data, at the confidence level specified when evaluation was performed.

Returns:

the size of the regions

Return type:

float

summary(title=None, complexity=False)

Generates a summary.

Parameters:
  • title (str) – optional title

  • complexity (bool) – whether to print the complexity information as well

Returns:

the summary

Return type:

str

test_model(classifier, data, output=None)

Evaluates the built model using the specified test data and returns the classifications.

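Parameters:
  • classifier (Classifier) – the built classifier to evaluate

  • data (Instances) – the test data to evaluate on

  • output (PredictionOutput) – the output generator to use

Returns: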

the classifications

Return type:

ndarray

test_model_once(classifier, inst, store=False)

Evaluates the built model using the specified test instance and returns the classification.

Parameters:
  • classifier (Classifier) – the built classifier to evaluate

  • inst (Instance) – the Instance to evaluate on

  • store (bool) – whether to store the predictions (some statistics in class_details() like AUC require that)

Returns:

the classification

Return type:

float

property total_cost

Returns the total cost.

Returns:

the cost

Return type:

float

true_negative_rate(class_index)

Returns the true negative rate.

Parameters:

class_index (int) – the 0-based index of the class label

Returns:

the rate

Return type:

float

true_positive_rate(class_index)

Returns the true positive rate.

Parameters:

class_index (int) – the 0-based index of the class label

Returns:

the rate

Return type:

float

property unclassified

Returns the unclassified count.

Returns:

the count

Return type:

float

property unweighted_macro_f_measure

Returns the unweighted macro-averaged F-measure.

Returns:

the measure

Return type:

float

property unweighted_micro_f_measure

Returns the unweighted micro-averaged F-measure.

Returns:

the measure

Return type:

float

property weighted_area_under_prc

Returns the weighted area under the precision-recall curve.

Returns:

the weighted area

Return type:

float

property weighted_area_under_roc

Returns the weighted area under the receiver operating characteristic (ROC) curve.

Returns:

the weighted area

Return type:

float

property weighted_f_measure

Returns the weighted f measure.

Returns:

the measure

Return type:

float

property weighted_false_negative_rate

Returns the weighted false negative rate.

Returns:

the rate

Return type:

float

property weighted_false_positive_rate

Returns the weighted false positive rate.

Returns:

the rate

Return type:

float

property weighted_matthews_correlation

Returns the weighted Matthews correlation (nominal classes).

Returns:

the correlation

Return type:

float

property weighted_precision

Returns the weighted precision.

Returns:

the precision

Return type:

float

property weighted_recall

Returns the weighted recall.

Returns:

the recall

Return type:

float

property weighted_true_negative_rate

Returns the weighted true negative rate.

Returns:

the rate

Return type:

float

property weighted_true_positive_rate

Returns the weighted true positive rate.

Returns:

the rate

Return type:

float

class weka.classifiers.FilteredClassifier(jobject=None, options=None)

Bases: SingleClassifierEnhancer

Wrapper class for the filtered classifier.

check_for_modified_class_attribute(check)

Sets whether to check for class attribute modifications.

Parameters:

check (bool) – True if checking for modifications

property filter

Returns the filter.

Returns:

the filter in use

Return type:

weka.filters.Filter

class weka.classifiers.GridSearch(jobject=None, options=None)

Bases: SingleClassifierEnhancer

Wrapper class for the GridSearch meta-classifier.

property best

Returns the best classifier setup found during the search.

Returns:

the best classifier setup

Return type:

Classifier

property evaluation

Returns the currently set statistic used for evaluation.

Returns:

the statistic

Return type:

SelectedTag

property x

Returns a dictionary with all the current values for the X of the grid. Keys for the dictionary: property, min, max, step, base, expression Types: property=str, min=float, max=float, step=float, base=float, expression=str

Returns:

the dictionary with the parameters

Return type:

dict

property y

Returns a dictionary with all the current values for the Y of the grid. Keys for the dictionary: property, min, max, step, base, expression Types: property=str, min=float, max=float, step=float, base=float, expression=str

Returns:

the dictionary with the parameters

Return type:

dict

class weka.classifiers.Kernel(classname=None, jobject=None, options=None)

Bases: OptionHandler

Wrapper class for kernels.

build_kernel(data)

Builds the kernel with the data.

Parameters:

data (Instances) – the data to build the kernel with

capabilities()

Returns the capabilities of the kernel.

Returns:

the capabilities

Return type:

Capabilities

property checks_turned_off

Returns whether checks are turned off.

Returns:

True if checks turned off

Return type:

bool

clean()

Frees the memory used by the kernel.

eval(id1, id2, inst1)

Computes the result of the kernel function for two instances. If id1 == -1, eval uses inst1 instead of an instance in the dataset.

Parameters:
  • id1 (int) – the index of the first instance in the dataset

  • id2 (int) – the index of the second instance in the dataset

  • inst1 (Instance) – the instance corresponding to id1 (used if id1 == -1)
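
The id1 == -1 convention can be illustrated with a small stand-in. The class below is hypothetical and mimics only the indexing behaviour described above, using a plain dot product on lists of numbers; it is not one of Weka's kernels.

```python
class DotProductKernelSketch:
    """Hypothetical stand-in that mimics only the indexing convention of eval()."""

    def __init__(self, rows):
        self.rows = [list(row) for row in rows]

    def eval(self, id1, id2, inst1=None):
        # id1 == -1 means: use inst1 instead of a row from the dataset.
        first = inst1 if id1 == -1 else self.rows[id1]
        second = self.rows[id2]
        return sum(a * b for a, b in zip(first, second))


kernel = DotProductKernelSketch([[1, 0], [2, 3]])
print(kernel.eval(0, 1))           # both instances taken from the dataset
print(kernel.eval(-1, 1, [1, 1]))  # first instance supplied explicitly
```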

classmethod make_copy(kernel)

Creates a copy of the kernel.

Parameters:

kernel (Kernel) – the kernel to copy

Returns:

the copy of the kernel

Return type:

Kernel

class weka.classifiers.KernelClassifier(classname=None, jobject=None, options=None)

Bases: Classifier

Wrapper class for classifiers that have a kernel property, like SMO.

property kernel

Returns the current kernel.

Returns:

the kernel or None if none found

Return type:

Kernel

class weka.classifiers.MultiSearch(jobject=None, options=None)

Bases: SingleClassifierEnhancer

Wrapper class for the MultiSearch meta-classifier. NB: ‘multi-search-weka-package’ must be installed (https://github.com/fracpete/multisearch-weka-package), version 2016.1.15 or later.

property best

Returns the best classifier setup found during the search.

Returns:

the best classifier setup

Return type:

Classifier

property evaluation

Returns the currently set statistic used for evaluation.

Returns:

the statistic

Return type:

SelectedTag

property parameters

Returns the list of currently set search parameters.

Returns:

the list of AbstractSearchParameter objects

Return type:

list

class weka.classifiers.MultipleClassifiersCombiner(classname=None, jobject=None, options=None)

Bases: Classifier

Wrapper class for classifiers that use multiple base classifiers.

append(classifier)

Appends the classifier to the current list of classifiers.

Parameters:

classifier (Classifier) – the classifier to add

property classifiers

Returns the list of base classifiers.

Returns:

the classifier list

Return type:

list

clear()

Removes all classifiers.

class weka.classifiers.NominalPrediction(jobject)

Bases: Prediction

Wrapper class for a nominal prediction.

property distribution

Returns the class distribution.

Returns:

the class distribution list

Return type:

ndarray

property margin

Returns the margin.

Returns:

the margin

Return type:

float
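
To make the relationship between distribution and margin concrete, the sketch below recomputes a margin from a class distribution. It assumes Weka's usual definition (probability of the actual class minus the highest probability among the other classes); the helper name margin_from_distribution is hypothetical and not part of the wrapper.

```python
def margin_from_distribution(distribution, actual):
    """Probability of the actual class minus the best competing probability.

    Assumption: this mirrors Weka's definition of the prediction margin.
    The helper is illustrative only and not part of the wrapper API.
    """
    if len(distribution) < 2:
        raise ValueError("a margin needs at least two classes")
    actual = int(actual)  # class values are stored as floats internally
    others = [p for i, p in enumerate(distribution) if i != actual]
    return float(distribution[actual]) - float(max(others))


# Actual class 0 was predicted with 0.7, the best competitor has 0.2.
example_margin = margin_from_distribution([0.7, 0.2, 0.1], 0.0)
```

A positive margin means the actual class received the highest probability; a negative one means another class was preferred.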

class weka.classifiers.NumericPrediction(jobject)

Bases: Prediction

Wrapper class for a numeric prediction.

property error

Returns the error.

Returns:

the error

Return type:

float

property prediction_intervals

Returns the prediction intervals.

Returns:

the intervals

Return type:

ndarray

class weka.classifiers.Prediction(jobject)

Bases: JavaObject

Wrapper class for a prediction.

property actual

Returns the actual value.

Returns:

the actual value (internal representation)

Return type:

float

property predicted

Returns the predicted value.

Returns:

the predicted value (internal representation)

Return type:

float

property weight

Returns the weight.

Returns:

the weight of the Instance that was used

Return type:

float
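
The three properties above are enough to compute simple statistics over a list of predictions. The following is a minimal sketch, assuming objects that expose actual, predicted and weight as documented; weighted_accuracy is a hypothetical helper, and the SimpleNamespace objects merely stand in for real Prediction wrappers.

```python
from types import SimpleNamespace


def weighted_accuracy(predictions):
    """Fraction of the total instance weight that was predicted correctly."""
    total = 0.0
    correct = 0.0
    for pred in predictions:
        total += pred.weight
        # actual/predicted hold the internal (float) class index
        if pred.actual == pred.predicted:
            correct += pred.weight
    return correct / total if total > 0 else float("nan")


# Stand-in objects (hypothetical), shaped like the Prediction wrapper.
demo_predictions = [
    SimpleNamespace(actual=0.0, predicted=0.0, weight=1.0),
    SimpleNamespace(actual=1.0, predicted=0.0, weight=1.0),
    SimpleNamespace(actual=1.0, predicted=1.0, weight=2.0),
]
demo_accuracy = weighted_accuracy(demo_predictions)
```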

class weka.classifiers.PredictionOutput(classname='weka.classifiers.evaluation.output.prediction.PlainText', jobject=None, options=None)

Bases: OptionHandler

For collecting predictions and generating output from them. The wrapped class must be derived from weka.classifiers.evaluation.output.prediction.AbstractOutput.

buffer_content()

Returns the content of the buffer as string.

Returns:

The buffer content

Return type:

str

property header

Returns the header format.

Returns:

The dataset format

Return type:

Instances

print_all(cls, data)

Prints the header, classifications and footer to the buffer.

Parameters:
  • cls (Classifier) – the classifier

  • data (Instances) – the test data

print_classification(cls, inst, index)

Prints the classification to the buffer.

Parameters:
  • cls (Classifier) – the classifier

  • inst (Instance) – the test instance

  • index (int) – the 0-based index of the test instance

print_classifications(cls, data)

Prints the classifications to the buffer.

Parameters:
  • cls (Classifier) – the classifier

  • data (Instances) – the test data

print_footer()

Prints the footer to the buffer.

print_header()

Prints the header to the buffer.

class weka.classifiers.SingleClassifierEnhancer(classname=None, jobject=None, options=None)

Bases: Classifier

Wrapper class for classifiers that use a single base classifier.

property classifier

Returns the base classifier.

Returns:

the base classifier

Return type:

Classifier

weka.classifiers.main(args=None)

Runs a classifier from the command-line. Calls JVM start/stop automatically. Use -h to see all options.

Parameters:

args (list) – the command-line arguments to use, uses sys.argv if None

weka.classifiers.predictions_to_instances(data, preds)

Turns the predictions into an Instances object.

Parameters:
  • data (Instances) – the original dataset format

  • preds (list) – the predictions to convert

Returns:

the predictions, None if no predictions present

Return type:

Instances

weka.classifiers.sys_main()

Runs the main function using the system cli arguments, and returns a system error code.

Returns:

0 for success, 1 for failure.

Return type:

int

weka.clusterers module

class weka.clusterers.ClusterEvaluation

Bases: JavaObject

Evaluation class for clusterers.

property classes_to_clusters

Return the array (ordered by cluster number) of minimum error class to cluster mappings.

Returns:

the mappings

Return type:

ndarray

property cluster_assignments

Return an array of cluster assignments corresponding to the most recent set of instances clustered.

Returns:

the cluster assignments

Return type:

ndarray

property cluster_results

The cluster results as string.

Returns:

the results string

Return type:

str

classmethod crossvalidate_model(clusterer, data, num_folds, rnd)

Cross-validates the clusterer and returns the loglikelihood.

Parameters:
  • clusterer (Clusterer) – the clusterer instance to evaluate

  • data (Instances) – the data to evaluate on

  • num_folds (int) – the number of folds

  • rnd (Random) – the random number generator to use

Returns:

the cross-validated loglikelihood

Return type:

float

classmethod evaluate_clusterer(clusterer, args)

Evaluates the clusterer with the given options.

Parameters:
  • clusterer (Clusterer) – the clusterer instance to evaluate

  • args (list) – the command-line arguments

Returns:

the evaluation result

Return type:

str

property log_likelihood

Returns the log likelihood.

Returns:

the log likelihood

Return type:

float

property num_clusters

Returns the number of clusters.

Returns:

the number of clusters

Return type:

int

set_model(clusterer)

Sets the built clusterer to evaluate.

Parameters:

clusterer (Clusterer) – the clusterer to evaluate

test_model(test)

Evaluates the currently set clusterer on the test set.

Parameters:

test (Instances) – the test set to use for evaluating

class weka.clusterers.Clusterer(classname='weka.clusterers.SimpleKMeans', jobject=None, options=None)

Bases: OptionHandler

Wrapper class for clusterers.

build_clusterer(data)

Builds the clusterer with the data.

Parameters:

data (Instances) – the data to use for training the clusterer

property capabilities

Returns the capabilities of the clusterer.

Returns:

the capabilities

Return type:

Capabilities

cluster_instance(inst)

Performs a prediction.

Parameters:

inst (Instance) – the instance to determine the cluster for

Returns:

the clustering result

Return type:

float

classmethod deserialize(ser_file)

Deserializes a clusterer from a file.

Parameters:

ser_file (str) – the model file to deserialize

Returns:

model and, if available, the dataset header

Return type:

tuple

distribution_for_instance(inst)

Performs a prediction, returning the cluster distribution.

Parameters:

inst (Instance) – the Instance to get the cluster distribution for

Returns:

the cluster distribution

Return type:

np.ndarray

property graph

Returns the graph if the clusterer implements weka.core.Drawable, otherwise None.

Returns:

the graph or None if not available

Return type:

str

property graph_type

Returns the graph type if the clusterer implements weka.core.Drawable, otherwise -1.

Returns:

the type

Return type:

int

property header

Returns the header of the training data.

Returns:

the structure of the training data, None if not available

Return type:

Instances

classmethod make_copy(clusterer)

Creates a copy of the clusterer.

Parameters:

clusterer (Clusterer) – the clusterer to copy

Returns:

the copy of the clusterer

Return type:

Clusterer

property number_of_clusters

Returns the number of clusters found.

Returns:

the number of clusters

Return type:

int

serialize(ser_file, header=None)

Serializes the clusterer to the specified file.

Parameters:
  • ser_file (str) – the file to save the model to

  • header (Instances) – the (optional) dataset header to store alongside; recommended

update_clusterer(inst)

Updates the clusterer with the instance.

Parameters:

inst (Instance) – the Instance to update the clusterer with

update_finished()

Signals the clusterer that updating with new data has finished.
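
For clusterers that support incremental training, build_clusterer, update_clusterer and update_finished form a small protocol. The sketch below shows one plausible ordering, assuming the underlying Weka clusterer is updateable and that building with a header-only dataset is accepted; train_incrementally and its arguments are hypothetical names.

```python
def train_incrementally(clusterer, structure, instances):
    """Feed instances one at a time and return how many were used.

    structure: dataset header (no rows), used to initialise the model
    instances: any iterable of Instance objects
    Assumption: the wrapped clusterer supports incremental updates.
    """
    clusterer.build_clusterer(structure)
    count = 0
    for inst in instances:
        clusterer.update_clusterer(inst)
        count += 1
    clusterer.update_finished()  # signal that no more data will follow
    return count
```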

class weka.clusterers.FilteredClusterer(jobject=None, options=None)

Bases: SingleClustererEnhancer

Wrapper class for the filtered clusterer.

property filter

Returns the filter.

Returns:

the filter

Return type:

weka.filters.Filter

class weka.clusterers.SingleClustererEnhancer(classname=None, jobject=None, options=None)

Bases: Clusterer

Wrapper class for clusterers that use a single base clusterer.

property clusterer

Returns the base clusterer.

Returns:

the clusterer

Return type:

Clusterer

weka.clusterers.avg_silhouette_coefficient(clusterer, dist_func, data)

Computes the average silhouette coefficient for a clusterer. Based on Eibe Frank’s Groovy code: https://weka.8497.n7.nabble.com/Silhouette-Measures-and-Dunn-Index-DI-in-Weka-td44072.html

Parameters:
  • clusterer (Clusterer) – the trained clusterer model to evaluate

  • dist_func (DistanceFunction) – the distance function to use; if Euclidean, make sure that normalization is turned off

  • data (Instances) – the standardized data

Returns:

the average silhouette coefficient

Return type:

float
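
For readers unfamiliar with the measure, here is a self-contained sketch of the average silhouette coefficient on plain Python data. It is not the wrapper's implementation: it assumes points given as coordinate tuples, precomputed cluster labels and Euclidean distance, and the name avg_silhouette is hypothetical.

```python
import math


def avg_silhouette(points, labels, dist=math.dist):
    """Average of (b - a) / max(a, b) over all points.

    a: mean distance to the other members of the point's own cluster
    b: smallest mean distance to the members of any other cluster
    Points in singleton clusters contribute 0.
    """
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    total = 0.0
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            continue  # singleton cluster
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


demo_points = [(0, 0), (0, 1), (10, 0), (10, 1)]
demo_score = avg_silhouette(demo_points, [0, 0, 1, 1])
```

Values close to 1 indicate compact, well separated clusters; values near 0 or below suggest overlapping clusters.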

weka.clusterers.main(args=None)

Runs a clusterer from the command-line. Calls JVM start/stop automatically. Use -h to see all options.

Parameters:

args (list) – the command-line arguments to use, uses sys.argv if None

weka.clusterers.sys_main()

Runs the main function using the system cli arguments, and returns a system error code.

Returns:

0 for success, 1 for failure.

Return type:

int

weka.datagenerators module

class weka.datagenerators.DataGenerator(classname='weka.datagenerators.classifiers.classification.Agrawal', jobject=None, options=None)

Bases: OptionHandler

Wrapper class for datagenerators.

property dataset_format

Returns the dataset format.

Returns:

the format

Return type:

Instances

define_data_format()

Returns the data format.

Returns:

the data format

Return type:

Instances

generate_example()

Returns a single Instance.

Returns:

the next example

Return type:

Instance

generate_examples()

Returns complete dataset.

Returns:

the generated dataset

Return type:

Instances

generate_finish()

Returns a “finish” string.

Returns:

a finish comment

Return type:

str

generate_start()

Returns a “start” string.

Returns:

the start comment

Return type:

str

classmethod make_copy(generator)

Creates a copy of the generator.

Parameters:

generator (DataGenerator) – the generator to copy

Returns:

the copy of the generator

Return type:

DataGenerator

classmethod make_data(generator, args)

Generates data using the generator and commandline arguments.

Parameters:
  • generator (DataGenerator) – the generator instance to use

  • args (list) – the command-line arguments

property num_examples_act

Returns the actual number of examples to generate.

Returns:

the number of examples

Return type:

int

property single_mode_flag

Returns whether data is generated row by row (True) or in one go (False).

Returns:

whether incremental

Return type:

bool
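
The single_mode_flag property determines which of the generate methods applies. The sketch below shows how the two modes could be handled uniformly; it assumes that dataset_format can be assigned (the listing above only documents the getter), and generate_text is a hypothetical helper.

```python
def generate_text(generator):
    """Return the generated data as a list of strings.

    Row-by-row generators yield one string per example, all other
    generators yield a single string holding the complete dataset.
    Assumption: dataset_format is settable on the wrapper.
    """
    generator.dataset_format = generator.define_data_format()
    if generator.single_mode_flag:
        return [
            str(generator.generate_example())
            for _ in range(generator.num_examples_act)
        ]
    return [str(generator.generate_examples())]
```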

weka.datagenerators.main(args=None)

Runs a datagenerator from the command-line. Calls JVM start/stop automatically. Use -h to see all options.

Parameters:

args (list) – the command-line arguments to use, uses sys.argv if None

weka.datagenerators.sys_main()

Runs the main function using the system cli arguments, and returns a system error code.

Returns:

0 for success, 1 for failure.

Return type:

int

weka.experiments module

class weka.experiments.Experiment(classname='weka.experiment.Experiment', jobject=None, options=None)

Bases: OptionHandler

Wrapper class for an experiment.

class weka.experiments.ResultMatrix(classname='weka.experiment.ResultMatrixPlainText', jobject=None, options=None)

Bases: OptionHandler

For generating results from an Experiment run.

average(col)

Returns the average mean at this location (if valid location).

Parameters:

col (int) – the 0-based column index

Returns:

the mean

Return type:

float

property columns

Returns the column count.

Returns:

the count

Return type:

int

get_col_name(index)

Returns the column name.

Parameters:

index (int) – the 0-based column index

Returns:

the column name, None if invalid index

Return type:

str

get_mean(col, row)

Returns the mean at this location (if valid location).

Parameters:
  • col (int) – the 0-based column index

  • row (int) – the 0-based row index

Returns:

the mean

Return type:

float

get_row_name(index)

Returns the row name.

Parameters:

index (int) – the 0-based row index

Returns:

the row name, None if invalid index

Return type:

str

get_stdev(col, row)

Returns the standard deviation at this location (if valid location).

Parameters:
  • col (int) – the 0-based column index

  • row (int) – the 0-based row index

Returns:

the standard deviation

Return type:

float

hide_col(index)

Hides the column.

Parameters:

index (int) – the 0-based column index

hide_row(index)

Hides the row.

Parameters:

index (int) – the 0-based row index

is_col_hidden(index)

Returns whether the column is hidden.

Parameters:

index (int) – the 0-based column index

Returns:

true if hidden

Return type:

bool

is_row_hidden(index)

Returns whether the row is hidden.

Parameters:

index (int) – the 0-based row index

Returns:

true if hidden

Return type:

bool

property rows

Returns the row count.

Returns:

the count

Return type:

int
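
Note that the accessors take the column index first and the row index second. The sketch below collects the visible means into a nested list using only members documented for this class; visible_means is a hypothetical helper name.

```python
def visible_means(matrix):
    """Return the means of all non-hidden cells as a list of rows."""
    table = []
    for row in range(matrix.rows):
        if matrix.is_row_hidden(row):
            continue
        table.append([
            matrix.get_mean(col, row)  # column index comes first
            for col in range(matrix.columns)
            if not matrix.is_col_hidden(col)
        ])
    return table
```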

set_col_name(index, name)

Sets the column name.

Parameters:
  • index (int) – the 0-based column index

  • name (str) – the name of the column

set_mean(col, row, mean)

Sets the mean at this location (if valid location).

Parameters:
  • col (int) – the 0-based column index

  • row (int) – the 0-based row index

  • mean (float) – the mean to set

set_row_name(index, name)

Sets the row name.

Parameters:
  • index (int) – the 0-based row index

  • name (str) – the name of the row

set_stdev(col, row, stdev)

Sets the standard deviation at this location (if valid location).

Parameters:
  • col (int) – the 0-based column index

  • row (int) – the 0-based row index

  • stdev (float) – the standard deviation to set

show_col(index)

Shows the column.

Parameters:

index (int) – the 0-based column index

show_row(index)

Shows the row.

Parameters:

index (int) – the 0-based row index

to_string_header()

Returns the header of the matrix as a string.

Returns:

the header

Return type:

str

to_string_key()

Returns a key for all the col names, for better readability if the names got cut off.

Returns:

the key

Return type:

str

to_string_matrix()

Returns the matrix as a string.

Returns:

the generated output

Return type:

str

to_string_ranking()

Returns the ranking in a string representation.

Returns:

the ranking

Return type:

str

to_string_summary()

Returns the summary as string.

Returns:

the summary

Return type:

str

class weka.experiments.SimpleCrossValidationExperiment(datasets, classifiers, classification=True, runs=10, folds=10, result=None, class_for_ir_statistics=0, attribute_id=-1, pred_target_column=False)

Bases: SimpleExperiment

Performs a simple cross-validation experiment. Can output the results either in ARFF or CSV.

configure_resultproducer()

Configures and returns the ResultProducer and PropertyPath as tuple.

Returns:

producer and property path

Return type:

tuple

class weka.experiments.SimpleExperiment(datasets, classifiers, jobject=None, classification=True, runs=10, result=None, class_for_ir_statistics=0, attribute_id=-1, pred_target_column=False)

Bases: OptionHandler

Ancestor for simple experiments.

See the following URL for how to use the Experiment API: http://weka.wikispaces.com/Using+the+Experiment+API

configure_resultproducer()

Configures and returns the ResultProducer and PropertyPath as tuple.

Returns:

producer and property path

Return type:

tuple

configure_splitevaluator()

Configures and returns the SplitEvaluator and Classifier instance as tuple.

Returns:

evaluator and classifier

Return type:

tuple

experiment()

Returns the internal experiment, if set up, otherwise None.

Returns:

the internal experiment

Return type:

Experiment

classmethod load(filename)

Loads the experiment from disk.

Parameters:

filename (str) – the filename of the experiment to load

Returns:

the experiment

Return type:

Experiment

run()

Executes the experiment.

classmethod save(filename, experiment)

Saves the experiment to disk.

Parameters:
  • filename (str) – the filename to save the experiment to

  • experiment (Experiment) – the Experiment to save

setup()

Initializes the experiment.

class weka.experiments.SimpleRandomSplitExperiment(datasets, classifiers, classification=True, runs=10, percentage=66.6, preserve_order=False, result=None, class_for_ir_statistics=0, attribute_id=-1, pred_target_column=False)

Bases: SimpleExperiment

Performs a simple random split experiment. Can output the results either in ARFF or CSV.

configure_resultproducer()

Configures and returns the ResultProducer and PropertyPath as tuple.

Returns:

producer and property path

Return type:

tuple
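
The effect of the percentage and preserve_order parameters can be pictured with plain Python lists. This is only a conceptual sketch: it assumes that the split point is the rounded percentage of the number of rows, and random_split, its seed argument and the shuffling details are hypothetical rather than taken from Weka.

```python
import random


def random_split(items, percentage=66.6, preserve_order=False, seed=1):
    """Split items into a training and a test part.

    Assumption: the first round(n * percentage / 100) rows are used for
    training, after shuffling unless preserve_order is set.
    """
    rows = list(items)
    if not preserve_order:
        random.Random(seed).shuffle(rows)
    cut = round(len(rows) * percentage / 100.0)
    return rows[:cut], rows[cut:]


demo_train, demo_test = random_split(range(10), preserve_order=True)
```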

class weka.experiments.Tester(classname='weka.experiment.PairedCorrectedTTester', jobject=None, options=None, swap_rows_and_cols=False)

Bases: OptionHandler

For generating statistical results from an experiment.

property dataset_columns

Returns the list of column names that uniquely identify a dataset.

Returns:

the list of attribute names

Return type:

list

property fold_column

Returns the column name that holds the Fold number.

Returns:

the attribute name

Return type:

str

header(comparison_column)

Creates a “header” string describing the current resultsets.

Parameters:

comparison_column (int) – the index of the column to compare against

Returns:

the header

Return type:

str

init_columns()

Sets the column indices based on the supplied names if necessary.

property instances

Returns the data used in the analysis.

Returns:

the data in use

Return type:

Instances

multi_resultset_full(base_resultset, comparison_column)

Creates a comparison table where a base resultset is compared to the other resultsets.

Parameters:
  • base_resultset (int) – the 0-based index of the base resultset (eg classifier to compare against)

  • comparison_column (int) – the 0-based index of the column to compare against

Returns:

the comparison

Return type:

str

multi_resultset_ranking(comparison_column)

Creates a ranking.

Parameters:

comparison_column (int) – the 0-based index of the column to compare against

Returns:

the ranking

Return type:

str

multi_resultset_summary(comparison_column)

Carries out a comparison between all resultsets, counting the number of datasets where one resultset outperforms the other.

Parameters:

comparison_column (int) – the 0-based index of the column to compare against

Returns:

the summary

Return type:

str

property result_columns

Returns the list of column names that uniquely identify a result (e.g. classifier + options + ID).

Returns:

the list of attribute names

Return type:

list

property resultmatrix

Returns the ResultMatrix instance in use.

Returns:

the matrix in use

Return type:

ResultMatrix

property run_column

Returns the column name that holds the Run number.

Returns:

the attribute name

Return type:

str

property swap_rows_and_cols

Returns whether to swap rows/cols.

Returns:

whether to swap

Return type:

bool

weka.filters module

class weka.filters.AttributeSelection(jobject=None, options=None)

Bases: Filter

Wrapper class for weka.filters.supervised.attribute.AttributeSelection.

property evaluator

Returns the evaluator.

Returns:

the evaluator in use

Return type:

ASEvaluation

property search

Returns the search.

Returns:

the search in use

Return type:

ASSearch

class weka.filters.Filter(classname='weka.filters.AllFilter', jobject=None, options=None)

Bases: OptionHandler

Wrapper class for filters.

batch_finished()

Signals the filter that the batch of data has finished.

Returns:

True if instances can be collected from the output

Return type:

bool

capabilities()

Returns the capabilities of the filter.

Returns:

the capabilities

Return type:

Capabilities

classmethod deserialize(ser_file)

Deserializes a filter from a file.

Parameters:

ser_file (str) – the file to deserialize from

Returns:

model

Return type:

Filter

filter(data)

Filters the dataset(s). When providing a list, this can be used to create compatible train/test sets, since the filter only gets initialized with the first dataset and all subsequent datasets get transformed using the same setup.

NB: inputformat(Instances) must have been called beforehand.

Parameters:

data (Instances or list of Instances) – the Instances to filter

Returns:

the filtered Instances object(s)

Return type:

Instances or list of Instances
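
The list form of filter is the convenient way to obtain compatible train and test sets. A minimal sketch, assuming a filter object with the inputformat and filter methods documented here; filter_train_test is a hypothetical helper name.

```python
def filter_train_test(flter, train, test):
    """Initialise the filter on the training data and apply it to both sets."""
    flter.inputformat(train)  # must be called before filter()
    # The filter is set up using the first dataset only; the second one
    # is transformed with exactly the same setup.
    filtered_train, filtered_test = flter.filter([train, test])
    return filtered_train, filtered_test
```

Filtering the two datasets in separate calls could instead initialise the filter twice, which may produce incompatible attribute sets for data-dependent filters.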

input(inst)

Inputs the Instance.

Parameters:

inst (Instance) – the instance to filter

Returns:

True if filtered can be collected from output

Return type:

bool

inputformat(data)

Sets the input format.

Parameters:

data (Instances) – the data to use as input

classmethod make_copy(flter)

Creates a copy of the filter.

Parameters:

flter (Filter) – the filter to copy

Returns:

the copy of the filter

Return type:

Filter

output()

Outputs the filtered Instance.

Returns:

the filtered instance

Return type:

Instance

outputformat()

Returns the output format.

Returns:

the output format

Return type:

Instances

serialize(ser_file)

Serializes the filter to the specified file.

Parameters:

ser_file (str) – the file to save the filter to

to_source(classname, data)

Returns the model as Java source code if the filter implements weka.filters.Sourcable.

Parameters:
  • classname (str) – the classname for the generated Java code

  • data (Instances) – the dataset used for initializing the filter

Returns:

the model as source code string

Return type:

str

class weka.filters.MultiFilter(jobject=None, options=None)

Bases: Filter

Wrapper class for weka.filters.MultiFilter.

append(filter)

Appends the filter to the current list of filters.

Parameters:

filter (Filter) – the filter to add

clear()

Removes all filters.

property filters

Returns the list of base filters.

Returns:

the filter list

Return type:

list

class weka.filters.StringToWordVector(jobject=None, options=None)

Bases: Filter

Wrapper class for weka.filters.unsupervised.attribute.StringToWordVector.

property stemmer

Returns the stemmer.

Returns:

the stemmer

Return type:

Stemmer

property stopwords

Returns the stopwords handler.

Returns:

the stopwords handler

Return type:

Stopwords

property tokenizer

Returns the tokenizer.

Returns:

the tokenizer

Return type:

Tokenizer

weka.filters.main(args=None)

Runs a filter from the command-line. Calls JVM start/stop automatically. Use -h to see all options.

Parameters:

args (list) – the command-line arguments to use, uses sys.argv if None

weka.filters.sys_main()

Runs the main function using the system cli arguments, and returns a system error code.

Returns:

0 for success, 1 for failure.

Return type:

int

weka.timeseries module

class weka.timeseries.ConfidenceIntervalForecaster(jobject)

Bases: JavaObject

Wrapper class for ConfidenceIntervalForecaster objects.

property calculate_conf_intervals_for_forecasts

Returns the number of steps for which confidence intervals will be computed.

Returns:

the steps

Return type:

int

property confidence_level

Returns the confidence level in use for computing confidence intervals.

Returns:

the level

Return type:

float

property is_producing_confidence_intervals

Returns true if this forecaster is computing confidence limits for some or all of its future forecasts (i.e. getCalculateConfIntervalsForForecasts() > 0).

Returns:

true if confidence intervals are produced

Return type:

bool

class weka.timeseries.CustomPeriodicTest(jobject=None, test=None)

Bases: JavaObject

Class that evaluates a supplied date against user-specified date constant fields. Fields that can be tested against include year, month, week of year, week of month, day of year, day of month, day of week, hour of day, minute of hour and second. Wildcard “*” matches any value for a particular field. Each CustomPeriodicTest is made up of one or two test parts. If the first test part’s operator is “=”, then no second part is necessary. Otherwise the first test part may use > or >= operators and the second test part < or <= operators. Taken together, the two parts define an interval. An optional label may be associated with the interval.
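
The one-part and two-part logic described above can be sketched for a single date field. This is a conceptual illustration only: it does not parse the actual test string syntax, and field_in_interval with its (operator, bound) tuples is a hypothetical representation.

```python
import operator

_OPS = {
    "=": operator.eq,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def field_in_interval(value, lower, upper=None):
    """Check one field value against a one- or two-part test.

    lower and upper are (operator, bound) pairs; the bound "*" matches
    any value. If the first part uses "=", the second part is ignored.
    """
    def holds(part):
        op, bound = part
        return bound == "*" or _OPS[op](value, bound)

    if lower[0] == "=":
        return holds(lower)
    return holds(lower) and (upper is None or holds(upper))


# Month of year within [6, 9), i.e. June to August.
in_summer = field_in_interval(7, (">=", 6), ("<", 9))
```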

evaluate(date)

Evaluate the supplied date with respect to this custom periodic test interval.

Parameters:

date (Date) – the date to test

Returns:

true if the date lies within the interval.

Return type:

bool

property label

Returns the label.

Returns:

the label

Return type:

str

lower_test()

Returns the lower bound test.

Returns:

the test

Return type:

TestPart

test(test)

Sets the test as string.

Parameters:

test (str) – the test to use

upper_test()

Returns the upper bound test.

Returns:

the test

Return type:

TestPart

class weka.timeseries.ErrorModule(jobject)

Bases: TSEvalModule

Wrapper for ErrorModule objects.

counts_for_targets()

Returns the number of predicted, actual pairs for each target. Only entries that are non-missing for both actual and predicted contribute to the overall count.

Returns:

the number of predicted, actual pairs for each target.

Return type:

ndarray
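
The counting rule can be illustrated for a single target with plain lists. In this sketch, missing values are assumed to be encoded as None or NaN, and count_valid_pairs is a hypothetical helper that is not part of the wrapper.

```python
import math


def count_valid_pairs(actual, predicted):
    """Count the positions where both actual and predicted are present."""
    def missing(value):
        return value is None or (isinstance(value, float) and math.isnan(value))

    return sum(
        1
        for a, p in zip(actual, predicted)
        if not missing(a) and not missing(p)
    )


demo_count = count_valid_pairs(
    [1.0, float("nan"), 3.0, 4.0],
    [1.1, 2.0, None, 3.9],
)
```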

errors_for_target(target)

Returns the list of the errors for the supplied target.

Parameters:

target (str) – the target to get the errors for

Returns:

the errors

Return type:

list

predictions_for_all_targets()

Returns the list of predictions for all targets.

Returns:

list of list of NumericPrediction

Return type:

list

predictions_for_target(target)

Returns the list of predictions for the target.

Parameters:

target (str) – the target to get the predictions for

Returns:

list of NumericPrediction

Return type:

list

class weka.timeseries.IncrementallyPrimeable(jobject)

Bases: JavaObject

Wrapper class for IncrementallyPrimeable objects.

prime_forecaster_incremental(inst)

Primes the forecaster using the provided data.

Parameters:

inst (Instance) – the instance to prime with

class weka.timeseries.OverlayForecaster(jobject)

Bases: JavaObject

Wrapper class for OverlayForecaster objects.

forecast_with_overlays(steps, overlays)

Produce a forecast for the target field(s). Assumes that the model has been built and/or primed so that a forecast can be generated. Also assumes that the forecaster has been told which attributes are to be considered “overlay” attributes in the data. Overlay data is data that the forecaster will be provided with when making a forecast into the future - i.e. it will be given the values of these attributes for future instances. The overlay data provided to this method should have the same structure as the original data used to train the forecaster - i.e. all original fields should be present, including the targets and time stamp field (if supplied). The values of targets will of course be missing (‘?’) since we want to forecast those. The time stamp values (if a time stamp is in use) may be provided, in which case the forecaster will use the time stamp values in the overlay instances. If the time stamp values are missing, then date arithmetic (for date time stamps) will be used to advance the time value beyond the last seen training value; similarly, for artificial time stamps or non-date time stamps, the computed time delta will be used to increment beyond the last seen training value.

The number of instances in the overlay data should typically match the number of steps that have been requested for forecasting. If these differ, then overlay.numInstances() will be the number of steps forecasted.

Parameters:
  • steps (int) – number of forecasted values to produce for each target. E.g. a value of 5 would produce a prediction for t+1, t+2, …, t+5.

  • overlays (Instances) – the overlay data to use

Returns:

a List of Lists (one for each step) of forecasted values for each target (NumericPrediction objects)

Return type:

list
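
Since the result is a list of lists, a small helper can turn it into a table of plain numbers. The sketch below assumes that each inner element exposes the predicted property documented for Prediction objects; forecast_table is a hypothetical name, and the SimpleNamespace objects only stand in for real NumericPrediction wrappers.

```python
from types import SimpleNamespace


def forecast_table(forecast):
    """Return one row per step, with one predicted value per target."""
    return [[pred.predicted for pred in step] for step in forecast]


# Stand-in forecast (hypothetical): 2 steps, 2 targets each.
demo_forecast = [
    [SimpleNamespace(predicted=1.0), SimpleNamespace(predicted=10.0)],
    [SimpleNamespace(predicted=2.0), SimpleNamespace(predicted=20.0)],
]
demo_table = forecast_table(demo_forecast)
```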

property is_using_overlay_data

Returns true if overlay data has been used to train this forecaster, and thus is expected to be supplied for future time steps when making a forecast.

Returns:

true if overlay data is expected

Return type:

bool

property overlay_fields

Returns the overlay fields as string.

Returns:

the overlay fields

Return type:

str

class weka.timeseries.Periodicity(jobject=None, periodicity=None)

Bases: Enum

Defines periodicity.

class weka.timeseries.PeriodicityHandler(jobject)

Bases: JavaObject

Helper class to manage time stamp manipulation with respect to various periodicities. Has a routine to remap the time stamp, which is useful for date time stamps. Dates are manipulated internally as the number of milliseconds elapsed since the epoch, so any global trend modelling in regression functions results in enormous coefficients for this variable; remapping to a more reasonable scale prevents this. It also makes it easier to handle the case where there are time periods that shouldn’t be considered as a time unit increment, e.g. weekends and public holidays for financial trading data. These “holes” in the data can be accommodated by accumulating a negative offset for the remapped date when a particular date/time occurs in a user-specified “skip” list.
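
The remapping idea can be sketched with the standard library alone. The example assumes daily periodicity, a sorted list of dates and a skip rule that drops weekends; remap_dates is a hypothetical function and does not reflect the actual implementation.

```python
from datetime import date, timedelta


def remap_dates(dates, skip=lambda d: d.weekday() >= 5):
    """Map sorted dates to small unit time steps.

    Days for which skip() returns True do not advance the counter, so
    a Friday and the following Monday end up one step apart.
    """
    if not dates:
        return {}
    remapped = {}
    step = 0
    current = dates[0]
    for d in dates:
        while current < d:
            current += timedelta(days=1)
            if not skip(current):
                step += 1
        remapped[d] = step
    return remapped


# Friday, Monday, Tuesday: the weekend in between is not counted.
demo_steps = remap_dates([date(2024, 1, 5), date(2024, 1, 8), date(2024, 1, 9)])
```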

property delta_time

Returns the delta time.

Returns:

the delta time

Return type:

float

class weka.timeseries.TSEvalModule(jobject)

Bases: JavaObject

Wrapper for TSEvalModule objects.

calculate_measure()

Calculate the measure that this module represents.

Returns:

the value of the measure for this module for each of the target(s).

Return type:

ndarray

property definition

Returns the definition.

property description

Returns the description.

property eval_name

Returns the name.

evaluate_for_instance(pred, inst)

Evaluate the given forecast(s) with respect to the given test instance. Targets with missing values are ignored.

Parameters:
  • pred – the forecast(s) to evaluate

  • inst (Instance) – the test instance

classmethod module(name)

Returns the module with the specified name.

Parameters:

name (str) – the name of the module to return

Returns:

the TSEvalModule object

Return type:

TSEvalModule

classmethod module_list()

Returns list of available modules.

Returns:

the list of modules (TSEvalModule objects)

Return type:

list

reset()

Resets the module.

property summary

Returns the summary.

property target_fields

Returns the list of target fields.

Returns:

the list of target fields

Return type:

list

class weka.timeseries.TSEvaluation(train, test_split_size=0.3, test=None)

Bases: JavaObject

Evaluation class for timeseries forecasters.

evaluate(forecaster, build_model=True)

Evaluates the forecaster.

Parameters:
  • forecaster (TSForecaster) – the forecaster to evaluate

  • build_model (bool) – whether to build the model as well

classmethod evaluate_forecaster(forecaster, args)

Evaluates the forecaster with the given options.

Parameters:
  • forecaster (TSForecaster) – the forecaster instance to use

  • args (list) – the command-line arguments to use

property evaluate_on_test_data

Returns whether to evaluate on the test data.

Returns:

whether to evaluate

Return type:

bool

property evaluate_on_training_data

Returns whether to evaluate on the training data.

Returns:

whether to evaluate

Return type:

bool

property evaluation_modules

Returns the list of evaluation modules in use.

Returns:

list of TSEvalModule objects

Return type:

list

property forecast_future

Returns whether we should generate a future forecast beyond the end of the training and/or test data.

Returns:

whether to forecast beyond the end of the data

Return type:

bool

property horizon

Returns the number of steps to predict into the future.

Returns:

the number of steps

Return type:

int

predictions_for_test_data(step_number)

Predictions for all targets for the specified step number on the test data.

Parameters:

step_number (int) – number of the step into the future to return predictions for

predictions_for_training_data(step_number)

Predictions for all targets for the specified step number on the training data.

Parameters:

step_number (int) – number of the step into the future to return predictions for

property prime_for_test_data_with_test_data

Returns whether evaluation for test data should begin by priming with the first x test data instances and then forecasting from step x + 1. This is the only option if there is no training data and a model has been deserialized from disk. If we have training data, and it occurs immediately before the test data in time, then we can prime with the last x instances from the training data.

Returns:

whether to prime

Return type:

bool
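
The two priming options can be pictured with plain lists of rows. This is a conceptual sketch, not the evaluation code itself: priming_rows is a hypothetical helper, x stands for the priming window size, and the training data is assumed to end immediately before the test data.

```python
def priming_rows(train_rows, test_rows, x, prime_with_test_data):
    """Return (rows used for priming, rows left to forecast).

    prime_with_test_data=True:  prime with the first x test rows and
                                forecast from test row x + 1 onwards.
    prime_with_test_data=False: prime with the last x training rows and
                                forecast the complete test data.
    """
    if x <= 0:
        raise ValueError("the priming window must be positive")
    if prime_with_test_data:
        return test_rows[:x], test_rows[x:]
    return train_rows[-x:], test_rows


demo_train_rows = [1, 2, 3, 4, 5]
demo_test_rows = [6, 7, 8, 9]
primed_from_test = priming_rows(demo_train_rows, demo_test_rows, 2, True)
primed_from_train = priming_rows(demo_train_rows, demo_test_rows, 2, False)
```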

property prime_window_size

Returns the size of the priming window, i.e. the number of historical instances to present before making a forecast.

Returns:

the size

Return type:

int

print_future_forecast_on_test_data(forecaster)

Print the forecasted values (for all targets) beyond the end of the test data.

Parameters:

forecaster (TSForecaster) – the forecaster to use

Returns:

the forecasted values

Return type:

str

print_future_forecast_on_training_data(forecaster)

Print the forecasted values (for all targets) beyond the end of the training data.

Parameters:

forecaster (TSForecaster) – the forecaster to use

Returns:

the forecasted values

Return type:

str

print_predictions_for_test_data(title, target_name, step_ahead, instance_number_offset=0)

Print the predictions for a given target at a given step-ahead level on the test data.

Parameters:
  • title (str) – the title for the output

  • target_name (str) – the name of the target to print predictions for

  • step_ahead (int) – the step-ahead level - e.g. 3 would print the 3-step-ahead predictions

  • instance_number_offset (int) – the offset from the start of the test data from which to print actual and predicted values

Returns:

the predicted/actual values

Return type:

str

print_predictions_for_training_data(title, target_name, step_ahead, instance_number_offset=0)

Print the predictions for a given target at a given step-ahead level on the training data.

Parameters:
  • title (str) – the title for the output

  • target_name (str) – the name of the target to print predictions for

  • step_ahead (int) – the step-ahead level - e.g. 3 would print the 3-step-ahead predictions

  • instance_number_offset (int) – the offset from the start of the training data from which to print actual and predicted values

Returns:

the predicted/actual values

Return type:

str

property rebuild_model_after_each_test_forecast_step

Returns whether the forecasting model should be rebuilt after each forecasting step on the test data using both the training data and test data up to the current instance.

Returns:

whether to rebuild

Return type:

bool

summary()

Generates a summary.

Returns:

the summary

Return type:

str

property test_data

Returns the test data.

Returns:

the test data, None if none available

Return type:

Instances

property training_data

Returns the training data.

Returns:

the training data, None if none available

Return type:

Instances

class weka.timeseries.TSForecaster(classname='weka.classifiers.timeseries.WekaForecaster', jobject=None, options=None)

Bases: OptionHandler

Wrapper class for timeseries forecasters.

property algorithm_name

Returns the name of the algorithm.

Returns:

the name

Return type:

str

property base_model_has_serializer

Check whether the base learner requires special serialization.

Returns:

True if the base learner requires special serialization, False otherwise

Return type:

bool

build_forecaster(data)

Builds the forecaster using the provided data.

Parameters:

data (Instances) – the data to train with

clear_previous_state()

Reset model state.

property fields_to_forecast

Returns the fields to forecast.

Returns:

the fields

Return type:

str

forecast(steps)

Produce a forecast for the target field(s). Assumes that the model has been built and/or primed so that a forecast can be generated.

Parameters:

steps (int) – number of forecasted values to produce for each target. E.g. a value of 5 would produce a prediction for t+1, t+2, …, t+5.

Returns:

a List of Lists (one for each step) of forecasted values for each target (NumericPrediction objects)

Return type:

list

property header

Returns the header of the training data.

Returns:

the structure of the training data, None if not available

Return type:

Instances

load_base_model(fname)

Loads the base model from the given filename.

Parameters:

fname (str) – the file to load the base model from

load_serialized_state(fname)

Loads the serialized state from the given filename.

Parameters:

fname (str) – the file to deserialize the state from

property previous_state

Returns the previous state.

Returns:

the state as a list of JPype objects

Return type:

list

prime_forecaster(data)

Primes the forecaster using the provided data.

Parameters:

data (Instances) – the data to prime with

reset()

Resets the algorithm.

run_forecaster(forecaster, options)

Runs the supplied forecaster with the given command-line options.

save_base_model(fname)

Saves the base model under the given filename.

Parameters:

fname (str) – the file to save the base model under

serialize_state(fname)

Serializes the state under the given filename.

Parameters:

fname (str) – the file to serialize the state under

property uses_state

Check whether the base learner requires operations regarding state.

Returns:

True if base learner uses state-based predictions, false otherwise

Return type:

bool

class weka.timeseries.TSLagMaker(jobject=None, options=None)

Bases: Filter

A class for creating lagged versions of target variable(s) for use in time series forecasting. Uses the TimeseriesTranslate filter. Has options for creating averages of consecutive lagged variables (which can be useful for long lagged variables). Some polynomials of time are also created (if there is a time stamp), such as time^2 and time^3. Also creates cross products between time and the lagged and averaged lagged variables. If there is no date time stamp in the data then the user has the option of having an artificial time stamp created. Time stamps, real or otherwise, are used for modeling trends rather than using a differencing-based approach.

Also has routines for dealing with a date timestamp - i.e. it can detect a monthly time period (because months are different lengths) and maps date time stamps to equally spaced time intervals. For example, in general, a date time stamp is remapped by subtracting the first observed value and adding this value divided by the constant delta (difference between consecutive steps) to the result. In the case of a detected monthly time period, the remapping involves subtracting the base year and then adding to this the number of the month within the current year plus twelve times the number of intervening years since the base year.
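The monthly remapping can be illustrated with a small, self-contained sketch. It shows the arithmetic only and is not taken from Weka's source; the function name remap_monthly and the zero-based month numbering are assumptions.

```python
from datetime import date


def remap_monthly(stamp, base_year):
    """Map a date to an equally spaced monthly index (hypothetical helper).

    Assumption: months are numbered 0..11 within a year, so January of
    the base year maps to 0.
    """
    years_since_base = stamp.year - base_year
    month_in_year = stamp.month - 1  # date.month is 1..12
    return 12 * years_since_base + month_in_year


# Consecutive months differ by exactly one unit, whatever their length.
print(remap_monthly(date(2020, 1, 31), 2020))  # 0
print(remap_monthly(date(2020, 2, 29), 2020))  # 1
print(remap_monthly(date(2021, 1, 1), 2020))   # 12
```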

Also has routines for adding new attributes derived from a date time stamp to the data - e.g. AM indicator, day of the week, month, quarter etc. In the case where there is no real date time stamp, the user may specify a nominal periodic variable (if one exists in the data). For example, month might be coded as a nominal value. In this case it can be specified as the primary periodic variable. The point is, that in all these cases (nominal periodic and date-derived periodics), we are able to determine what the value of these variables will be in future instances (as computed from the last known historic instance).

property add_am_indicator

Returns whether to add an AM indicator.

Returns:

true if to add

Return type:

bool

add_custom_periodic(periodic)

Adds the custom periodic.

Parameters:

periodic (str) – the periodic to add

property add_day_of_month

Returns whether to add day of month attribute.

Returns:

true if to add

Return type:

bool

property add_day_of_week

Returns whether to add day of week attribute.

Returns:

true if to add

Return type:

bool

property add_month_of_year

Returns whether to add month of year attribute.

Returns:

true if to add

Return type:

bool

property add_num_days_in_month

Returns whether to add # of days in month attribute.

Returns:

true if to add

Return type:

bool

property add_quarter_of_year

Returns whether to add quarter of year attribute.

Returns:

true if to add

Return type:

bool

property add_weekend_indicator

Returns whether to add a weekend indicator.

Returns:

true if to add

Return type:

bool

property adjust_for_trends

Returns true if we are adjusting for trends via a real or artificial time stamp.

Returns:

true if to adjust

Return type:

bool

property adjust_for_variance

Returns true if we are adjusting for variance by taking the log of the target(s).

Returns:

true if to adjust

Return type:

bool
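As a rough illustration of adjusting for variance, the sketch below log-transforms a target series and maps values back afterwards. The helper names are hypothetical, and the use of the natural logarithm is an assumption; the text above only states that the log of the target(s) is taken.

```python
import math


def to_log_scale(values):
    """Log-transform strictly positive target values (natural log assumed)."""
    return [math.log(v) for v in values]


def from_log_scale(values):
    """Invert the transform, e.g. to report forecasts on the original scale."""
    return [math.exp(v) for v in values]


targets = [112.0, 118.0, 132.0]
restored = from_log_scale(to_log_scale(targets))
print(all(math.isclose(a, b) for a, b in zip(targets, restored)))
```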

property artificial_time_start_value

Returns the current value of the artificial time stamp. After training, after priming, and prior to forecasting, this will be equal to the number of training instances seen.

Returns:

the start

Return type:

float

property average_consecutive_long_lags

Returns true if consecutive long lagged variables are to be averaged.

Returns:

true if to average

Return type:

bool

property average_lags_after

Returns the point after which long lagged variables will be averaged.

Returns:

the lag

Return type:

int
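One way to picture the averaging of long lags is the sketch below. It is an illustration under stated assumptions, not Weka's implementation: the function name average_long_lags is hypothetical, lags are passed as a plain dict, and lags strictly greater than the threshold are treated as "long".

```python
def average_long_lags(lagged, average_after, num_to_average):
    """Average groups of consecutive long lags (hypothetical helper).

    lagged: dict mapping lag number -> lagged value for one instance.
    Returns (short_lags, averaged), where 'averaged' maps
    (first_lag, last_lag) of each group to the group's mean.
    """
    short = {lag: v for lag, v in lagged.items() if lag <= average_after}
    long_lags = sorted(lag for lag in lagged if lag > average_after)
    averaged = {}
    for i in range(0, len(long_lags), num_to_average):
        group = long_lags[i:i + num_to_average]
        averaged[(group[0], group[-1])] = sum(lagged[g] for g in group) / len(group)
    return short, averaged


lagged = {1: 10.0, 2: 20.0, 3: 30.0, 4: 40.0, 5: 50.0, 6: 60.0}
short, averaged = average_long_lags(lagged, average_after=2, num_to_average=2)
print(short)
print(averaged)
```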

clear_custom_periodics()

Clears the custom periodics.

clear_lag_histories()

Clears any history accumulated in the lag creating filters.

create_time_lag_cross_products(data)

Creates the cross-products.

Parameters:

data (Instances) – the data to create the cross-products for

Returns:

the cross-products

Return type:

Instances

property current_timestamp_value

Returns the current (i.e. most recent) time stamp value. Unlike an artificial time stamp, the value after training, after priming and before forecasting, will be equal to the time stamp of the most recent priming instance.

Returns:

the timestamp value

Return type:

float

property delta_time

Returns the difference between time values. This may be only approximate for periods based on dates. It is best to use date-based arithmetic in this case for incrementing/decrementing time stamps.

Returns:

the delta

Return type:

float

property fields_to_lag

Returns the fields to lag as list.

Returns:

the fields to lag

Return type:

list

property fields_to_lag_as_string

Returns the fields to lag as string.

Returns:

the fields to lag

Return type:

str

property include_powers_of_time

Returns whether to include powers of time in the transformed data.

Returns:

true if to include

Return type:

bool

property include_timelag_products

Returns whether to include products between time and the lagged variables.

Returns:

true if to include

Return type:

bool

increment_artificial_time_value(increment)

Increment the artificial time value with the supplied increment value.

Parameters:

increment (int) – the increment

property is_using_artificial_time_index

Returns whether an artificial time index is used.

Returns:

true if an artificial time index is used

Return type:

bool

property lag_range

Returns the lag range to create.

Returns:

the lag range

Return type:

str

property max_lag

Returns the maximum lag to create.

Returns:

the lag

Return type:

int

property min_lag

Returns the minimum lag to create.

Returns:

the lag

Return type:

int
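To make the effect of the minimum and maximum lag concrete, the following sketch builds lagged columns and powers of an artificial time index for a short series. It is a simplified illustration and not the filter's actual output format; the function name, the column names and the zero-based time index are assumptions.

```python
def make_lagged_rows(series, min_lag, max_lag):
    """Build one row per time step with lagged values (hypothetical helper).

    Lags that reach before the start of the series are reported as None,
    which corresponds to the 'unknown lag values' of leading instances.
    """
    rows = []
    for t, value in enumerate(series):
        row = {"target": value, "time": t, "time^2": t ** 2, "time^3": t ** 3}
        for lag in range(min_lag, max_lag + 1):
            row["lag-%d" % lag] = series[t - lag] if t - lag >= 0 else None
        rows.append(row)
    return rows


rows = make_lagged_rows([10, 20, 30], min_lag=1, max_lag=2)
print(rows[0]["lag-1"])  # None
print(rows[2]["lag-1"])  # 20
print(rows[2]["lag-2"])  # 10
```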

property num_consecutive_long_lags_to_average

Returns the number of consecutive long lagged variables to average.

Returns:

the number of lags

Return type:

int

property overlay_fields

Returns the overlay fields as list.

Returns:

the overlay fields

Return type:

list

property periodicity

Returns the Periodicity representing the time stamp in use for this lag maker. If the lag maker is not adjusting for trends, or an artificial time stamp is being used, then None is returned.

Returns:

the periodicity

Return type:

Periodicity

property primary_periodic_field_name

Returns the name of the primary periodic attribute or None if one hasn’t been specified.

Returns:

the name

Return type:

str

property remove_leading_instances_with_unknown_lag_values

Returns whether to remove instances with unknown lag values.

Returns:

true if to remove

Return type:

bool

property skip_entries

Returns a list of time units to be ‘skipped’, i.e. not considered as an increment. E.g. financial markets don’t trade on the weekend, so the difference between Friday closing and the following Monday closing is one time unit (and not three). Can accept strings such as “sat”, “sunday”, “jan”, “august”, or explicit dates (with optional formatting string) such as “2011-07-04@yyyy-MM-dd”, or integers. Integers are interpreted with respect to the periodicity, e.g. for daily data they are interpreted as day of the year; for hourly data, hour of the day; for weekly data, week of the year.

Returns:

the skip entries

Return type:

str
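The effect of skipped time units on daily data can be sketched as follows. This is an illustration of the counting rule only; the function name time_units_between is hypothetical, and weekdays are identified by Python's date.weekday() numbering (Monday = 0, Sunday = 6) rather than by the string syntax accepted by the lag maker.

```python
from datetime import date, timedelta


def time_units_between(start, end, skip_weekdays=frozenset({5, 6})):
    """Count daily steps from start to end, ignoring skipped weekdays."""
    units = 0
    current = start
    while current < end:
        current += timedelta(days=1)
        if current.weekday() not in skip_weekdays:
            units += 1
    return units


friday = date(2011, 7, 1)
monday = date(2011, 7, 4)
print(time_units_between(friday, monday))               # 1
print(time_units_between(friday, monday, frozenset()))  # 3
```

With Saturday and Sunday skipped, Friday to Monday counts as a single time unit; without any skip entries it counts as three.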

property timestamp_field

Returns the name of the timestamp field.

Returns:

the timestamp field name

Return type:

str

transformed_data(data)

Returns the transformed data.

Parameters:

data (Instances) – the data to transform

Returns:

the transformed data

Return type:

Instances

class weka.timeseries.TSLagUser(jobject)

Bases: JavaObject

Wrapper class for TSLagUser objects.

property tslag_maker

Returns the lag maker.

Returns:

the lag maker

Return type:

TSLagMaker

class weka.timeseries.TestPart(jobject)

Bases: JavaObject

Inner class defining one boundary of an interval.

day()

Returns the day string.

Returns:

the day string

Return type:

str

day_of_month(s)

Sets the day of the month.

Parameters:

s (str) – the dom to use

day_of_week(s)

Sets the day of the week.

Parameters:

s (str) – the dow to use

day_of_year(s)

Sets the day of year.

Parameters:

s (str) – the doy to use

eval(date, other)

Evaluate the supplied date against this bound. Handles date fields that are cyclic (such as month, day of week etc.) so that intervals such as oct < date < mar evaluate correctly.

Parameters:
  • date (Date) – the date to test

  • other (TestPart) – the other bound

Returns:

true if the supplied date is within this bound

Return type:

bool
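The handling of cyclic date fields can be illustrated with a small sketch. It is not the TestPart implementation; the function name in_cyclic_interval is hypothetical, strict comparisons are assumed on both bounds, and months are given as numbers 1..12.

```python
def in_cyclic_interval(value, lower, upper):
    """Test lower < value < upper on a cyclic field such as month (1..12)."""
    if lower < upper:
        # Ordinary interval, e.g. mar < month < oct.
        return lower < value < upper
    # Interval wraps around the end of the cycle, e.g. oct < month < mar.
    return value > lower or value < upper


OCT, MAR = 10, 3
print(in_cyclic_interval(12, OCT, MAR))  # True
print(in_cyclic_interval(1, OCT, MAR))   # True
print(in_cyclic_interval(6, OCT, MAR))   # False
```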

hour_of_day(s)

Sets the hour of the day.

Parameters:

s (str) – the hod to use

property is_upper

Returns true if this is the upper bound.

Returns:

true if upper bound

Return type:

bool

minute_of_hour(s)

Sets the minute of the hour.

Parameters:

s (str) – the moh to use

property month

Returns the month string.

Returns:

the month string

Return type:

str

operator(s)

Sets the operator.

Parameters:

s (str) – the operator to use

second(s)

Sets the second.

Parameters:

s (str) – the second to use

week_of_month(s)

Sets the week of the month.

Parameters:

s (str) – the wom to use

week_of_year(s)

Sets the week of the year.

Parameters:

s (str) – the woy to use

year(s)

Sets the year.

Parameters:

s (str) – the year to use

class weka.timeseries.WekaForecaster(jobject=None, options=None)

Bases: TSForecaster, TSLagUser, ConfidenceIntervalForecaster, OverlayForecaster, IncrementallyPrimeable

Wrapper class for Weka timeseries forecasters.

add_custom_periodic(periodic)

Adds the custom periodic.

Parameters:

periodic (str) – the periodic to add

property base_forecaster

Returns the base forecaster.

Returns:

the base forecaster

Return type:

Classifier

clear_custom_periodics()

Clears the custom periodics.