weka package

Subpackages

weka.associations module

class weka.associations.AssociationRule(jobject)

Bases: weka.core.classes.JavaObject

Wrapper for weka.associations.AssociationRule class.

property consequence

Get the consequence.

Returns

the consequence, list of Item objects

Return type

list

property consequence_support

Get the support for the consequence.

Returns

the support

Return type

int

property metric_names

Returns the metric names for the rule.

Returns

the metric names

Return type

list

metric_value(name)

Returns the named metric value for the rule.

Parameters

name (str) – the name of the metric

Returns

the metric value

Return type

float

property metric_values

Returns the metric values for the rule.

Returns

the metric values

Return type

ndarray

property premise

Get the premise.

Returns

the premise, list of Item objects

Return type

list

property premise_support

Get the support for the premise.

Returns

the support

Return type

int

property primary_metric_name

Returns the primary metric name for the rule.

Returns

the metric name

Return type

str

property primary_metric_value

Returns the primary metric value for the rule.

Returns

the metric value

Return type

float

to_dict()

Builds a dictionary with the properties of the AssociationRule object.

Returns

the AssociationRule dictionary

Return type

dict

property total_support

Get the total support.

Returns

the support

Return type

int

property total_transactions

Get the total transactions.

Returns

the transactions

Return type

int
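The support counts above are enough to derive the usual rule metrics. The following is a minimal plain-Python sketch using textbook definitions of confidence, lift and leverage. `rule_metrics` is a hypothetical helper, not part of the wrapper, and the values Weka itself reports via metric_values may differ in details. In practice the four arguments would come from premise_support, consequence_support, total_support and total_transactions.

```python
def rule_metrics(premise_support, consequence_support, total_support, total_transactions):
    """Textbook metrics for a rule X => Y (hypothetical helper, not the wrapper's API).

    premise_support:     number of transactions containing X
    consequence_support: number of transactions containing Y
    total_support:       number of transactions containing both X and Y
    total_transactions:  number of transactions overall
    """
    n = float(total_transactions)
    # confidence: fraction of transactions with X that also contain Y
    confidence = total_support / premise_support
    # lift: confidence relative to the base rate of Y (1.0 means independence)
    lift = confidence / (consequence_support / n)
    # leverage: observed joint frequency minus the frequency expected under independence
    leverage = total_support / n - (premise_support / n) * (consequence_support / n)
    return {"confidence": confidence, "lift": lift, "leverage": leverage}


# 10 transactions: X occurs in 4, Y in 5, both together in 3
print(rule_metrics(4, 5, 3, 10))
```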

class weka.associations.AssociationRules(jobject)

Bases: weka.core.classes.JavaObject

Wrapper for weka.associations.AssociationRules class.

property producer

Returns a string describing the producer that generated these rules.

Returns

the producer

Return type

str

to_dict()

Returns the association rules as a list of dictionaries.

Returns

the association rules

Return type

list

class weka.associations.AssociationRulesIterator(rules)

Bases: object

Iterator for weka.associations.AssociationRules class.

class weka.associations.Associator(classname=None, jobject=None, options=None)

Bases: weka.core.classes.OptionHandler

Wrapper class for associators.

association_rules()

Returns the association rules that were generated. Only available if the associator implements AssociationRulesProducer.

Returns

the association rules that were generated

Return type

AssociationRules

build_associations(data)

Builds the associator with the data.

Parameters

data (Instances) – the data to train the associator with

can_produce_rules()

Checks whether association rules can be generated.

Returns

whether the scheme implements the AssociationRulesProducer interface and association rules can be generated

Return type

bool

property capabilities

Returns the capabilities of the associator.

Returns

the capabilities

Return type

Capabilities

property header

Returns the header of the training data.

Returns

the structure of the training data, None if not available

Return type

Instances

classmethod make_copy(associator)

Creates a copy of the associator.

Parameters

associator (Associator) – the associator to copy

Returns

the copy of the associator

Return type

Associator

property rule_metric_names

Returns the rule metric names of the association rules. Only available if the associator implements AssociationRulesProducer.

Returns

the metric names

Return type

list

class weka.associations.Item(jobject)

Bases: weka.core.classes.JavaObject

Wrapper for weka.associations.Item class.

property attribute

Returns the attribute.

Returns

the attribute

Return type

Attribute

property comparison

Returns the comparison operator as string.

Returns

the comparison operator

Return type

str

decrease_frequency(frequency=None)

Decreases the frequency.

Parameters

frequency (int) – the frequency to decrease by, 1 if None

property frequency

Returns the frequency.

Returns

the frequency

Return type

int

increase_frequency(frequency=None)

Increases the frequency.

Parameters

frequency (int) – the frequency to increase by, 1 if None

property item_value

Returns the item value as string.

Returns

the item value

Return type

str

weka.associations.main(args=None)

Runs an associator from the command-line. Calls JVM start/stop automatically. Use -h to see all options.

Parameters

args (list) – the command-line arguments to use, uses sys.argv if None

weka.associations.sys_main()

Runs the main function using the system cli arguments, and returns a system error code.

Returns

0 for success, 1 for failure.

Return type

int

weka.attribute_selection module

class weka.attribute_selection.ASEvaluation(classname='weka.attributeSelection.CfsSubsetEval', jobject=None, options=None)

Bases: weka.core.classes.OptionHandler

Wrapper class for attribute selection evaluation algorithm.

build_evaluator(data)

Builds the evaluator with the data.

Parameters

data (Instances) – the data to use

property capabilities

Returns the capabilities of the evaluator.

Returns

the capabilities

Return type

Capabilities

convert_instance(inst)

Transforms an instance in the format of the original data to the transformed space.

Parameters

inst (Instance) – the Instance to transform

Returns

the transformed instance

Return type

Instance

property header

Returns the header of the training data.

Returns

the structure of the training data, None if not available

Return type

Instances

post_process(indices)

Post-processes the evaluator with the selected attribute indices.

Parameters

indices (ndarray) – the attribute indices list to use

Returns

the processed indices

Return type

ndarray

transformed_data(data)

Transform the supplied data set (assumed to be the same format as the training data).

Parameters

data (Instances) – the data to transform

Returns

the transformed data

Return type

Instances

transformed_header()

Returns just the header for the transformed data (i.e., an empty set of instances). This lets AttributeSelection determine the structure of the transformed data without having to push all the data through transformed_data(). Returns None if the evaluator is not a weka.attributeSelection.AttributeTransformer.

Returns

the header

Return type

Instances

class weka.attribute_selection.ASSearch(classname='weka.attributeSelection.BestFirst', jobject=None, options=None)

Bases: weka.core.classes.OptionHandler

Wrapper class for attribute selection search algorithm.

property header

Returns the header of the training data.

Returns

the structure of the training data, None if not available

Return type

Instances

search(evaluation, data)

Performs the search and returns the indices of the selected attributes.

Parameters
  • evaluation (ASEvaluation) – the evaluation algorithm to use

  • data (Instances) – the data to use

Returns

the selected attributes (0-based indices)

Return type

ndarray

class weka.attribute_selection.AttributeSelection

Bases: weka.core.classes.JavaObject

Performs attribute selection using search and evaluation algorithms.

classmethod attribute_selection(evaluator, args)

Performs attribute selection using the given attribute evaluator and options.

Parameters
  • evaluator (ASEvaluation) – the evaluator to use

  • args (list) – the command-line args for the attribute selection

Returns

the results string

Return type

str

crossvalidation(crossvalidation)

Sets whether to perform cross-validation.

Parameters

crossvalidation (bool) – whether to perform cross-validation

property cv_results

Generates a results string from the last cross-validation attribute selection.

Returns

the results string

Return type

str

evaluator(evaluator)

Sets the evaluator to use.

Parameters

evaluator (ASEvaluation) – the evaluator to use.

folds(folds)

Sets the number of folds to use for cross-validation.

Parameters

folds (int) – the number of folds

property number_attributes_selected

Returns the number of attributes that were selected.

Returns

the number of attributes

Return type

int

property rank_results

Returns the results from the cross-validation for rankers.

Unfortunately, the Weka API does not give direct access to underlying data structures, hence we have to parse the textual output.

Returns

the dictionary of results (mean and stdev for rank and merit)

Return type

dict

property ranked_attributes

Returns the matrix of ranked attributes from the last run.

Returns

the NumPy matrix

Return type

ndarray

ranking(ranking)

Sets whether to perform a ranking, if possible.

Parameters

ranking (bool) – whether to perform a ranking

reduce_dimensionality(data)

Reduces the dimensionality of the provided Instance or Instances object.

Parameters

data (Instances) – the data to process

Returns

the reduced dataset

Return type

Instances
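Conceptually, reducing the dimensionality keeps only the attribute columns listed by selected_attributes (0-based indices). A minimal sketch on plain Python lists, assuming one row per instance; `project_rows` and the index values are hypothetical, and the real method operates on Instances objects.

```python
def project_rows(rows, selected):
    """Keep only the columns whose 0-based indices are in `selected`, in that order."""
    return [[row[i] for i in selected] for row in rows]


rows = [
    [5.1, 3.5, 1.4, 0.2, "setosa"],
    [7.0, 3.2, 4.7, 1.4, "versicolor"],
]
# made-up indices standing in for a selection result, with the class column kept last
selected = [2, 3, 4]
print(project_rows(rows, selected))
```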

property results_string

Generates a results string from the last attribute selection.

Returns

the results string

Return type

str

search(search)

Sets the search algorithm to use.

Parameters

search (ASSearch) – the search algorithm

seed(seed)

Sets the seed for cross-validation.

Parameters

seed (int) – the seed value

select_attributes(instances)

Performs attribute selection on the given dataset.

Parameters

instances (Instances) – the data to process

select_attributes_cv_split(instances)

Performs attribute selection on the given cross-validation split.

Parameters

instances (Instances) – the data to process

property selected_attributes

Returns the selected attributes from the last run.

Returns

the NumPy array of 0-based indices

Return type

ndarray

property subset_results

Returns the results from the cross-validation subsets, i.e., how often an attribute was selected.

Unfortunately, the Weka API does not give direct access to underlying data structures, hence we have to parse the textual output.

Returns

the list of results (double)

Return type

list

weka.attribute_selection.main(args=None)

Runs attribute selection from the command-line. Calls JVM start/stop automatically. Use -h to see all options.

Parameters

args (list) – the command-line arguments to use, uses sys.argv if None

weka.attribute_selection.sys_main()

Runs the main function using the system cli arguments, and returns a system error code.

Returns

0 for success, 1 for failure.

Return type

int

weka.classifiers module

class weka.classifiers.AttributeSelectedClassifier(jobject=None, options=None)

Bases: weka.classifiers.SingleClassifierEnhancer

Wrapper class for the AttributeSelectedClassifier.

property evaluator

Returns the evaluator.

Returns

the evaluator in use

Return type

ASEvaluation

property search

Returns the search.

Returns

the search in use

Return type

ASSearch

class weka.classifiers.Classifier(classname='weka.classifiers.rules.ZeroR', jobject=None, options=None)

Bases: weka.core.classes.OptionHandler

Wrapper class for classifiers.

property batch_size

Returns the batch size, in case this classifier is a batch predictor.

Returns

the batch size, None if not a batch predictor

Return type

str

build_classifier(data)

Builds the classifier with the data.

Parameters

data (Instances) – the data to train the classifier with

property capabilities

Returns the capabilities of the classifier.

Returns

the capabilities

Return type

Capabilities

classify_instance(inst)

Performs a prediction.

Parameters

inst (Instance) – the Instance to get a prediction for

Returns

the classification (either regression value or 0-based label index)

Return type

float

classmethod deserialize(ser_file)

Deserializes a classifier from a file.

Parameters

ser_file (str) – the model file to deserialize

Returns

model and, if available, the dataset header

Return type

tuple

distribution_for_instance(inst)

Performs a prediction, returning the class distribution.

Parameters

inst (Instance) – the Instance to get the class distribution for

Returns

the class distribution array

Return type

ndarray
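For a nominal class, the 0-based label index returned by classify_instance usually corresponds to the most probable entry of this distribution, although individual classifiers may override that behavior. A minimal sketch of the argmax step, using a hypothetical helper `predicted_index` and made-up labels:

```python
def predicted_index(distribution):
    """Return the 0-based index of the most probable class label (first one on ties)."""
    best = 0
    for i, p in enumerate(distribution):
        if p > distribution[best]:
            best = i
    return best


labels = ["yes", "no", "maybe"]   # hypothetical class labels
dist = [0.2, 0.7, 0.1]            # e.g. a distribution for one instance
idx = predicted_index(dist)
print(float(idx), labels[idx])
```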

distributions_for_instances(data)

Performs predictions, returning the class distributions.

Parameters

data (Instances) – the Instances to get the class distributions for

Returns

the class distribution matrix, None if not a batch predictor

Return type

ndarray

property graph

Returns the graph if classifier implements weka.core.Drawable, otherwise None.

Returns

the generated graph string

Return type

str

property graph_type

Returns the graph type if classifier implements weka.core.Drawable, otherwise -1.

Returns

the type

Return type

int

has_efficient_batch_prediction()

Returns whether the classifier implements a more efficient batch prediction.

Returns

True if a more efficient batch prediction is implemented; always False if not a batch predictor

Return type

bool

property header

Returns the header of the training data.

Returns

the structure of the training data, None if not available

Return type

Instances

classmethod make_copy(classifier)

Creates a copy of the classifier.

Parameters

classifier (Classifier) – the classifier to copy

Returns

the copy of the classifier

Return type

Classifier

serialize(ser_file, header=None)

Serializes the classifier to the specified file.

Parameters
  • ser_file (str) – the file to save the model to

  • header (Instances) – the (optional) dataset header to store alongside; recommended

to_source(classname)

Returns the model as Java source code if the classifier implements weka.classifiers.Sourcable.

Parameters

classname (str) – the classname for the generated Java code

Returns

the model as source code string

Return type

str

update_classifier(inst)

Updates the classifier with the instance.

Parameters

inst (Instance) – the Instance to update the classifier with

class weka.classifiers.CostMatrix(matrx=None, num_classes=None)

Bases: weka.core.classes.JavaObject

Class for storing and manipulating a misclassification cost matrix. The element at position i,j in the matrix is the penalty for classifying an instance of class j as class i. Cost values can be fixed or computed on a per-instance basis (cost sensitive evaluation only) from the value of an attribute or an expression involving attribute(s).

apply_cost_matrix(data, rnd)

Applies the cost matrix to the data.

Parameters
  • data (Instances) – the data to apply to

  • rnd (Random) – the random number generator

expected_costs(class_probs, inst=None)

Calculates the expected misclassification cost for each possible class value, given class probability estimates.

Parameters
  • class_probs (ndarray) – the class probabilities

  • inst (Instance) – the Instance; only needed if the matrix contains per-instance costs

Returns

the calculated costs

Return type

ndarray

get_cell(row, col)

Returns the JB_Object at the specified location.

Parameters
  • row (int) – the 0-based index of the row

  • col (int) – the 0-based index of the column

Returns

the object in that cell

Return type

JB_Object

get_element(row, col, inst=None)

Returns the value at the specified location.

Parameters
  • row (int) – the 0-based index of the row

  • col (int) – the 0-based index of the column

  • inst (Instance) – the Instance

Returns

the value in that cell

Return type

float

get_max_cost(class_value, inst=None)

Gets the maximum cost for a particular class value.

Parameters
  • class_value (int) – the class value to get the maximum cost for

  • inst (Instance) – the Instance

Returns

the cost

Return type

float

initialize()

Initializes the matrix.

normalize()

Normalizes the matrix.

property num_columns

Returns the number of columns.

Returns

the number of columns

Return type

int

property num_rows

Returns the number of rows.

Returns

the number of rows

Return type

int

classmethod parse_matlab(matlab)

Parses the costmatrix definition in matlab format and returns a matrix.

Parameters

matlab (str) – the matlab matrix string, e.g., [1 2; 3 4]

Returns

the generated matrix

Return type

CostMatrix
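The Matlab notation accepted here separates values by whitespace and rows by semicolons. A minimal pure-Python sketch of parsing such a string into nested lists and serializing it back; `parse_matlab_matrix` and `to_matlab_matrix` are hypothetical helpers, not the wrapper's implementation, and the square-matrix check is this sketch's own assumption for cost matrices.

```python
def parse_matlab_matrix(text):
    """Parse a Matlab-style string such as "[1 2; 3 4]" into a list of float rows."""
    body = text.strip()
    if not (body.startswith("[") and body.endswith("]")):
        raise ValueError("expected a string enclosed in [ ]")
    rows = []
    for row in body[1:-1].split(";"):
        # commas are also accepted as value separators
        values = [float(v) for v in row.replace(",", " ").split()]
        if values:
            rows.append(values)
    size = len(rows)
    if size == 0 or any(len(r) != size for r in rows):
        raise ValueError("a cost matrix must be square and non-empty")
    return rows


def to_matlab_matrix(rows):
    """Format rows back into Matlab notation (repr keeps full float precision)."""
    return "[" + "; ".join(" ".join(repr(v) for v in r) for r in rows) + "]"


m = parse_matlab_matrix("[0 1; 10 0]")
print(m, to_matlab_matrix(m))
```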

set_cell(row, col, obj)

Sets the JB_Object at the specified location. Automatically unwraps JavaObject.

Parameters
  • row (int) – the 0-based index of the row

  • col (int) – the 0-based index of the column

  • obj (object) – the object for that cell

set_element(row, col, value)

Sets the float value at the specified location.

Parameters
  • row (int) – the 0-based index of the row

  • col (int) – the 0-based index of the column

  • value (float) – the float value for that cell

property size

Returns the number of rows/columns.

Returns

the number of rows/columns

Return type

int

to_matlab()

Returns the matrix in Matlab format.

Returns

the matrix as Matlab formatted string

Return type

str

class weka.classifiers.Evaluation(data, cost_matrix=None)

Bases: weka.core.classes.JavaObject

Evaluation class for classifiers.

area_under_prc(class_index)

Returns the area under precision recall curve.

Parameters

class_index (int) – the 0-based index of the class label

Returns

the area

Return type

float

area_under_roc(class_index)

Returns the area under the receiver operating characteristic curve.

Parameters

class_index (int) – the 0-based index of the class label

Returns

the area

Return type

float
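The area under the ROC curve for one class label can be read as the probability that a randomly chosen instance of that class receives a higher score than a randomly chosen instance of another class, with ties counting one half. Weka derives the value from a threshold curve; the pairwise formulation below should agree with the trapezoidal ROC area, but it is a plain-Python sketch with a hypothetical `auc` helper and made-up scores, not Weka's code.

```python
def auc(scores, is_positive):
    """AUC as P(score of a positive > score of a negative), ties counting 1/2."""
    pos = [s for s, y in zip(scores, is_positive) if y]
    neg = [s for s, y in zip(scores, is_positive) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative instance")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# predicted probability of the class of interest, and whether that class is the actual one
scores = [0.9, 0.8, 0.6, 0.55, 0.3]
labels = [True, False, True, False, False]
print(auc(scores, labels))
```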

property avg_cost

Returns the average cost.

Returns

the cost

Return type

float

class_details(title=None)

Generates the class details.

Parameters

title (str) – optional title

Returns

the details

Return type

str

property class_priors

Returns the class priors.

Returns

the priors

Return type

ndarray

property confusion_matrix

Returns the confusion matrix.

Returns

the matrix

Return type

ndarray

property correct

Returns the correct count (nominal classes).

Returns

the count

Return type

float

property correlation_coefficient

Returns the correlation coefficient (numeric classes).

Returns

the coefficient

Return type

float

property coverage_of_test_cases_by_predicted_regions

Returns the coverage of the test cases by the predicted regions at the confidence level specified when evaluation was performed.

Returns

the coverage

Return type

float

crossvalidate_model(classifier, data, num_folds, rnd, output=None)

Cross-validates the model using the specified data, number of folds and random number generator wrapper.

Parameters
  • classifier (Classifier) – the classifier to cross-validate

  • data (Instances) – the data to evaluate on

  • num_folds (int) – the number of folds

  • rnd (Random) – the random number generator to use

  • output (PredictionOutput) – the output generator to use
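Cross-validation partitions the data into num_folds disjoint test sets and trains on the remaining instances each time. A simplified sketch of forming such folds from shuffled indices; unlike Weka it does not stratify by class, Python's `random.Random` stands in for the `rnd` generator, and `kfold_indices` is a hypothetical helper.

```python
import random


def kfold_indices(n, num_folds, seed):
    """Shuffle 0..n-1 with a seeded RNG and split into num_folds (train, test) index pairs."""
    if not 2 <= num_folds <= n:
        raise ValueError("num_folds must be between 2 and n")
    order = list(range(n))
    random.Random(seed).shuffle(order)
    folds = []
    for k in range(num_folds):
        # integer boundaries make fold sizes differ by at most one instance
        start = k * n // num_folds
        end = (k + 1) * n // num_folds
        test = order[start:end]
        train = order[:start] + order[end:]
        folds.append((train, test))
    return folds


for train, test in kfold_indices(10, 3, seed=1):
    print(len(train), len(test))
```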

cumulative_margin_distribution()

Outputs the cumulative margin distribution as a string suitable as input for gnuplot or a similar package.

Returns

the cumulative margin distribution

Return type

str

property discard_predictions

Returns whether to discard predictions (saves memory).

Returns

True if predictions are discarded

Return type

bool

property error_rate

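Returns the error rate: the root mean squared error for numeric classes; for nominal classes, the fraction of incorrectly classified instances (or the average cost, if a cost matrix was supplied).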

Returns

the rate

Return type

float

classmethod evaluate_model(classifier, args)

Evaluates the classifier with the given options.

Parameters
  • classifier (Classifier) – the classifier instance to use

  • args (list) – the command-line arguments to use

Returns

the evaluation string

Return type

str

evaluate_train_test_split(classifier, data, percentage, rnd=None, output=None)

Splits the data into train and test, builds the classifier with the training data and evaluates it against the test set.

Parameters
  • classifier (Classifier) – the classifier to evaluate

  • data (Instances) – the data to evaluate on

  • percentage (float) – the percentage split to use (amount to use for training)

  • rnd (Random) – the random number generator to use, if None the order gets preserved

  • output (PredictionOutput) – the output generator to use
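A sketch of the percentage split described above, on plain Python lists: the training part receives `percentage` percent of the rows, and passing no seed keeps the original order. `train_test_split` is hypothetical, and the rounding of the training size is an assumption of this sketch.

```python
import random


def train_test_split(rows, percentage, seed=None):
    """Split rows into (train, test); percentage is the share (0-100) used for training.

    With seed=None the original order is preserved.
    """
    rows = list(rows)
    if seed is not None:
        random.Random(seed).shuffle(rows)
    train_size = int(round(len(rows) * percentage / 100.0))
    return rows[:train_size], rows[train_size:]


train, test = train_test_split(range(10), 66.0)
print(train, test)
```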

f_measure(class_index)

Returns the f measure.

Parameters

class_index (int) – the 0-based index of the class label

Returns

the measure

Return type

float

false_negative_rate(class_index)

Returns the false negative rate.

Parameters

class_index (int) – the 0-based index of the class label

Returns

the rate

Return type

float

false_positive_rate(class_index)

Returns the false positive rate.

Parameters

class_index (int) – the 0-based index of the class label

Returns

the rate

Return type

float
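All per-class counts and rates can be read off the confusion matrix, whose rows are the actual classes and whose columns are the predicted classes. A minimal sketch in plain Python with hypothetical helpers; returning 0.0 for an empty denominator is this sketch's convention. Weka's counts are weighted by instance weight, which is why they are floats.

```python
def class_counts(matrix, class_index):
    """TP, FP, FN, TN for one class (rows = actual, columns = predicted)."""
    n = len(matrix)
    tp = matrix[class_index][class_index]
    fn = sum(matrix[class_index][j] for j in range(n) if j != class_index)
    fp = sum(matrix[i][class_index] for i in range(n) if i != class_index)
    total = sum(sum(row) for row in matrix)
    tn = total - tp - fn - fp
    return tp, fp, fn, tn


def _ratio(numerator, denominator):
    # 0.0 for an empty denominator is this sketch's convention
    return numerator / denominator if denominator > 0 else 0.0


def rates(matrix, class_index):
    tp, fp, fn, tn = class_counts(matrix, class_index)
    return {
        "true_positive_rate": _ratio(tp, tp + fn),
        "false_negative_rate": _ratio(fn, tp + fn),
        "false_positive_rate": _ratio(fp, fp + tn),
        "true_negative_rate": _ratio(tn, fp + tn),
    }


cm = [[50, 0, 0],
      [0, 47, 3],
      [0, 2, 48]]
print(class_counts(cm, 1), rates(cm, 1))
```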

property header

Returns the header format.

Returns

the header format

Return type

Instances

property incorrect

Returns the incorrect count (nominal classes).

Returns

the count

Return type

float

property kappa

Returns kappa.

Returns

kappa

Return type

float
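Kappa compares the observed agreement between actual and predicted classes with the agreement expected by chance from the row and column totals of the confusion matrix. A minimal sketch of Cohen's kappa on a nested-list confusion matrix (rows = actual, columns = predicted); `kappa` here is a hypothetical helper, and the handling of the degenerate case is this sketch's choice.

```python
def kappa(matrix):
    """Cohen's kappa from a square confusion matrix (rows = actual, columns = predicted)."""
    n = len(matrix)
    total = float(sum(sum(row) for row in matrix))
    observed = sum(matrix[i][i] for i in range(n)) / total
    row_totals = [sum(matrix[i]) for i in range(n)]
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    # agreement expected if predictions were independent of the actual classes
    chance = sum(row_totals[k] * col_totals[k] for k in range(n)) / (total * total)
    if chance >= 1.0:
        return 1.0  # degenerate case; this sketch's convention
    return (observed - chance) / (1.0 - chance)


print(kappa([[20, 5], [10, 15]]))
```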

property kb_information

Returns KB information.

Returns

the information

Return type

float

property kb_mean_information

Returns KB mean information.

Returns

the information

Return type

float

property kb_relative_information

Returns KB relative information.

Returns

the information

Return type

float

matrix(title=None)

Generates the confusion matrix.

Parameters

title (str) – optional title

Returns

the matrix

Return type

str

matthews_correlation_coefficient(class_index)

Returns the Matthews correlation coefficient (nominal classes).

Parameters

class_index (int) – the 0-based index of the class label

Returns

the coefficient

Return type

float
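For a single class label, the Matthews correlation coefficient combines all four confusion counts into one value between -1 and 1. A sketch using the standard formula; `mcc` is hypothetical, and returning 0 when the denominator is zero is an assumption of this sketch.

```python
import math


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient for one class from its confusion counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # undefined case; returning 0 is this sketch's convention
    return (tp * tn - fp * fn) / denominator


print(mcc(47, 2, 3, 98))
```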

property mean_absolute_error

Returns the mean absolute error.

Returns

the error

Return type

float

property mean_prior_absolute_error

Returns the mean prior absolute error.

Returns

the error

Return type

float

num_false_negatives(class_index)

Returns the number of false negatives.

Parameters

class_index (int) – the 0-based index of the class label

Returns

the count

Return type

float

num_false_positives(class_index)

Returns the number of false positives.

Parameters

class_index (int) – the 0-based index of the class label

Returns

the count

Return type

float

property num_instances

Returns the number of instances that had a known class value.

Returns

the number of instances

Return type

float

num_true_negatives(class_index)

Returns the number of true negatives.

Parameters

class_index (int) – the 0-based index of the class label

Returns

the count

Return type

float

num_true_positives(class_index)

Returns the number of true positives.

Parameters

class_index (int) – the 0-based index of the class label

Returns

the count

Return type

float

property percent_correct

Returns the percent correct (nominal classes).

Returns

the percentage

Return type

float

property percent_incorrect

Returns the percent incorrect (nominal classes).

Returns

the percentage

Return type

float

property percent_unclassified

Returns the percent unclassified.

Returns

the percentage

Return type

float

precision(class_index)

Returns the precision.

Parameters

class_index (int) – the 0-based index of the class label

Returns

the precision

Return type

float

property predictions

Returns the predictions.

Returns

the predictions, None if not available

Return type

list

recall(class_index)

Returns the recall.

Parameters

class_index (int) – the 0-based index of the class label

Returns

the recall

Return type

float
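Precision, recall and the F-measure for one class follow directly from its true positive, false positive and false negative counts. A minimal sketch using the standard definitions, with the F-measure as the harmonic mean of precision and recall; the helper name and the zero-denominator convention are assumptions.

```python
def precision_recall_f(tp, fp, fn):
    """Precision, recall and F-measure for one class (0.0 where undefined)."""
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    if precision + recall == 0:
        f = 0.0
    else:
        # harmonic mean of precision and recall
        f = 2 * precision * recall / (precision + recall)
    return precision, recall, f


print(precision_recall_f(47, 2, 3))
```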

property relative_absolute_error

Returns the relative absolute error.

Returns

the error

Return type

float

property root_mean_prior_squared_error

Returns the root mean prior squared error.

Returns

the error

Return type

float

property root_mean_squared_error

Returns the root mean squared error.

Returns

the error

Return type

float

property root_relative_squared_error

Returns the root relative squared error.

Returns

the error

Return type

float
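The absolute and squared error measures above compare the scheme's predictions with those of a constant prior prediction. A sketch for a numeric class in plain Python; the `prior` argument stands in for the training-data class mean that Weka is assumed to use, the relative errors are expressed as percentages, and `error_measures` is hypothetical. Nominal classes are handled differently (on class probabilities) and are not covered here.

```python
import math


def error_measures(actual, predicted, prior):
    """MAE, RMSE and relative errors (percent) against a constant prior prediction."""
    n = len(actual)
    mae = sum(abs(p - a) for a, p in zip(actual, predicted)) / n
    rmse = math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / n)
    # the same measures for a model that always predicts `prior`
    prior_mae = sum(abs(prior - a) for a in actual) / n
    prior_rmse = math.sqrt(sum((prior - a) ** 2 for a in actual) / n)
    return {
        "mean_absolute_error": mae,
        "root_mean_squared_error": rmse,
        "mean_prior_absolute_error": prior_mae,
        "root_mean_prior_squared_error": prior_rmse,
        "relative_absolute_error": 100.0 * mae / prior_mae,
        "root_relative_squared_error": 100.0 * rmse / prior_rmse,
    }


actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.5, 6.0, 9.5]
print(error_measures(actual, predicted, prior=6.0))
```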

property sf_entropy_gain

Returns the total SF, which is the null model entropy minus the scheme entropy.

Returns

the gain

Return type

float

property sf_mean_entropy_gain

Returns the SF per instance, which is the null model entropy minus the scheme entropy, per instance.

Returns

the gain

Return type

float

property sf_mean_prior_entropy

Returns the entropy per instance for the null model.

Returns

the entropy

Return type

float

property sf_mean_scheme_entropy

Returns the entropy per instance for the scheme.

Returns

the entropy

Return type

float

property sf_prior_entropy

Returns the total entropy for the null model.

Returns

the entropy

Return type

float

property sf_scheme_entropy

Returns the total entropy for the scheme.

Returns

the entropy

Return type

float
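The SF statistics above sum the information, in bits, of the probability each model assigned to the actual class of every test instance. A sketch in plain Python; the inputs are those per-instance probabilities, `sf_statistics` is hypothetical, and the probability floor is a hypothetical value used only to avoid log2(0).

```python
import math


def sf_statistics(prior_probs, scheme_probs, min_prob=1e-10):
    """Total and per-instance entropies (bits) for the null model and the scheme."""
    prior = sum(-math.log2(max(p, min_prob)) for p in prior_probs)
    scheme = sum(-math.log2(max(p, min_prob)) for p in scheme_probs)
    n = len(prior_probs)
    return {
        "sf_prior_entropy": prior,
        "sf_scheme_entropy": scheme,
        "sf_entropy_gain": prior - scheme,
        "sf_mean_prior_entropy": prior / n,
        "sf_mean_scheme_entropy": scheme / n,
        "sf_mean_entropy_gain": (prior - scheme) / n,
    }


# two equally likely classes under the null model; the scheme's probabilities for the actual class
print(sf_statistics([0.5, 0.5, 0.5], [1.0, 0.5, 0.5]))
```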

property size_of_predicted_regions

Returns the average size of the predicted regions, relative to the range of the target in the training data, at the confidence level specified when evaluation was performed.

Returns

the size of the regions

Return type

float

summary(title=None, complexity=False)

Generates a summary.

Parameters
  • title (str) – optional title

  • complexity (bool) – whether to print the complexity information as well

Returns

the summary

Return type

str

test_model(classifier, data, output=None)

Evaluates the built model using the specified test data and returns the classifications.

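Parameters
  • classifier (Classifier) – the classifier to evaluate

  • data (Instances) – the data to evaluate on

  • output (PredictionOutput) – the output generator to use

Returns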

the classifications

Return type

ndarray

test_model_once(classifier, inst, store=False)

Evaluates the built model using the specified test instance and returns the classification.

Parameters
  • classifier (Classifier) – the classifier to evaluate

  • inst (Instance) – the Instance to evaluate on

  • store (bool) – whether to store the predictions (some statistics in class_details() like AUC require that)

Returns

the classification

Return type

float

property total_cost

Returns the total cost.

Returns

the cost

Return type

float

true_negative_rate(class_index)

Returns the true negative rate.

Parameters

class_index (int) – the 0-based index of the class label

Returns

the rate

Return type

float

true_positive_rate(class_index)

Returns the true positive rate.

Parameters

class_index (int) – the 0-based index of the class label

Returns

the rate

Return type

float

property unclassified

Returns the unclassified count.

Returns

the count

Return type

float

property unweighted_macro_f_measure

Returns the unweighted macro-averaged F-measure.

Returns

the measure

Return type

float

property unweighted_micro_f_measure

Returns the unweighted micro-averaged F-measure.

Returns

the measure

Return type

float

property weighted_area_under_prc

Returns the weighted area under precision recall curve.

Returns

the weighted area

Return type

float

property weighted_area_under_roc

Returns the weighted area under the receiver operating characteristic curve.

Returns

the weighted area

Return type

float

property weighted_f_measure

Returns the weighted f measure.

Returns

the measure

Return type

float

property weighted_false_negative_rate

Returns the weighted false negative rate.

Returns

the rate

Return type

float

property weighted_false_positive_rate

Returns the weighted false positive rate.

Returns

the rate

Return type

float

property weighted_matthews_correlation

Returns the weighted Matthews correlation (nominal classes).

Returns

the correlation

Return type

float

property weighted_precision

Returns the weighted precision.

Returns

the precision

Return type

float

property weighted_recall

Returns the weighted recall.

Returns

the recall

Return type

float

property weighted_true_negative_rate

Returns the weighted true negative rate.

Returns

the rate

Return type

float

property weighted_true_positive_rate

Returns the weighted true positive rate.

Returns

the rate

Return type

float
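The weighted statistics average the per-class values, weighting each class by its number of actual instances (the row totals of the confusion matrix). A sketch with a hypothetical `weighted_average` helper; with this weighting, the weighted recall equals the overall accuracy.

```python
def weighted_average(per_class_values, class_sizes):
    """Average per-class statistics, weighting each class by its number of actual instances."""
    total = float(sum(class_sizes))
    return sum(v * n for v, n in zip(per_class_values, class_sizes)) / total


cm = [[27, 3],   # rows = actual class, columns = predicted class
      [5, 5]]
sizes = [sum(row) for row in cm]                          # actual instances per class
recalls = [cm[c][c] / sizes[c] for c in range(len(cm))]  # per-class recall (TPR)
print(weighted_average(recalls, sizes))                   # equals accuracy 32 / 40
```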

class weka.classifiers.FilteredClassifier(jobject=None, options=None)

Bases: weka.classifiers.SingleClassifierEnhancer

Wrapper class for the filtered classifier.

check_for_modified_class_attribute(check)

Sets whether to check for class attribute modifications.

Parameters

check (bool) – True if checking for modifications

property filter

Returns the filter.

Returns

the filter in use

Return type

Filter

class weka.classifiers.GridSearch(jobject=None, options=None)

Bases: weka.classifiers.SingleClassifierEnhancer

Wrapper class for the GridSearch meta-classifier.

property best

Returns the best classifier setup found during the search.

Returns

the best classifier setup

Return type

Classifier

property evaluation

Returns the currently set statistic used for evaluation.

Returns

the statistic

Return type

SelectedTag

property x

Returns a dictionary with all the current values for the X axis of the grid. Keys: property (str), min (float), max (float), step (float), base (float), expression (str).

Returns

the dictionary with the parameters

Return type

dict

property y

Returns a dictionary with all the current values for the Y axis of the grid. Keys: property (str), min (float), max (float), step (float), base (float), expression (str).

Returns

the dictionary with the parameters

Return type

dict
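Each axis dictionary describes a range for a counter I (from min to max in increments of step) that is turned into a parameter value through the expression, which may refer to the base and to I. A sketch listing the resulting values; the expression string is modeled here as a Python callable of (base, i), Weka's own expression parsing is not reproduced, and `grid_values` is hypothetical.

```python
import math


def grid_values(min_value, max_value, step, base, expression):
    """Values tried along one axis: expression(base, i) for i = min, min+step, ..., max."""
    if step <= 0:
        raise ValueError("step must be positive")
    # derive i from an integer counter to avoid accumulating floating-point error
    num_steps = int(math.floor((max_value - min_value) / step + 1e-9))
    return [expression(base, min_value + k * step) for k in range(num_steps + 1)]


# e.g. an axis from -2 to 2 with base 10 and an exponential expression
print(grid_values(-2, 2, 1, 10.0, lambda base, i: base ** i))
```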

class weka.classifiers.Kernel(classname=None, jobject=None, options=None)

Bases: weka.core.classes.OptionHandler

Wrapper class for kernels.

build_kernel(data)

Builds the kernel with the data.

Parameters

data (Instances) – the data to build the kernel with

capabilities()

Returns the capabilities of the kernel.

Returns

the capabilities

Return type

Capabilities

property checks_turned_off

Returns whether checks are turned off.

Returns

True if checks turned off

Return type

bool

clean()

Frees the memory used by the kernel.

eval(id1, id2, inst1)

Computes the result of the kernel function for two instances. If id1 == -1, inst1 is used instead of an instance from the dataset.

Parameters
  • id1 (int) – the index of the first instance in the dataset

  • id2 (int) – the index of the second instance in the dataset

  • inst1 (Instance) – the instance corresponding to id1 (used if id1 == -1)
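The id1 == -1 convention lets a kernel compare an arbitrary instance with a row of the dataset it was built with. A sketch using an RBF kernel, exp(-gamma * ||x - y||^2), chosen purely as an example; the class `RBFKernelSketch` and its constructor are hypothetical and operate on plain lists of feature vectors.

```python
import math


class RBFKernelSketch:
    """Hypothetical RBF kernel over a fixed dataset, mimicking eval(id1, id2, inst1)."""

    def __init__(self, data, gamma):
        self.data = data      # list of numeric feature vectors
        self.gamma = gamma

    def eval(self, id1, id2, inst1=None):
        # id1 == -1 means: use inst1 instead of a dataset row
        x = inst1 if id1 == -1 else self.data[id1]
        y = self.data[id2]
        squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
        return math.exp(-self.gamma * squared_distance)


k = RBFKernelSketch([[0.0, 0.0], [1.0, 1.0]], gamma=0.5)
print(k.eval(0, 1), k.eval(-1, 1, [1.0, 1.0]))
```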

classmethod make_copy(kernel)

Creates a copy of the kernel.

Parameters

kernel (Kernel) – the kernel to copy

Returns

the copy of the kernel

Return type

Kernel

class weka.classifiers.KernelClassifier(classname=None, jobject=None, options=None)

Bases: weka.classifiers.Classifier

Wrapper class for classifiers that have a kernel property, like SMO.

property kernel

Returns the current kernel.

Returns

the kernel or None if none found

Return type

Kernel

class weka.classifiers.MultiSearch(jobject=None, options=None)

Bases: weka.classifiers.SingleClassifierEnhancer

Wrapper class for the MultiSearch meta-classifier. NB: ‘multi-search-weka-package’ must be installed (https://github.com/fracpete/multisearch-weka-package), version 2016.1.15 or later.

property best

Returns the best classifier setup found during the search.

Returns

the best classifier setup

Return type

Classifier

property evaluation

Returns the currently set statistic used for evaluation.

Returns

the statistic

Return type

SelectedTag

property parameters

Returns the list of currently set search parameters.

Returns

the list of AbstractSearchParameter objects

Return type

list

class weka.classifiers.MultipleClassifiersCombiner(classname=None, jobject=None, options=None)

Bases: weka.classifiers.Classifier

Wrapper class for classifiers that use multiple base classifiers.

append(classifier)

Appends the classifier to the current list of classifiers.

Parameters

classifier (Classifier) – the classifier to add

property classifiers

Returns the list of base classifiers.

Returns

the classifier list

Return type

list

clear()

Removes all classifiers.

class weka.classifiers.NominalPrediction(jobject)

Bases: weka.classifiers.Prediction

Wrapper class for a nominal prediction.

property distribution

Returns the class distribution.

Returns

the class distribution array

Return type

ndarray

property margin

Returns the margin.

Returns

the margin

Return type

float
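Assuming the usual definition, the margin is the probability assigned to the actual class minus the largest probability assigned to any other class: positive when the actual class clearly wins, negative for a misclassification. A sketch with a hypothetical `margin` helper:

```python
def margin(distribution, actual_index):
    """Probability of the actual class minus the highest probability of any other class."""
    others = [p for i, p in enumerate(distribution) if i != actual_index]
    return distribution[actual_index] - max(others)


dist = [0.7, 0.2, 0.1]
print(margin(dist, 0), margin(dist, 1))
```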

class weka.classifiers.NumericPrediction(jobject)

Bases: weka.classifiers.Prediction

Wrapper class for a numeric prediction.

property error

Returns the error.

Returns

the error

Return type

float

property prediction_intervals

Returns the prediction intervals.

Returns

the intervals

Return type

ndarray
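Weka defines the error of a numeric prediction as the predicted value minus the actual value. The sketch below assumes the `error` property reports that quantity. It also assumes, for illustration, that each row of `prediction_intervals` is a `[lower, upper]` pair, and checks which intervals cover the actual value.

```python
def numeric_error(predicted, actual):
    """Predicted minus actual, as Weka's NumericPrediction defines the error."""
    return predicted - actual


def covered(actual, intervals):
    """Assumed layout: each row of `intervals` is a [lower, upper] pair."""
    return [lower <= actual <= upper for lower, upper in intervals]


print(numeric_error(12.5, 10.0))  # 2.5
print(covered(10.0, [[9.0, 11.0], [10.5, 12.0]]))  # [True, False]
```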

class weka.classifiers.Prediction(jobject)

Bases: weka.core.classes.JavaObject

Wrapper class for a prediction.

property actual

Returns the actual value.

Returns

the actual value (internal representation)

Return type

float

property predicted

Returns the predicted value.

Returns

the predicted value (internal representation)

Return type

float

property weight

Returns the weight.

Returns

the weight of the Instance that was used

Return type

float
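For a nominal class, the "internal representation" returned by `actual` and `predicted` is the 0-based index of the label, stored as a float. Missing values are encoded as NaN, which is Weka's convention. The helper below is a hypothetical sketch that maps such a value back to its label. In practice the label list would come from the dataset's class attribute.

```python
import math


def to_label(internal_value, labels):
    """Map Weka's internal float value to its nominal label (hypothetical helper)."""
    if math.isnan(internal_value):  # Weka stores missing values as NaN
        return None
    return labels[int(internal_value)]


labels = ["no", "yes"]
print(to_label(1.0, labels))  # yes
print(to_label(float("nan"), labels))  # None
```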

class weka.classifiers.PredictionOutput(classname='weka.classifiers.evaluation.output.prediction.PlainText', jobject=None, options=None)

Bases: weka.core.classes.OptionHandler

For collecting predictions and generating output from them. The classname must refer to a class derived from weka.classifiers.evaluation.output.prediction.AbstractOutput.

buffer_content()

Returns the content of the buffer as string.

Returns

The buffer content

Return type

str

property header

Returns the header format.

Returns

The dataset format

Return type

Instances

print_all(cls, data)

Prints the header, classifications and footer to the buffer.

Parameters
  • cls (Classifier) – the classifier

  • data (Instances) – the test data

print_classification(cls, inst, index)

Prints the classification to the buffer.

Parameters
  • cls (Classifier) – the classifier

  • inst (Instance) – the test instance

  • index (int) – the 0-based index of the test instance

print_classifications(cls, data)

Prints the classifications to the buffer.

Parameters
  • cls (Classifier) – the classifier

  • data (Instances) – the test data

print_footer()

Prints the footer to the buffer.

print_header()

Prints the header to the buffer.

class weka.classifiers.SingleClassifierEnhancer(classname=None, jobject=None, options=None)

Bases: weka.classifiers.Classifier

Wrapper class for classifiers that use a single base classifier.

property classifier

Returns the base classifier.

Returns

the base classifier

Return type

Classifier

weka.classifiers.main(args=None)

Runs a classifier from the command-line. Calls JVM start/stop automatically. Use -h to see all options.

Parameters

args (list) – the command-line arguments to use, uses sys.argv if None

weka.classifiers.predictions_to_instances(data, preds)

Turns the predictions into an Instances object.

Parameters
  • data (Instances) – the original dataset format

  • preds (list) – the predictions to convert

Returns

the predictions, None if no predictions present

Return type

Instances

weka.classifiers.sys_main()

Runs the main function using the system cli arguments, and returns a system error code.

Returns

0 for success, 1 for failure.

Return type

int

weka.clusterers module

class weka.clusterers.ClusterEvaluation

Bases: weka.core.classes.JavaObject

Evaluation class for clusterers.

property classes_to_clusters

Returns the array (ordered by cluster number) of minimum-error class-to-cluster mappings.

Returns

the mappings

Return type

ndarray
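A minimum-error mapping assigns each cluster to a class so that as few instances as possible land in a cluster whose assigned class differs from their own. The brute-force sketch below works on a hypothetical cluster-by-class count table and assumes there are no more clusters than classes. It illustrates the idea only and does not reproduce Weka's own search, which can also leave a cluster without a class.

```python
from itertools import permutations


def min_error_mapping(counts):
    """counts[c][k] = number of instances of class k that fell into cluster c.

    Returns (mapping, errors) where mapping[c] is the class index assigned
    to cluster c. Assumes len(counts) <= number of classes.
    """
    num_clusters = len(counts)
    num_classes = len(counts[0])
    total = sum(map(sum, counts))
    best_mapping, best_correct = None, -1
    # Try every injective assignment of classes to clusters.
    for mapping in permutations(range(num_classes), num_clusters):
        correct = sum(counts[c][mapping[c]] for c in range(num_clusters))
        if correct > best_correct:
            best_mapping, best_correct = list(mapping), correct
    return best_mapping, total - best_correct


print(min_error_mapping([[1, 5], [4, 0]]))  # ([1, 0], 1)
```

The exhaustive search is only practical for a handful of classes. It is meant to show what "minimum error" refers to, not to serve as a scalable implementation.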

property cluster_assignments

Returns an array of cluster assignments corresponding to the most recent set of instances clustered.

Returns

the cluster assignments

Return type

ndarray

property cluster_results

The cluster results as string.

Returns

the results string

Return type

str

classmethod crossvalidate_model(clusterer, data, num_folds, rnd)

Cross-validates the clusterer and returns the loglikelihood.

Parameters
  • clusterer (Clusterer) – the clusterer instance to evaluate

  • data (Instances) – the data to evaluate on

  • num_folds (int) – the number of folds

  • rnd (Random) – the random number generator to use

Returns

the cross-validated loglikelihood

Return type

float

classmethod evaluate_clusterer(clusterer, args)

Evaluates the clusterer with the given options.

Parameters
  • clusterer (Clusterer) – the clusterer instance to evaluate

  • args (list) – the command-line arguments

Returns

the evaluation result

Return type

str

property log_likelihood

Returns the log likelihood.

Returns

the log likelihood

Return type

float

property num_clusters

Returns the number of clusters.

Returns

the number of clusters

Return type

int

set_model(clusterer)

Sets the built clusterer to evaluate.

Parameters

clusterer (Clusterer) – the clusterer to evaluate

test_model(test)

Evaluates the currently set clusterer on the test set.

Parameters

test (Instances) – the test set to use for evaluating

class weka.clusterers.Clusterer(classname='weka.clusterers.SimpleKMeans', jobject=None, options=None)

Bases: weka.core.classes.OptionHandler

Wrapper class for clusterers.

build_clusterer(data)

Builds the clusterer with the data.

Parameters

data (Instances) – the data to use for training the clusterer

property capabilities

Returns the capabilities of the clusterer.

Returns

the capabilities

Return type

Capabilities

cluster_instance(inst)

Performs a prediction.

Parameters

inst (Instance) – the instance to determine the cluster for

Returns

the clustering result

Return type

float

classmethod deserialize(ser_file)

Deserializes a clusterer from a file.

Parameters

ser_file (str) – the model file to deserialize

Returns

model and, if available, the dataset header

Return type

tuple

distribution_for_instance(inst)

Performs a prediction, returning the cluster distribution.

Parameters

inst (Instance) – the Instance to get the cluster distribution for

Returns

the cluster distribution

Return type

float[]

property graph

Returns the graph if the clusterer implements weka.core.Drawable, otherwise None.

Returns

the graph or None if not available

Return type

str

property graph_type

Returns the graph type if the clusterer implements weka.core.Drawable, otherwise -1.

Returns

the type

Return type

int

property header

Returns the header of the training data.

Returns

the structure of the training data, None if not available

Return type

Instances

classmethod make_copy(clusterer)

Creates a copy of the clusterer.

Parameters

clusterer (Clusterer) – the clusterer to copy

Returns

the copy of the clusterer

Return type

Clusterer

property number_of_clusters

Returns the number of clusters found.

Returns

the number of clusters

Return type

int

serialize(ser_file, header=None)

Serializes the clusterer to the specified file.

Parameters
  • ser_file (str) – the file to save the model to

  • header (Instances) – the (optional) dataset header to store alongside; recommended

update_clusterer(inst)

Updates the clusterer with the instance.

Parameters

inst (Instance) – the Instance to update the clusterer with

update_finished()

Signals the clusterer that updating with new data has finished.

class weka.clusterers.FilteredClusterer(jobject=None, options=None)

Bases: weka.clusterers.SingleClustererEnhancer

Wrapper class for the filtered clusterer.

property filter

Returns the filter.

Returns

the filter

Return type

Filter

class weka.clusterers.SingleClustererEnhancer(classname=None, jobject=None, options=None)

Bases: weka.clusterers.Clusterer

Wrapper class for clusterers that use a single base clusterer.

property clusterer

Returns the base clusterer.

Returns

the clusterer

Return type

Clusterer

weka.clusterers.avg_silhouette_coefficient(clusterer, dist_func, data)

Computes the average silhouette coefficient for a clusterer. Based on Eibe Frank’s Groovy code: https://weka.8497.n7.nabble.com/Silhouette-Measures-and-Dunn-Index-DI-in-Weka-td44072.html

Parameters
  • clusterer (Clusterer) – the trained clusterer model to evaluate

  • dist_func (DistanceFunction) – the distance function to use; if Euclidean, make sure that normalization is turned off

  • data (Instances) – the standardized data

Returns

the average silhouette coefficient

Return type

float
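For reference, the standard silhouette definition works per instance. Let a be the mean distance to the other members of the instance's own cluster, and b the smallest mean distance to the members of any other cluster. Then s = (b - a) / max(a, b). The pure-Python sketch below averages s over all instances using Euclidean distance. Scoring singleton clusters as 0 is an assumed convention and may differ from the wrapper's implementation. The sketch requires Python 3.8+ for `math.dist` and at least two clusters.

```python
import math
from collections import defaultdict


def avg_silhouette(points, assignments):
    """Average silhouette coefficient (needs at least two clusters)."""
    members = defaultdict(list)
    for idx, cluster in enumerate(assignments):
        members[cluster].append(idx)
    scores = []
    for i, own in enumerate(assignments):
        same = [j for j in members[own] if j != i]
        if not same:  # assumed convention: singleton clusters score 0
            scores.append(0.0)
            continue
        a = sum(math.dist(points[i], points[j]) for j in same) / len(same)
        b = min(
            sum(math.dist(points[i], points[j]) for j in idxs) / len(idxs)
            for cluster, idxs in members.items()
            if cluster != own
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


points = [(0.0,), (1.0,), (10.0,), (11.0,)]
print(avg_silhouette(points, [0, 0, 1, 1]))  # close to 1: well separated
```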

weka.clusterers.main(args=None)

Runs a clusterer from the command-line. Calls JVM start/stop automatically. Use -h to see all options.

Parameters

args (list) – the command-line arguments to use, uses sys.argv if None

weka.clusterers.sys_main()

Runs the main function using the system cli arguments, and returns a system error code.

Returns

0 for success, 1 for failure.

Return type

int

weka.datagenerators module

class weka.datagenerators.DataGenerator(classname='weka.datagenerators.classifiers.classification.Agrawal', jobject=None, options=None)

Bases: weka.core.classes.OptionHandler

Wrapper class for datagenerators.

property dataset_format

Returns the dataset format.

Returns

the format

Return type

Instances

define_data_format()

Returns the data format.

Returns

the data format

Return type

Instances

generate_example()

Returns a single Instance.

Returns

the next example

Return type

Instance

generate_examples()

Returns complete dataset.

Returns

the generated dataset

Return type

Instances

generate_finish()

Returns a “finish” string.

Returns

a finish comment

Return type

str

generate_start()

Returns a “start” string.

Returns

the start comment

Return type

str

classmethod make_copy(generator)

Creates a copy of the generator.

Parameters

generator (DataGenerator) – the generator to copy

Returns

the copy of the generator

Return type

DataGenerator

classmethod make_data(generator, args)

Generates data using the generator and commandline arguments.

Parameters
  • generator (DataGenerator) – the generator instance to use

  • args (list) – the command-line arguments

property num_examples_act

Returns the actual number of examples to generate.

Returns

the number of examples

Return type

int

property single_mode_flag

Returns whether data is generated row by row (True) or in one go (False).

Returns

whether incremental

Return type

bool
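Weka's command-line data generation is assumed to follow roughly this sequence:

1. Define the data format and emit the start comment.
2. If in single mode, draw `num_examples_act` examples one by one; otherwise, generate the whole dataset at once.
3. Emit the finish comment.

The sketch drives any object that exposes the method and property names documented above. `StubGenerator` is a hypothetical pure-Python stand-in so the example runs without a JVM.

```python
def run_generator(gen):
    """Collect output in the order: start comment, examples, finish comment."""
    gen.define_data_format()  # the format is defined before anything is generated
    output = [gen.generate_start()]
    if gen.single_mode_flag:
        # Row by row: one example per call.
        rows = [gen.generate_example() for _ in range(gen.num_examples_act)]
    else:
        # Batch mode: the whole dataset in one go.
        rows = list(gen.generate_examples())
    output.extend(rows)
    output.append(gen.generate_finish())
    return output


class StubGenerator:
    """Hypothetical stand-in that mimics the documented interface."""

    def __init__(self, n, single_mode):
        self.num_examples_act = n
        self.single_mode_flag = single_mode
        self._next = 0

    def define_data_format(self):
        return ["x"]

    def generate_start(self):
        return "% start"

    def generate_example(self):
        self._next += 1
        return self._next

    def generate_examples(self):
        return list(range(1, self.num_examples_act + 1))

    def generate_finish(self):
        return "% finish"


print(run_generator(StubGenerator(3, single_mode=True)))  # ['% start', 1, 2, 3, '% finish']
```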

weka.datagenerators.main(args=None)

Runs a datagenerator from the command-line. Calls JVM start/stop automatically. Use -h to see all options.

Parameters

args (list) – the command-line arguments to use, uses sys.argv if None

weka.datagenerators.sys_main()

Runs the main function using the system cli arguments, and returns a system error code.

Returns

0 for success, 1 for failure.

Return type

int

weka.experiments module

class weka.experiments.Experiment(classname='weka.experiment.Experiment', jobject=None, options=None)

Bases: weka.core.classes.OptionHandler

Wrapper class for an experiment.

class weka.experiments.ResultMatrix(classname='weka.experiment.ResultMatrixPlainText', jobject=None, options=None)

Bases: weka.core.classes.OptionHandler

For generating results from an Experiment run.

average(col)

Returns the average of the means in this column (if valid location).

Parameters

col (int) – the 0-based column index

Returns

the mean

Return type

float

property columns

Returns the column count.

Returns

the count

Return type

int

get_col_name(index)

Returns the column name.

Parameters

index (int) – the 0-based column index

Returns

the column name, None if invalid index

Return type

str

get_mean(col, row)

Returns the mean at this location (if valid location).

Parameters
  • col (int) – the 0-based column index

  • row (int) – the 0-based row index

Returns

the mean

Return type

float

get_row_name(index)

Returns the row name.

Parameters

index (int) – the 0-based row index

Returns

the row name, None if invalid index

Return type

str

get_stdev(col, row)

Returns the standard deviation at this location (if valid location).

Parameters
  • col (int) – the 0-based column index

  • row (int) – the 0-based row index

Returns

the standard deviation

Return type

float

hide_col(index)

Hides the column.

Parameters

index (int) – the 0-based column index

hide_row(index)

Hides the row.

Parameters

index (int) – the 0-based row index

is_col_hidden(index)

Returns whether the column is hidden.

Parameters

index (int) – the 0-based column index

Returns

true if hidden

Return type

bool

is_row_hidden(index)

Returns whether the row is hidden.

Parameters

index (int) – the 0-based row index

Returns

true if hidden

Return type

bool

property rows

Returns the row count.

Returns

the count

Return type

int

set_col_name(index, name)

Sets the column name.

Parameters
  • index (int) – the 0-based column index

  • name (str) – the name of the column

set_mean(col, row, mean)

Sets the mean at this location (if valid location).

Parameters
  • col (int) – the 0-based column index

  • row (int) – the 0-based row index

  • mean (float) – the mean to set

set_row_name(index, name)

Sets the row name.

Parameters
  • index (int) – the 0-based row index

  • name (str) – the name of the row

set_stdev(col, row, stdev)

Sets the standard deviation at this location (if valid location).

Parameters
  • col (int) – the 0-based column index

  • row (int) – the 0-based row index

  • stdev (float) – the standard deviation to set

show_col(index)

Shows the column.

Parameters

index (int) – the 0-based column index

show_row(index)

Shows the row.

Parameters

index (int) – the 0-based row index

to_string_header()

Returns the header of the matrix as a string.

Returns

the header

Return type

str

to_string_key()

Returns a key for all the col names, for better readability if the names got cut off.

Returns

the key

Return type

str

to_string_matrix()

Returns the matrix as a string.

Returns

the generated output

Return type

str

to_string_ranking()

Returns the ranking in a string representation.

Returns

the ranking

Return type

str

to_string_summary()

Returns the summary as string.

Returns

the summary

Return type

str

class weka.experiments.SimpleCrossValidationExperiment(datasets, classifiers, classification=True, runs=10, folds=10, result=None, class_for_ir_statistics=0, attribute_id=-1, pred_target_column=False)

Bases: weka.experiments.SimpleExperiment

Performs a simple cross-validation experiment. Can output the results either in ARFF or CSV.

configure_resultproducer()

Configures and returns the ResultProducer and PropertyPath as tuple.

Returns

producer and property path

Return type

tuple
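With a cross-validation setup in the Experimenter, each run is a complete k-fold cross-validation, and one result row is typically produced per fold. Under that assumption, the number of result rows grows as datasets × classifiers × runs × folds. That product is worth checking before launching a large experiment with the defaults (runs=10, folds=10).

```python
def expected_result_rows(num_datasets, num_classifiers, runs=10, folds=10):
    """Assumes one result row per fold, per run, per dataset/classifier pair."""
    return num_datasets * num_classifiers * runs * folds


print(expected_result_rows(2, 3))  # 600 rows with the default 10 runs x 10 folds
```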

class weka.experiments.SimpleExperiment(datasets, classifiers, jobject=None, classification=True, runs=10, result=None, class_for_ir_statistics=0, attribute_id=-1, pred_target_column=False)

Bases: weka.core.classes.OptionHandler

Ancestor for simple experiments.

See the following URL for how to use the Experiment API: http://weka.wikispaces.com/Using+the+Experiment+API

configure_resultproducer()

Configures and returns the ResultProducer and PropertyPath as tuple.

Returns

producer and property path

Return type

tuple

configure_splitevaluator()

Configures and returns the SplitEvaluator and Classifier instance as tuple.

Returns

evaluator and classifier

Return type

tuple

experiment()

Returns the internal experiment, if set up, otherwise None.

Returns

the internal experiment

Return type

Experiment

classmethod load(filename)

Loads the experiment from disk.

Parameters

filename (str) – the filename of the experiment to load

Returns

the experiment

Return type

Experiment

run()

Executes the experiment.

classmethod save(filename, experiment)

Saves the experiment to disk.

Parameters
  • filename (str) – the filename to save the experiment to

  • experiment (Experiment) – the Experiment to save

setup()

Initializes the experiment.

class weka.experiments.SimpleRandomSplitExperiment(datasets, classifiers, classification=True, runs=10, percentage=66.6, preserve_order=False, result=None, class_for_ir_statistics=0, attribute_id=-1, pred_target_column=False)

Bases: weka.experiments.SimpleExperiment

Performs a simple random split experiment. Can output the results either in ARFF or CSV.

configure_resultproducer()

Configures and returns the ResultProducer and PropertyPath as tuple.

Returns

producer and property path

Return type

tuple
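The sketch below illustrates what `percentage` and `preserve_order` control. It assumes that the training size is the rounded percentage of the dataset. It also assumes that, unless order is preserved, the data is shuffled with a run-dependent seed. It uses Python's `random` module, so its shuffles will not match Weka's.

```python
import random


def random_split(rows, percentage=66.6, preserve_order=False, run=1):
    """Split rows into (train, test) by percentage (assumed rounding)."""
    rows = list(rows)
    if not preserve_order:
        random.Random(run).shuffle(rows)  # a different seed per run
    train_size = round(len(rows) * percentage / 100)
    return rows[:train_size], rows[train_size:]


train, test = random_split(range(100), preserve_order=True)
print(len(train), len(test))  # 67 33
```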

class weka.experiments.Tester(classname='weka.experiment.PairedCorrectedTTester', jobject=None, options=None, swap_rows_and_cols=False)

Bases: weka.core.classes.OptionHandler

For generating statistical results from an experiment.

property dataset_columns

Returns the list of column names that uniquely identify a dataset.

Returns

the list of attribute names

Return type

list

property fold_column

Returns the column name that holds the Fold number.

Returns

the attribute name

Return type

str

header(comparison_column)

Creates a “header” string describing the current resultsets.

Parameters

comparison_column (int) – the index of the column to compare against

Returns

the header

Return type

str

init_columns()

Sets the column indices based on the supplied names if necessary.

property instances

Returns the data used in the analysis.

Returns

the data in use

Return type

Instances

multi_resultset_full(base_resultset, comparison_column)

Creates a comparison table where a base resultset is compared to the other resultsets.

Parameters
  • base_resultset (int) – the 0-based index of the base resultset (e.g., the classifier to compare against)

  • comparison_column (int) – the 0-based index of the column to compare against

Returns

the comparison

Return type

str

multi_resultset_ranking(comparison_column)

Creates a ranking.

Parameters

comparison_column (int) – the 0-based index of the column to compare against

Returns

the ranking

Return type

str
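Weka's ranking output appears to order resultsets by wins minus losses. A win means significantly outperforming another resultset on a dataset. The sketch below takes a hypothetical `wins` matrix, where `wins[i][j]` is the number of datasets on which resultset i significantly beats resultset j, and derives that ordering.

```python
def rank_resultsets(wins):
    """Return (index, wins - losses, wins, losses) tuples, best first."""
    n = len(wins)
    rows = []
    for i in range(n):
        won = sum(wins[i][j] for j in range(n) if j != i)
        lost = sum(wins[j][i] for j in range(n) if j != i)
        rows.append((i, won - lost, won, lost))
    return sorted(rows, key=lambda r: r[1], reverse=True)


wins = [[0, 2, 1], [0, 0, 0], [1, 1, 0]]
for index, diff, won, lost in rank_resultsets(wins):
    print(index, diff, won, lost)
```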

multi_resultset_summary(comparison_column)

Carries out a comparison between all resultsets, counting the number of datasets where one resultset outperforms the other.

Parameters

comparison_column (int) – the 0-based index of the column to compare against

Returns

the summary

Return type

str

property result_columns

Returns the list of column names that uniquely identify a result (e.g., classifier + options + ID).

Returns

the list of attribute names

Return type

list

property resultmatrix

Returns the ResultMatrix instance in use.

Returns

the matrix in use

Return type

ResultMatrix

property run_column

Returns the column name that holds the Run number.

Returns

the attribute name

Return type

str

property swap_rows_and_cols

Returns whether to swap rows/cols.

Returns

whether to swap

Return type

bool

weka.filters module

class weka.filters.AttributeSelection(jobject=None, options=None)

Bases: weka.filters.Filter

Wrapper class for weka.filters.supervised.attribute.AttributeSelection.

property evaluator

Returns the evaluator.

Returns

the evaluator in use

Return type

ASEvaluation

property search

Returns the search.

Returns

the search in use

Return type

ASSearch

class weka.filters.Filter(classname='weka.filters.AllFilter', jobject=None, options=None)

Bases: weka.core.classes.OptionHandler

Wrapper class for filters.

batch_finished()

Signals the filter that the batch of data has finished.

Returns

True if instances can be collected from the output

Return type

bool

capabilities()

Returns the capabilities of the filter.

Returns

the capabilities

Return type

Capabilities

classmethod deserialize(ser_file)

Deserializes a filter from a file.

Parameters

ser_file (str) – the file to deserialize from

Returns

the deserialized filter

Return type

Filter

filter(data)

Filters the dataset(s). When providing a list, this can be used to create compatible train/test sets, since the filter only gets initialized with the first dataset and all subsequent datasets get transformed using the same setup.

NB: inputformat(Instances) must have been called beforehand.

Parameters

data (Instances or list of Instances) – the Instances to filter

Returns

the filtered Instances object(s)

Return type

Instances or list of Instances
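Passing a list matters because the filter's parameters are derived from the first dataset only and then reused unchanged for the rest. The pure-Python analogy below uses a standardization-style transform on plain lists of numbers; it is not Weka's Standardize filter. The mean and standard deviation come from the training data, and the test data is transformed with those same values.

```python
import statistics


def standardize_all(datasets):
    """Fit on datasets[0], then transform every dataset with the same setup."""
    mean = statistics.fmean(datasets[0])
    stdev = statistics.pstdev(datasets[0])
    if stdev == 0:
        stdev = 1.0  # constant training data: avoid division by zero
    return [[(x - mean) / stdev for x in data] for data in datasets]


train, test = standardize_all([[1.0, 2.0, 3.0], [4.0]])
print(train, test)  # test reuses the training mean and standard deviation
```

Fitting the test set separately would shift its values onto a different scale, which is exactly the incompatibility the list form avoids.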

input(inst)

Inputs the Instance.

Parameters

inst (Instance) – the instance to filter

Returns

True if the filtered instance can be collected from the output

Return type

bool

inputformat(data)

Sets the input format.

Parameters

data (Instances) – the data to use as input

classmethod make_copy(flter)

Creates a copy of the filter.

Parameters

flter (Filter) – the filter to copy

Returns

the copy of the filter

Return type

Filter

output()

Outputs the filtered Instance.

Returns

the filtered instance

Return type

Instance

outputformat()

Returns the output format.

Returns

the output format

Return type

Instances
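For incremental use, the methods above are typically combined in four steps:

1. Set the input format.
2. Push instances through `input()` and collect any output that becomes available.
3. Call `batch_finished()`.
4. Drain whatever is still queued.

The sketch drives any object with those method names. It assumes `output()` returns `None` once the queue is empty, mirroring Weka's Java `Filter.output()`, which returns null. `DoublingFilter` is a hypothetical stand-in so the example runs without a JVM (Python 3.8+ for the `:=` operator).

```python
from collections import deque


def filter_incrementally(flt, data_format, instances):
    """Push instances through a filter one at a time and collect the output."""
    flt.inputformat(data_format)
    collected = []
    for inst in instances:
        if flt.input(inst):  # True: a filtered instance is ready
            collected.append(flt.output())
    if flt.batch_finished():  # flush anything still queued
        # Assumption: output() returns None when nothing is left.
        while (inst := flt.output()) is not None:
            collected.append(inst)
    return collected


class DoublingFilter:
    """Hypothetical stand-in that holds back every value until the batch finishes."""

    def __init__(self):
        self._queue = deque()
        self._pending = []

    def inputformat(self, data_format):
        self._format = data_format

    def input(self, inst):
        self._pending.append(inst * 2)
        return False  # nothing available until batch_finished()

    def batch_finished(self):
        self._queue.extend(self._pending)
        self._pending.clear()
        return bool(self._queue)

    def output(self):
        return self._queue.popleft() if self._queue else None


print(filter_incrementally(DoublingFilter(), ["x"], [1, 2, 3]))  # [2, 4, 6]
```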

serialize(ser_file)

Serializes the filter to the specified file.

Parameters

ser_file (str) – the file to save the filter to

to_source(classname, data)

Returns the model as Java source code if the filter implements weka.filters.Sourcable.

Parameters
  • classname (str) – the classname for the generated Java code

  • data (Instances) – the dataset used for initializing the filter

Returns

the model as source code string

Return type

str

class weka.filters.MultiFilter(jobject=None, options=None)

Bases: weka.filters.Filter

Wrapper class for weka.filters.MultiFilter.

append(filter)

Appends the filter to the current list of filters.

Parameters

filter (Filter) – the filter to add

clear()

Removes all filters.

property filters

Returns the list of base filters.

Returns

the filter list

Return type

list

class weka.filters.StringToWordVector(jobject=None, options=None)

Bases: weka.filters.Filter

Wrapper class for weka.filters.unsupervised.attribute.StringToWordVector.

property stemmer

Returns the stemmer.

Returns

the stemmer

Return type

Stemmer

property stopwords

Returns the stopwords handler.

Returns

the stopwords handler

Return type

Stopwords

property tokenizer

Returns the tokenizer.

Returns

the tokenizer

Return type

Tokenizer

weka.filters.main(args=None)

Runs a filter from the command-line. Calls JVM start/stop automatically. Use -h to see all options.

Parameters

args (list) – the command-line arguments to use, uses sys.argv if None

weka.filters.sys_main()

Runs the main function using the system cli arguments, and returns a system error code.

Returns

0 for success, 1 for failure.

Return type

int

weka.timeseries module

class weka.timeseries.ConfidenceIntervalForecaster(jobject)

Bases: weka.core.classes.JavaObject

Wrapper class for ConfidenceIntervalForecaster objects.

property calculate_conf_intervals_for_forecasts

Returns the number of steps for which confidence intervals will be computed.

Returns

the steps

Return type

int

property confidence_level

Returns the confidence level in use for computing confidence intervals.

Returns

the level

Return type

float

property is_producing_confidence_intervals

Returns true if this forecaster is computing confidence limits for some or all of its future forecasts (i.e. getCalculateConfIntervalsForForecasts() > 0).

Returns

true if confidence intervals are produced

Return type

bool

class weka.timeseries.CustomPeriodicTest(jobject=None, test=None)

Bases: weka.core.classes.JavaObject

Class that evaluates a supplied date against user-specified date constant fields. Fields that can be tested against include year, month, week of year, week of month, day of year, day of month, day of week, hour of day, minute of hour and second. Wildcard “*” matches any value for a particular field. Each CustomPeriodicTest is made up of one or two test parts. If the first test part’s operator is “=”, then no second part is necessary. Otherwise the first test part may use > or >= operators and the second test part < or <= operators. Taken together, the two parts define an interval. An optional label may be associated with the interval.

evaluate(date)

Evaluate the supplied date with respect to this custom periodic test interval.

Parameters

date (Date) – the date to test

Returns

true if the date lies within the interval.

Return type

bool

property label

Returns the label.

Returns

the label

Return type

str

lower_test()

Returns the lower bound test.

Returns

the test

Return type

TestPart

test(test)

Sets the test as string.

Parameters

test (str) – the test to use

upper_test()

Returns the upper bound test.

Returns

the test

Return type

TestPart

class weka.timeseries.ErrorModule(jobject)

Bases: weka.timeseries.TSEvalModule

Wrapper for ErrorModule objects.

counts_for_targets()

Returns the number of predicted, actual pairs for each target. Only entries that are non-missing for both actual and predicted contribute to the overall count.

Returns

the number of predicted, actual pairs for each target.

Return type

ndarray
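Only pairs where both the actual and the predicted value are present contribute to the count. The sketch below represents missing values as NaN, as Weka does internally. The dictionary of `(predicted, actual)` tuples per target is a hypothetical layout chosen for illustration.

```python
import math


def count_valid_pairs(pairs_per_target):
    """pairs_per_target maps target name -> list of (predicted, actual) tuples.

    Only pairs where neither value is missing (NaN) are counted.
    """
    return {
        target: sum(
            1 for pred, act in pairs if not (math.isnan(pred) or math.isnan(act))
        )
        for target, pairs in pairs_per_target.items()
    }


nan = float("nan")
pairs = {"sales": [(10.0, 11.0), (nan, 12.0), (9.5, nan), (8.0, 8.5)]}
print(count_valid_pairs(pairs))  # {'sales': 2}
```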

errors_for_target(target)

Returns the list of the errors for the supplied target.

Parameters

target (str) – the target to get the errors for

Returns

the errors

Return type

list

predictions_for_all_targets()

Returns the list of predictions for all targets.

Returns

list of list of NumericPrediction

Return type

list

predictions_for_target(target)

Returns the list of predictions for the target.

Parameters

target (str) – the target to get the predictions for

Returns

list of NumericPrediction

Return type

list

class weka.timeseries.IncrementallyPrimeable(jobject)

Bases: weka.core.classes.JavaObject

Wrapper class for IncrementallyPrimeable objects.

prime_forecaster_incremental(inst)

Primes the forecaster using the provided data.

Parameters

inst (Instance) – the instance to prime with

class weka.timeseries.OverlayForecaster(jobject)

Bases: weka.core.classes.JavaObject

Wrapper class for OverlayForecaster objects.

forecast_with_overlays(steps, overlays)

Produce a forecast for the target field(s). Assumes that the model has been built and/or primed so that a forecast can be generated. Also assumes that the forecaster has been told which attributes are to be considered “overlay” attributes in the data. Overlay data is data that the forecaster will be provided with when making a forecast into the future - i.e. it will be given the values of these attributes for future instances. The overlay data provided to this method should have the same structure as the original data used to train the forecaster - i.e. all original fields should be present, including the targets and time stamp field (if supplied). The values of targets will of course be missing (‘?’) since we want to forecast those. The time stamp values (if a time stamp is in use) may be provided, in which case the forecaster will use the time stamp values in the overlay instances. If the time stamp values are missing, then date arithmetic (for date time stamps) will be used to advance the time value beyond the last seen training value; similarly, for artificial time stamps or non-date time stamps, the computed time delta will be used to increment beyond the last seen training value.

The number of instances in the overlay data should typically match the number of steps that have been requested for forecasting. If these differ, then overlay.numInstances() will be the number of steps forecasted.

Parameters
  • steps (int) – number of forecasted values to produce for each target. E.g. a value of 5 would produce a prediction for t+1, t+2, …, t+5.

  • overlays (Instances) – the overlay data to use

Returns

a List of Lists (one for each step) of forecasted values for each target (NumericPrediction objects)

Return type

list
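The result is indexed by step first, then by target. When a per-target series is more convenient, for plotting for example, the nesting can be flipped. The sketch below does this on plain floats standing in for the `NumericPrediction` objects, whose `predicted` value would be used in practice. The target names are hypothetical.

```python
def per_target_series(forecast, targets):
    """forecast[step][t] -> {target: [value at step 1, value at step 2, ...]}"""
    return {name: [step[t] for step in forecast] for t, name in enumerate(targets)}


# Three steps ahead (t+1..t+3) for two targets.
forecast = [[101.0, 5.0], [102.5, 5.5], [103.0, 6.0]]
print(per_target_series(forecast, ["price", "volume"]))
```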

property is_using_overlay_data

Returns true if overlay data has been used to train this forecaster, and thus is expected to be supplied for future time steps when making a forecast.

Returns

whether overlay data is used

Return type

bool

property overlay_fields

Returns the overlay fields as string.

Returns

the overlay fields

Return type

str

class weka.timeseries.Periodicity(jobject=None, periodicity=None)

Bases: weka.core.classes.Enum

Defines periodicity.

class weka.timeseries.PeriodicityHandler(jobject)

Bases: weka.core.classes.JavaObject

Helper class to manage time stamp manipulation with respect to various periodicities. Has a routine to remap the time stamp, which is useful for date time stamps. Since dates are manipulated internally as the number of milliseconds elapsed since the epoch, any global trend modelled by a regression function would get an enormous coefficient for this variable; remapping to a more reasonable scale prevents this. It also makes it easier to handle time periods that shouldn’t be counted as a time unit increment, e.g. weekends and public holidays for financial trading data. These “holes” in the data can be accommodated by accumulating a negative offset for the remapped date whenever a particular date/time occurs in a user-specified “skip” list.

property delta_time

Returns the delta time.

Returns

the delta time

Return type

float
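The skip-list idea can be illustrated with plain dates. Each date maps to the number of elapsed time units since the first date, minus the skipped units in between. The sketch below assumes daily data, sorted dates, and weekends as the skip list. The real handler works on its own time units and on user-specified skip entries.

```python
from datetime import date, timedelta


def remap_days(dates, skip=lambda d: d.weekday() >= 5):
    """Map sorted dates to a compact day index, not counting days where skip(d) is True."""
    start = dates[0]
    remapped = []
    for d in dates:
        elapsed = (d - start).days
        # Negative offset: one unit for every skipped day since the start.
        skipped = sum(1 for k in range(elapsed) if skip(start + timedelta(days=k)))
        remapped.append(elapsed - skipped)
    return remapped


# Friday, Monday, Tuesday in January 2024.
print(remap_days([date(2024, 1, 5), date(2024, 1, 8), date(2024, 1, 9)]))  # [0, 1, 2]
```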

class weka.timeseries.TSEvalModule(jobject)

Bases: weka.core.classes.JavaObject

Wrapper for TSEvalModule objects.

calculate_measure()

Calculate the measure that this module represents.

Returns

the value of the measure for this module for each of the target(s).

Return type

ndarray
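`calculate_measure()` yields one value per target. As an illustration of what an error-based module could compute, the sketch below takes a root mean squared error over each target's non-missing (predicted, actual) pairs. Which modules exist, and how each one defines its measure, should be checked via `module_list()`.

```python
import math


def rmse_per_target(pairs_per_target):
    """RMSE per target, ignoring pairs with a missing (NaN) value."""
    result = []
    for pairs in pairs_per_target:
        sq = [(p - a) ** 2 for p, a in pairs if not (math.isnan(p) or math.isnan(a))]
        result.append(math.sqrt(sum(sq) / len(sq)) if sq else math.nan)
    return result


print(rmse_per_target([[(1.0, 2.0), (3.0, 3.0)], [(0.0, 4.0)]]))
```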

property definition

Returns the definition.

property description

Returns the description.

property eval_name

Returns the name.

evaluate_for_instance(pred, inst)

Evaluate the given forecast(s) with respect to the given test instance. Targets with missing values are ignored.

Parameters
  • pred – the forecast(s) to evaluate

  • inst (Instance) – the test instance

classmethod module(name)

Returns the module with the specified name.

Parameters

name (str) – the name of the module to return

Returns

the TSEvalModule object

Return type

TSEvalModule

classmethod module_list()

Returns list of available modules.

Returns

the list of modules (TSEvalModule objects)

Return type

list

reset()

Resets the module.

property summary

Returns the summary.

property target_fields

Returns the list of target fields.

Returns

the list of target fields

Return type

list

class weka.timeseries.TSEvaluation(train, test_split_size=0.3, test=None)

Bases: weka.core.classes.JavaObject

Evaluation class for timeseries forecasters.
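The sketch below assumes `test_split_size` behaves like Weka's forecasting holdout. Under that assumption, a value below 1 is a fraction of the instances and a larger value is an absolute count, taken from the end of the data. Simple rounding is also assumed. The result is a chronological split with no shuffling.

```python
def holdout_from_end(rows, test_split_size=0.3):
    """Chronological split: the last instances become the test set (no shuffling)."""
    rows = list(rows)
    if 0 < test_split_size < 1:
        n_test = round(len(rows) * test_split_size)  # fraction (assumed rounding)
    else:
        n_test = int(test_split_size)  # absolute number of instances
    cut = len(rows) - n_test
    return rows[:cut], rows[cut:]


train, test = holdout_from_end(range(10))
print(train, test)  # 7 training rows, 3 test rows
```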

evaluate(forecaster, build_model=True)

Evaluates the forecaster.

Parameters
  • forecaster (TSForecaster) – the forecaster to evaluate

  • build_model (bool) – whether to build the model as well

classmethod evaluate_forecaster(forecaster, args)

Evaluates the forecaster with the given options.

Parameters
  • forecaster (TSForecaster) – the forecaster instance to use

  • args (list) – the command-line arguments to use

property evaluate_on_test_data

Returns whether to evaluate on the test data.

Returns

whether to evaluate

Return type

bool

property evaluate_on_training_data

Returns whether to evaluate on the training data.

Returns

whether to evaluate

Return type

bool

property evaluation_modules

Returns the list of evaluation modules in use.

Returns

list of TSEvalModule objects

Return type

list

property forecast_future

Returns whether we should generate a future forecast beyond the end of the training and/or test data.

Returns

whether to forecast the future

Return type

bool

property horizon

Returns the number of steps to predict into the future.

Returns

the number of steps

Return type

int

predictions_for_test_data(step_number)

Predictions for all targets for the specified step number on the test data.

Parameters

step_number (int) – number of the step into the future to return predictions for

predictions_for_training_data(step_number)

Predictions for all targets for the specified step number on the training data.

Parameters

step_number (int) – number of the step into the future to return predictions for

property prime_for_test_data_with_test_data

Returns whether evaluation for test data should begin by priming with the first x test data instances and then forecasting from step x + 1. This is the only option if there is no training data and a model has been deserialized from disk. If we have training data, and it occurs immediately before the test data in time, then we can prime with the last x instances from the training data.

Returns

whether to prime

Return type

bool

property prime_window_size

Returns the size of the priming window, i.e., the number of historical instances to present before making a forecast.

Returns

the size

Return type

int
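The two priming strategies described above can be sketched with plain lists. In the first, the last `prime_window_size` training instances are used, which suits test data that directly follows the training data. In the second, the first `prime_window_size` test instances are consumed for priming, and forecasting starts right after them. `priming_plan` is a hypothetical helper for illustration.

```python
def priming_plan(train, test, window, prime_with_test):
    """Return (priming instances, index into test where forecasting starts)."""
    if prime_with_test:
        # Prime with the first `window` test instances, forecast from step window + 1.
        return list(test[:window]), window
    # Prime with the most recent `window` training instances, forecast from the start.
    return list(train[max(len(train) - window, 0):]), 0


print(priming_plan([1, 2, 3, 4], [5, 6, 7], window=2, prime_with_test=False))  # ([3, 4], 0)
print(priming_plan([], [5, 6, 7], window=2, prime_with_test=True))  # ([5, 6], 2)
```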

print_future_forecast_on_test_data(forecaster)

Print the forecasted values (for all targets) beyond the end of the test data.

Parameters

forecaster (TSForecaster) – the forecaster to use

Returns

the forecasted values

Return type

str

print_future_forecast_on_training_data(forecaster)

Print the forecasted values (for all targets) beyond the end of the training data.

Parameters

forecaster (TSForecaster) – the forecaster to use

Returns

the forecasted values

Return type

str

print_predictions_for_test_data(title, target_name, step_ahead, instance_number_offset=0)

Print the predictions for a given target at a given step-ahead level on the test data.

Parameters
  • title (str) – the title for the output

  • target_name (str) – the name of the target to print predictions for

  • step_ahead (int) – the step-ahead level - e.g. 3 would print the 3-step-ahead predictions

  • instance_number_offset (int) – the offset from the start of the test data from which to print actual and predicted values

Returns

the predicted/actual values

Return type

str

print_predictions_for_training_data(title, target_name, step_ahead, instance_number_offset=0)

Print the predictions for a given target at a given step-ahead level on the training data.

Parameters
  • title (str) – the title for the output

  • target_name (str) – the name of the target to print predictions for

  • step_ahead (int) – the step-ahead level - e.g. 3 would print the 3-step-ahead predictions

  • instance_number_offset (int) – the offset from the start of the training data from which to print actual and predicted values

Returns

the predicted/actual values

Return type

str

property rebuild_model_after_each_test_forecast_step

Returns whether the forecasting model should be rebuilt after each forecasting step on the test data using both the training data and test data up to the current instance.

Returns

whether to rebuild

Return type

bool

summary()

Generates a summary.

Returns

the summary

Return type

str

property test_data

Returns the test data.

Returns

the test data, None if none available

Return type

Instances

property training_data

Returns the training data.

Returns

the training data, None if none available

Return type

Instances

class weka.timeseries.TSForecaster(classname='weka.classifiers.timeseries.WekaForecaster', jobject=None, options=None)

Bases: weka.core.classes.OptionHandler

Wrapper class for timeseries forecasters.

property algorithm_name

Returns the name of the algorithm.

Returns

the name

Return type

str

property base_model_has_serializer

Check whether the base learner requires special serialization.

Returns

True if the base learner requires special serialization, False otherwise

Return type

bool

build_forecaster(data)

Builds the forecaster using the provided data.

Parameters

data (Instances) – the data to train with

clear_previous_state()

Reset model state.

property fields_to_forecast

Returns the fields to forecast.

Returns

the fields

Return type

str

forecast(steps)

Produce a forecast for the target field(s). Assumes that the model has been built and/or primed so that a forecast can be generated.

Parameters

steps (int) – number of forecasted values to produce for each target. E.g. a value of 5 would produce a prediction for t+1, t+2, …, t+5.

Returns

a list of lists (one per step) of forecasted values for each target (NumericPrediction objects)

Return type

list
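The following pure-Python sketch illustrates the shape of the value returned by forecast(). It builds the same nested layout: an outer list with one entry per step (t+1 to t+steps) and an inner list with one value per target. It uses plain floats and a hypothetical linear-extrapolation rule instead of Weka's NumericPrediction objects, so it shows only the structure, not the library's behaviour.

```python
from typing import Dict, List


def naive_forecast(history: Dict[str, List[float]], steps: int) -> List[List[float]]:
    """Return a list of lists: one inner list per step, one value per target.

    Hypothetical model: each target continues the difference between its
    last two observed values (simple linear extrapolation).
    """
    targets = list(history)  # fixed target order for every inner list
    result = []
    for step in range(1, steps + 1):
        row = []
        for name in targets:
            values = history[name]
            trend = values[-1] - values[-2]
            row.append(values[-1] + step * trend)
        result.append(row)
    return result


if __name__ == "__main__":
    data = {"sales": [10.0, 12.0], "visits": [100.0, 90.0]}
    for step, row in enumerate(naive_forecast(data, steps=3), start=1):
        print(f"t+{step}:", row)
```

Indexing the result as `result[step - 1][target_index]` mirrors how a list of lists returned by forecast() is walked: first by step, then by target.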

property header

Returns the header of the training data.

Returns

the structure of the training data, None if not available

Return type

Instances

load_base_model(fname)

Loads the base model from the given filename.

Parameters

fname (str) – the file to load the base model from

load_serialized_state(fname)

Loads the serialized state from the given filename.

Parameters

fname (str) – the file to deserialize the state from

property previous_state

Returns the previous state.

Returns

the state as list of JB_Object objects

Return type

list

prime_forecaster(data)

Primes the forecaster using the provided data.

Parameters

data (Instances) – the data to prime with

reset()

Resets the algorithm.

run_forecaster(forecaster, options)

Runs the given forecaster with the supplied command-line options.

save_base_model(fname)

Saves the base model under the given filename.

Parameters

fname (str) – the file to save the base model under

serialize_state(fname)

Serializes the state under the given filename.

Parameters

fname (str) – the file to serialize the state under

property uses_state

Check whether the base learner requires operations regarding state.

Returns

True if the base learner uses state-based predictions, False otherwise

Return type

bool

class weka.timeseries.TSLagMaker(jobject=None, options=None)

Bases: weka.filters.Filter

A class for creating lagged versions of target variable(s) for use in time series forecasting. Uses the TimeseriesTranslate filter. Has options for creating averages of consecutive lagged variables (which can be useful for long lagged variables). Some polynomials of time are also created (if there is a time stamp), such as time^2 and time^3. Also creates cross products between time and the lagged and averaged lagged variables. If there is no date time stamp in the data then the user has the option of having an artificial time stamp created. Time stamps, real or otherwise, are used for modeling trends rather than using a differencing-based approach.
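The kind of transformation described above can be sketched in plain Python. This is an illustrative approximation, not TSLagMaker's actual algorithm. The helper `make_lag_features`, its parameters, and the output column names (`lag_k`, `avg_lags_i_j`, `t^2`, `t*lag_k`) are hypothetical. Lags run from `min_lag` to `max_lag`. Lags beyond `average_after` are averaged in consecutive groups of `num_to_average`. An artificial time index adds powers of time and time-lag cross-products, and leading rows with unknown lag values are dropped.

```python
from typing import Dict, List


def make_lag_features(
    series: List[float],
    min_lag: int = 1,
    max_lag: int = 4,
    average_after: int = 2,
    num_to_average: int = 2,
) -> List[Dict[str, float]]:
    """Build one feature row per time step that has a full set of lags.

    Hypothetical sketch: lags up to `average_after` are kept as they are,
    longer lags are averaged in consecutive groups of `num_to_average`, and
    an artificial time index t adds t^2, t^3 and time * lag cross-products.
    """
    rows = []
    # Start at max_lag so that no lag value is unknown (leading rows dropped)
    for t in range(max_lag, len(series)):
        lags = {k: series[t - k] for k in range(min_lag, max_lag + 1)}
        row: Dict[str, float] = {"t": float(t), "target": series[t]}
        # Short lags are used directly
        for k, value in lags.items():
            if k <= average_after:
                row[f"lag_{k}"] = value
        # Long lags are averaged in consecutive groups
        long_lags = [k for k in lags if k > average_after]
        for i in range(0, len(long_lags), num_to_average):
            group = long_lags[i:i + num_to_average]
            row[f"avg_lags_{group[0]}_{group[-1]}"] = sum(lags[k] for k in group) / len(group)
        # Cross-products between time and the (averaged) lagged variables
        lag_columns = [name for name in row if name.startswith(("lag_", "avg_"))]
        for name in lag_columns:
            row[f"t*{name}"] = t * row[name]
        # Powers of time
        row["t^2"] = float(t ** 2)
        row["t^3"] = float(t ** 3)
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in make_lag_features([1.0, 2.0, 3.0, 4.0, 5.0, 6.0]):
        print(row)
```

Averaging the long lags keeps the number of generated attributes small when max_lag is large. That is the motivation the description gives for the averaging option.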

Also has routines for dealing with a date timestamp, i.e. it can detect a monthly time period (because months are different lengths) and maps date time stamps to equally spaced time intervals. For example, in general, a date time stamp is remapped by subtracting the first observed value and adding this value divided by the constant delta (difference between consecutive steps) to the result. In the case of a detected monthly time period, the remapping involves subtracting the base year and then adding to this the number of the month within the current year plus twelve times the number of intervening years since the base year.
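One simplified reading of the two remapping rules can be sketched in plain Python. This is an interpretation for illustration, not Weka's code. The general case is taken to mean index = (timestamp - first timestamp) / delta. The monthly case counts the month within the year plus twelve times the number of years elapsed since the base year, with months numbered from 0 (an assumption). The function names are hypothetical.

```python
from datetime import date, datetime


def remap_regular(ts: datetime, first: datetime, delta_seconds: float) -> float:
    """General case (assumed reading): equally spaced index from a fixed delta."""
    return (ts - first).total_seconds() / delta_seconds


def remap_monthly(d: date, base_year: int) -> int:
    """Monthly case: month within the year (Jan = 0, an assumption) plus
    twelve times the number of years elapsed since the base year."""
    return (d.month - 1) + 12 * (d.year - base_year)


if __name__ == "__main__":
    first = datetime(2024, 1, 1, 0, 0)
    print(remap_regular(datetime(2024, 1, 1, 6, 0), first, delta_seconds=3600))  # 6.0
    # Months differ in length, yet the monthly index stays evenly spaced
    months = [(2023, 11), (2023, 12), (2024, 1), (2024, 2)]
    print([remap_monthly(date(y, m, 1), 2023) for y, m in months])  # [10, 11, 12, 13]
```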

Also has routines for adding new attributes derived from a date time stamp to the data, e.g. AM indicator, day of the week, month, quarter, etc. In the case where there is no real date time stamp, the user may specify a nominal periodic variable (if one exists in the data). For example, month might be coded as a nominal value. In this case it can be specified as the primary periodic variable. The point is that in all these cases (nominal periodic and date-derived periodics), we are able to determine what the value of these variables will be in future instances (as computed from the last known historic instance).
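Date-derived periodic attributes depend only on the time stamp, so their values for future instances can be computed from the last known time stamp. The following minimal sketch covers daily data and assumes a one-day step. The function name and the attribute keys are hypothetical. They only loosely mirror TSLagMaker's date-derived attributes such as day of the week, weekend indicator, month, and quarter.

```python
from datetime import date, timedelta
from typing import Dict, List


def future_periodics(last_known: date, steps: int) -> List[Dict[str, object]]:
    """Derive periodic attributes for the next `steps` daily instances."""
    rows = []
    for step in range(1, steps + 1):
        d = last_known + timedelta(days=step)
        rows.append({
            "date": d.isoformat(),
            "day_of_week": d.weekday(),        # Monday = 0 through Sunday = 6
            "weekend": d.weekday() >= 5,       # Saturday or Sunday
            "month": d.month,
            "quarter": (d.month - 1) // 3 + 1,
            "day_of_month": d.day,
        })
    return rows


if __name__ == "__main__":
    # Last known instance is Friday 2024-03-29; the month and quarter roll over on step 3
    for row in future_periodics(date(2024, 3, 29), 3):
        print(row)
```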

property add_am_indicator

Returns whether to add an AM indicator.

Returns

true if to add

Return type

bool

add_custom_periodic(periodic)

Adds the custom periodic.

Parameters

periodic (str) – the periodic to add

property add_day_of_month

Returns whether to add day of month attribute.

Returns

true if to add

Return type

bool

property add_day_of_week

Returns whether to add day of week attribute.

Returns

true if to add

Return type

bool

property add_month_of_year

Returns whether to add month of year attribute.

Returns

true if to add

Return type

bool

property add_num_days_in_month

Returns whether to add number of days in month attribute.

Returns

true if to add

Return type

bool

property add_quarter_of_year

Returns whether to add quarter of year attribute.

Returns

true if to add

Return type

bool

property add_weekend_indicator

Returns whether to add a weekend indicator.

Returns

true if to add

Return type

bool

Returns true if we are adjusting for trends via a real or artificial time stamp.

Returns

true if to adjust

Return type

bool

property adjust_for_variance

Returns true if we are adjusting for variance by taking the log of the target(s).

Returns

true if to adjust

Return type

bool
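Adjusting for variance by taking the log of the target can be illustrated as follows. The model would be fitted on log-transformed values, and forecasts are mapped back with the exponential. This is a minimal sketch under that assumption. Weka's own handling (for example of non-positive values) may differ, and the helper names are hypothetical.

```python
import math
from typing import List


def to_log_scale(values: List[float]) -> List[float]:
    """Take the natural log of each target value (requires values > 0)."""
    if any(v <= 0 for v in values):
        raise ValueError("log transform requires strictly positive values")
    return [math.log(v) for v in values]


def from_log_scale(values: List[float]) -> List[float]:
    """Map values (e.g. forecasts) on the log scale back to the original scale."""
    return [math.exp(v) for v in values]


if __name__ == "__main__":
    # Multiplicative growth (doubling) becomes a constant additive step on the log scale
    series = [100.0, 200.0, 400.0]
    logged = to_log_scale(series)
    print([b - a for a, b in zip(logged, logged[1:])])  # both approximately log(2)
    print(from_log_scale(logged))  # approximately the original series
```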

property artificial_time_start_value

Returns the current value of the artificial time stamp. After training, after priming, and prior to forecasting, this will be equal to the number of training instances seen.

Returns

the start

Return type

float

property average_consecutive_long_lags

Returns true if consecutive long lagged variables are to be averaged.

Returns

true if to average

Return type

bool

property average_lags_after

Returns the point after which long lagged variables will be averaged.

Returns

the lag

Return type

int

clear_custom_periodics()

Clears the custom periodics.

clear_lag_histories()

Clears any history accumulated in the lag creating filters.

create_time_lag_cross_products(data)

Creates the cross-products.

Parameters

data (Instances) – the data to create the cross-products for

Returns

the cross-products

Return type

Instances

property current_timestamp_value

Returns the current (i.e. most recent) time stamp value. Unlike with an artificial time stamp, after training, after priming and before forecasting, this value will be equal to the time stamp of the most recent priming instance.

Returns

the timestamp value

Return type

float

property delta_time

Returns the difference between time values. This may be only approximate for periods based on dates. In that case, it is best to use date-based arithmetic for incrementing/decrementing time stamps.

Returns

the delta

Return type

float
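Comparing a fixed average month length with calendar-aware month arithmetic shows why a fixed delta is only approximate for date-based periods. This is a minimal standard-library sketch. `add_months` is a hypothetical helper, and 30.44 days is simply an assumed average month length.

```python
import calendar
from datetime import date, timedelta


def add_months(d: date, months: int) -> date:
    """Move `d` forward by whole calendar months, clamping the day if needed."""
    index = d.month - 1 + months
    year, month = d.year + index // 12, index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]  # number of days in that month
    return date(year, month, min(d.day, last_day))


if __name__ == "__main__":
    start = date(2023, 1, 1)
    approx_delta = timedelta(days=30.44)  # assumed average month length
    for n in (1, 2, 12):
        # date + timedelta only uses whole days, so the fixed delta drifts
        fixed = start + n * approx_delta
        print(n, fixed, add_months(start, n))
```

After one and two steps, the fixed delta lands on 2023-01-31 and 2023-03-02 instead of the first of the month. This is the drift that date-based arithmetic avoids.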

property fields_to_lag

Returns the fields to lag as a list.

Returns

the fields to lag

Return type

list

property fields_to_lag_as_string

Returns the fields to lag as a string.

Returns

the fields to lag

Return type

str

property include_powers_of_time

Returns whether to include powers of time in the transformed data.

Returns

true if to include

Return type

bool

property include_timelag_products

Returns whether to include products between time and the lagged variables.

Returns

true if to include

Return type

bool

increment_artificial_time_value(increment)

Increment the artificial time value with the supplied increment value.

Parameters

increment (int) – the increment

property is_using_artificial_time_index

Returns whether an artificial time index is used.

Returns

true if an artificial time index is used

Return type

bool

property lag_range

Returns the lag range to create.

Returns

the lag range

Return type

str

property max_lag

Returns the maximum lag to create.

Returns

the lag

Return type

int

property min_lag

Returns the minimum lag to create.

Returns

the lag

Return type

int

property num_consecutive_long_lags_to_average

Returns the number of consecutive long lagged variables to average.

Returns

the number of lagged variables to average

Return type

int

property overlay_fields

Returns the overlay fields as a list.

Returns

the overlay fields

Return type

list

property periodicity

Returns the Periodicity representing the time stamp in use for this lag maker. If the lag maker is not adjusting for trends, or an artificial time stamp is being used, then null is returned.

Returns

the periodicity

Return type

Periodicity

property primary_periodic_field_name

Returns the name of the primary periodic attribute or null if one hasn’t been specified.

Returns

the name

Return type

str

property remove_leading_instances_with_unknown_lag_values

Returns whether to remove leading instances with unknown lag values.

Returns

true if to remove

Return type

bool

property skip_entries

Returns a list of time units to be ‘skipped’, i.e. not considered as an increment. E.g. financial markets don’t trade on the weekend, so the difference between Friday closing and the following Monday closing is one time unit (and not three). Can accept strings such as “sat”, “sunday”, “jan”, “august”, or explicit dates (with optional formatting string) such as “2011-07-04@yyyy-MM-dd”, or integers. Integers are interpreted with respect to the periodicity, e.g. for daily data they are interpreted as day of the year; for hourly data, hour of the day; for weekly data, week of the year.

Returns

the entries to skip

Return type

str
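The following sketch shows how skipped entries change the way time units are counted, for daily data where weekends are skipped. It illustrates the idea only. The helper `count_time_units` and its `skip_weekdays`/`skip_dates` parameters are hypothetical, and TSLagMaker's own parsing of skip strings such as “sat” or “2011-07-04@yyyy-MM-dd” is not reproduced.

```python
from datetime import date, timedelta
from typing import Iterable, Set


def count_time_units(start: date, end: date, skip_weekdays: Iterable[int] = (5, 6),
                     skip_dates: Iterable[date] = ()) -> int:
    """Count the daily steps from `start` to `end`, ignoring skipped days.

    Weekdays use Python's numbering (Monday = 0 through Sunday = 6), so the
    default skips Saturday and Sunday.
    """
    skipped_days: Set[int] = set(skip_weekdays)
    skipped_dates: Set[date] = set(skip_dates)
    units = 0
    current = start
    while current < end:
        current += timedelta(days=1)
        if current.weekday() in skipped_days or current in skipped_dates:
            continue  # a skipped entry is not an increment
        units += 1
    return units


if __name__ == "__main__":
    friday, monday = date(2011, 7, 1), date(2011, 7, 4)
    print(count_time_units(friday, monday))  # 1, not 3
    # Also skipping the 2011-07-04 holiday: Friday to Tuesday is one unit
    print(count_time_units(friday, date(2011, 7, 5), skip_dates=[date(2011, 7, 4)]))  # 1
```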

property timestamp_field

Returns the name of the time stamp field.

Returns

the time stamp field name

Return type

str

transformed_data(data)

Returns the transformed data.

Parameters

data (Instances) – the data to transform

Returns

the transformed data

Return type

Instances

class weka.timeseries.TSLagUser(jobject)

Bases: weka.core.classes.JavaObject

Wrapper class for TSLagUser objects.

property tslag_maker

Returns the lag maker in use.

Returns

the lag maker

Return type

TSLagMaker

class weka.timeseries.TestPart(jobject)

Bases: weka.core.classes.JavaObject

Inner class defining one boundary of an interval.

day()

Returns the day string.

Returns

the day string

Return type

str

day_of_month(s)

Sets the day of the month.

Parameters

s (str) – the day of the month to use

day_of_week(s)

Sets the day of the week.

Parameters

s (str) – the day of the week to use

day_of_year(s)

Sets the day of year.

Parameters

s (str) – the day of the year to use

eval(date, other)

Evaluate the supplied date against this bound. Handles cyclic date fields (such as month or day of the week) so that intervals such as oct < date < mar evaluate correctly.

Parameters
  • date (Date) – the date to test

  • other (TestPart) – the other bound

Returns

true if the supplied date is within this bound

Return type

bool
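The wrap-around handling described for eval() can be illustrated with months. An interval such as oct < date < mar crosses the year boundary, so a plain lower < x < upper comparison would never be true. This is a minimal sketch that assumes strict bounds and month numbers 1 to 12. `in_cyclic_interval` is a hypothetical helper, not part of the TestPart API.

```python
def in_cyclic_interval(value: int, lower: int, upper: int) -> bool:
    """Return True if lower < value < upper on a cyclic scale (e.g. months).

    If lower > upper the interval wraps around the end of the cycle,
    so the test becomes value > lower OR value < upper.
    """
    if lower <= upper:
        return lower < value < upper
    return value > lower or value < upper


if __name__ == "__main__":
    OCT, MAR = 10, 3
    print([m for m in range(1, 13) if in_cyclic_interval(m, OCT, MAR)])  # [1, 2, 11, 12]
```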

hour_of_day(s)

Sets the hour of the day.

Parameters

s (str) – the hour of the day to use

property is_upper

Returns true if this is the upper bound.

Returns

true if upper bound

Return type

bool

minute_of_hour(s)

Sets the minute of the hour.

Parameters

s (str) – the minute of the hour to use

property month

Returns the month string.

Returns

the month string

Return type

str

operator(s)

Sets the operator.

Parameters

s (str) – the operator to use

second(s)

Sets the second.

Parameters

s (str) – the second to use

week_of_month(s)

Sets the week of the month.

Parameters

s (str) – the week of the month to use

week_of_year(s)

Sets the week of the year.

Parameters

s (str) – the week of the year to use

year(s)

Sets the year.

Parameters

s (str) – the year to use

class weka.timeseries.WekaForecaster(jobject=None, options=None)

Bases: weka.timeseries.TSForecaster, weka.timeseries.TSLagUser, weka.timeseries.ConfidenceIntervalForecaster, weka.timeseries.OverlayForecaster, weka.timeseries.IncrementallyPrimeable

Wrapper class for Weka timeseries forecasters.

add_custom_periodic(periodic)

Adds the custom periodic.

Parameters

periodic (str) – the periodic to add

property base_forecaster

Returns the base forecaster.

Returns

the base forecaster

Return type

Classifier

clear_custom_periodics()

Clears the custom periodics.