weka package¶

Subpackages¶

weka.associations module¶

class weka.associations.AssociationRule(jobject)¶

Bases: JavaObject

Wrapper for weka.associations.AssociationRule class.

property consequence¶

Get the consequence.

Returns:: the consequence, list of Item objects
Return type:: list

property consequence_support¶

Get the support for the consequence.

Returns:: the support
Return type:: int

property metric_names¶

Returns the metric names for the rule.

Returns:: the metric names
Return type:: list

metric_value(name)¶

Returns the named metric value for the rule.

Parameters:: name (str) – the name of the metric
Returns:: the metric value
Return type:: float

property metric_values¶

Returns the metric values for the rule.

Returns:: the metric values
Return type:: ndarray

property premise¶

Get the premise.

Returns:: the premise, list of Item objects
Return type:: list

property premise_support¶

Get the support for the premise.

Returns:: the support
Return type:: int

property primary_metric_name¶

Returns the primary metric name for the rule.

Returns:: the metric name
Return type:: str

property primary_metric_value¶

Returns the primary metric value for the rule.

Returns:: the metric value
Return type:: float

to_dict()¶

Builds a dictionary with the properties of the AssociationRule object.

Returns:: the AssociationRule dictionary
Return type:: dict

property total_support¶

Get the total support.

Returns:: the support
Return type:: int

property total_transactions¶

Get the total transactions.

Returns:: the transactions
Return type:: int
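
The support counts exposed above are enough to derive the usual rule metrics by hand. The following is a minimal sketch in plain Python, not part of the wrapper; it assumes that total_support counts the transactions containing both premise and consequence, and the helper name rule_metrics is hypothetical.

```python
def rule_metrics(premise_support, consequence_support, total_support, total_transactions):
    """Derive support, confidence and lift from raw transaction counts.

    Assumption: total_support is the number of transactions that contain
    both the premise and the consequence of the rule.
    """
    if premise_support <= 0 or consequence_support <= 0 or total_transactions <= 0:
        raise ValueError("counts must be positive")
    # fraction of all transactions covered by the whole rule
    support = total_support / total_transactions
    # how often the consequence holds when the premise holds
    confidence = total_support / premise_support
    # confidence relative to the baseline frequency of the consequence
    lift = confidence / (consequence_support / total_transactions)
    return {"support": support, "confidence": confidence, "lift": lift}


metrics = rule_metrics(
    premise_support=40,
    consequence_support=50,
    total_support=30,
    total_transactions=100,
)
```

With real rules, the four arguments would come from the premise_support, consequence_support, total_support and total_transactions properties.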

class weka.associations.AssociationRules(jobject)¶

Bases: JavaObject

Wrapper for weka.associations.AssociationRules class.

property producer¶

Returns a string describing the producer that generated these rules.

Returns:: the producer
Return type:: str

to_dict()¶

Returns the association rules as a list of dictionaries.

Returns:: the association rules
Return type:: list

class weka.associations.AssociationRulesIterator(rules)¶

Bases: object

Iterator for weka.associations.AssociationRules class.

class weka.associations.Associator(classname=None, jobject=None, options=None)¶

Bases: OptionHandler

Wrapper class for associators.

association_rules()¶

Returns the association rules that were generated. Only available if the scheme implements AssociationRulesProducer.

Returns:: the association rules that were generated
Return type:: AssociationRules

build_associations(data)¶

Builds the associator with the data.

Parameters:: data (Instances) – the data to train the associator with

can_produce_rules()¶

Checks whether association rules can be generated.

Returns:: whether scheme implements AssociationRulesProducer interface and association rules can be generated
Return type:: bool

property capabilities¶

Returns the capabilities of the associator.

Returns:: the capabilities
Return type:: Capabilities

property header¶

Returns the header of the training data.

Returns:: the structure of the training data, None if not available
Return type:: Instances

classmethod make_copy(associator)¶

Creates a copy of the associator.

Parameters:: associator (Associator) – the associator to copy
Returns:: the copy of the associator
Return type:: Associator

property rule_metric_names¶

Returns the rule metric names of the association rules. Only available if the scheme implements AssociationRulesProducer.

Returns:: the metric names
Return type:: list
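
A typical use of the Associator wrapper might look like the sketch below. It assumes that the JVM has already been started (weka.core.jvm.start()), that the ARFF file contains only nominal attributes (as Apriori expects), and that weka.core.converters.Loader is used for loading; the function name mine_rules is hypothetical.

```python
def mine_rules(arff_path, classname="weka.associations.Apriori", options=None):
    """Build an associator on an ARFF file and return its rules as dicts."""
    # Imports are local so that defining this function does not require
    # python-weka-wrapper3 to be importable.
    from weka.associations import Associator
    from weka.core.converters import Loader

    loader = Loader(classname="weka.core.converters.ArffLoader")
    data = loader.load_file(arff_path)

    associator = Associator(classname=classname, options=options)
    associator.build_associations(data)

    # Not every scheme implements AssociationRulesProducer.
    if not associator.can_produce_rules():
        return []
    return associator.association_rules().to_dict()
```

The caller is responsible for wrapping the call in weka.core.jvm.start() and weka.core.jvm.stop().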

class weka.associations.Item(jobject)¶

Bases: JavaObject

Wrapper for weka.associations.Item class.

property attribute¶

Returns the attribute.

Returns:: the attribute
Return type:: Attribute

property comparison¶

Returns the comparison operator as string.

Returns:: the comparison operator
Return type:: str

decrease_frequency(frequency=None)¶

Decreases the frequency.

Parameters:: frequency (int) – the frequency to decrease by, 1 if None

property frequency¶

Returns the frequency.

Returns:: the frequency
Return type:: int

increase_frequency(frequency=None)¶

Increases the frequency.

Parameters:: frequency (int) – the frequency to increase by, 1 if None

property item_value¶

Returns the item value as string.

Returns:: the item value
Return type:: str

weka.associations.main(args=None)¶

Runs an associator from the command-line. Calls JVM start/stop automatically. Use -h to see all options.

Parameters:: args (list) – the command-line arguments to use, uses sys.argv if None

weka.associations.sys_main()¶

Runs the main function using the system cli arguments, and returns a system error code.

Returns:: 0 for success, 1 for failure.
Return type:: int

weka.attribute_selection module¶

class weka.attribute_selection.ASEvaluation(classname='weka.attributeSelection.CfsSubsetEval', jobject=None, options=None)¶

Bases: OptionHandler

Wrapper class for attribute selection evaluation algorithm.

build_evaluator(data)¶

Builds the evaluator with the data.

Parameters:: data (Instances) – the data to use

property capabilities¶

Returns the capabilities of the evaluator.

Returns:: the capabilities
Return type:: Capabilities

convert_instance(inst)¶

Transforms an instance in the format of the original data to the transformed space.

Parameters:: inst (Instance) – the Instance to transform
Returns:: the transformed instance
Return type:: Instance

property header¶

Returns the header of the training data.

Returns:: the structure of the training data, None if not available
Return type:: Instances

post_process(indices)¶

Post-processes the evaluator with the selected attribute indices.

Parameters:: indices (ndarray) – the attribute indices list to use
Returns:: the processed indices
Return type:: ndarray

transformed_data(data)¶

Transform the supplied data set (assumed to be the same format as the training data).

Parameters:: data (Instances) – the data to transform
Returns:: the transformed data
Return type:: Instances

transformed_header()¶

Returns just the header for the transformed data (i.e., an empty set of instances). This is so that AttributeSelection can determine the structure of the transformed data without actually having to get all the transformed data through transformed_data(). Returns None if not a weka.attributeSelection.AttributeTransformer.

Returns:: the header
Return type:: Instances

class weka.attribute_selection.ASSearch(classname='weka.attributeSelection.BestFirst', jobject=None, options=None)¶

Bases: OptionHandler

Wrapper class for attribute selection search algorithm.

property header¶

Returns the header of the training data.

Returns:: the structure of the training data, None if not available
Return type:: Instances

search(evaluation, data)¶

Performs the search and returns the indices of the selected attributes.

Parameters:

evaluation (ASEvaluation) – the evaluation algorithm to use
data (Instances) – the data to use

Returns:

the selected attributes (0-based indices)

Return type:

ndarray

class weka.attribute_selection.AttributeSelection¶

Bases: JavaObject

Performs attribute selection using search and evaluation algorithms.

classmethod attribute_selection(evaluator, args)¶

Performs attribute selection using the given attribute evaluator and options.

Parameters:

evaluator (ASEvaluation) – the evaluator to use
args (list) – the command-line args for the attribute selection

Returns:

the results string

Return type:

str

crossvalidation(crossvalidation)¶

Sets whether to perform cross-validation.

Parameters:: crossvalidation (bool) – whether to perform cross-validation

property cv_results¶

Generates a results string from the last cross-validation attribute selection.

Returns:: the results string
Return type:: str

evaluator(evaluator)¶

Sets the evaluator to use.

Parameters:: evaluator (ASEvaluation) – the evaluator to use.

folds(folds)¶

Sets the number of folds to use for cross-validation.

Parameters:: folds (int) – the number of folds

property number_attributes_selected¶

Returns the number of attributes that were selected.

Returns:: the number of attributes
Return type:: int

property rank_results¶

Returns the results from the cross-validation for rankers.

Unfortunately, the Weka API does not give direct access to underlying data structures, hence we have to parse the textual output.

Returns:: the dictionary of results (mean and stdev for rank and merit)
Return type:: dict

property ranked_attributes¶

Returns the matrix of ranked attributes from the last run.

Returns:: the Numpy matrix
Return type:: ndarray

ranking(ranking)¶

Sets whether to perform a ranking, if possible.

Parameters:: ranking (bool) – whether to perform a ranking

reduce_dimensionality(data)¶

Reduces the dimensionality of the provided Instance or Instances object.

Parameters:: data (Instances) – the data to process
Returns:: the reduced dataset
Return type:: Instances

property results_string¶

Generates a results string from the last attribute selection.

Returns:: the results string
Return type:: str

search(search)¶

Sets the search algorithm to use.

Parameters:: search (ASSearch) – the search algorithm

seed(seed)¶

Sets the seed for cross-validation.

Parameters:: seed (int) – the seed value

select_attributes(instances)¶

Performs attribute selection on the given dataset.

Parameters:: instances (Instances) – the data to process

select_attributes_cv_split(instances)¶

Performs attribute selection on the given cross-validation split.

Parameters:: instances (Instances) – the data to process

property selected_attributes¶

Returns the selected attributes from the last run.

Returns:: the Numpy array of 0-based indices
Return type:: ndarray

property subset_results¶

Returns the results from the cross-validation subsets, i.e., how often an attribute was selected.

Unfortunately, the Weka API does not give direct access to underlying data structures, hence we have to parse the textual output.

Returns:: the list of results (double)
Return type:: list
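
The setter methods and result properties of AttributeSelection can be combined roughly as follows. This is a sketch under assumptions: the JVM is running, data is an Instances object with its class attribute already set, and the helper name select_with_cfs is hypothetical.

```python
def select_with_cfs(data):
    """Run CfsSubsetEval with a BestFirst search on the given dataset."""
    # Imports are local so that defining this function does not require
    # python-weka-wrapper3 to be importable.
    from weka.attribute_selection import ASEvaluation, ASSearch, AttributeSelection

    search = ASSearch(classname="weka.attributeSelection.BestFirst")
    evaluator = ASEvaluation(classname="weka.attributeSelection.CfsSubsetEval")

    attsel = AttributeSelection()
    attsel.search(search)        # sets the search algorithm
    attsel.evaluator(evaluator)  # sets the evaluator
    attsel.select_attributes(data)

    # 0-based indices of the selected attributes plus the textual report
    return attsel.selected_attributes, attsel.results_string
```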

weka.attribute_selection.main(args=None)¶

Runs attribute selection from the command-line. Calls JVM start/stop automatically. Use -h to see all options.

Parameters:: args (list) – the command-line arguments to use, uses sys.argv if None

weka.attribute_selection.sys_main()¶

Runs the main function using the system cli arguments, and returns a system error code.

Returns:: 0 for success, 1 for failure.
Return type:: int

weka.classifiers module¶

class weka.classifiers.AttributeSelectedClassifier(jobject=None, options=None)¶

Bases: SingleClassifierEnhancer

Wrapper class for the AttributeSelectedClassifier.

property evaluator¶

Returns the evaluator.

Returns:: the evaluator in use
Return type:: ASEvaluation

property search¶

Returns the search.

Returns:: the search in use
Return type:: ASSearch

class weka.classifiers.Classifier(classname='weka.classifiers.rules.ZeroR', jobject=None, options=None)¶

Bases: OptionHandler

Wrapper class for classifiers.

additional_measure(measure)¶

Returns the specified additional measure if implementing weka.core.AdditionalMeasureProducer, otherwise None.

Parameters:: measure (str) – the measure to retrieve
Returns:: the additional measure
Return type:: str

property additional_measures¶

Returns the list of additional measures if implementing weka.core.AdditionalMeasureProducer, otherwise None.

Returns:: the additional measures
Return type:: str

property batch_size¶

Returns the batch size, in case this classifier is a batch predictor.

Returns:: the batch size, None if not a batch predictor
Return type:: str

build_classifier(data)¶

Builds the classifier with the data.

Parameters:: data (Instances) – the data to train the classifier with

property capabilities¶

Returns the capabilities of the classifier.

Returns:: the capabilities
Return type:: Capabilities

classify_instance(inst)¶

Performs a prediction.

Parameters:: inst (Instance) – the Instance to get a prediction for
Returns:: the classification (either regression value or 0-based label index)
Return type:: float

classmethod deserialize(ser_file)¶

Deserializes a classifier from a file.

Parameters:: ser_file (str) – the model file to deserialize
Returns:: model and, if available, the dataset header
Return type:: tuple

distribution_for_instance(inst)¶

Performs a prediction, returning the class distribution.

Parameters:: inst (Instance) – the Instance to get the class distribution for
Returns:: the class distribution array
Return type:: ndarray

distributions_for_instances(data)¶

Performs predictions, returning the class distributions.

Parameters:: data (Instances) – the Instances to get the class distributions for
Returns:: the class distribution matrix, None if not a batch predictor
Return type:: ndarray

property graph¶

Returns the graph if classifier implements weka.core.Drawable, otherwise None.

Returns:: the generated graph string
Return type:: str

property graph_type¶

Returns the graph type if classifier implements weka.core.Drawable, otherwise -1.

Returns:: the type
Return type:: int

has_efficient_batch_prediction()¶

Returns whether the classifier implements a more efficient batch prediction.

Returns:: True if a more efficient batch prediction is implemented, always False if not batch predictor
Return type:: bool

property header¶

Returns the header of the training data.

Returns:: the structure of the training data, None if not available
Return type:: Instances

classmethod make_copy(classifier)¶

Creates a copy of the classifier.

Parameters:: classifier (Classifier) – the classifier to copy
Returns:: the copy of the classifier
Return type:: Classifier

serialize(ser_file, header=None)¶

Serializes the classifier to the specified file.

Parameters:

ser_file (str) – the file to save the model to
header (Instances) – the (optional) dataset header to store alongside; recommended

to_source(classname)¶

Returns the model as Java source code if the classifier implements weka.classifiers.Sourcable.

Parameters:: classname (str) – the classname for the generated Java code
Returns:: the model as source code string
Return type:: str

update_classifier(inst)¶

Updates the classifier with the instance.

Parameters:: inst (Instance) – the Instance to update the classifier with
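
The Classifier methods above are usually chained as build, predict, then optionally serialize. The sketch below assumes a running JVM, train/test Instances objects with the class attribute set, and that Instances can be iterated to obtain Instance objects; the helper names train_and_predict and save_and_reload are hypothetical.

```python
def train_and_predict(train, test, classname="weka.classifiers.trees.J48", options=None):
    """Train a classifier and collect (prediction, distribution) per test instance."""
    # Imports are local so that defining this function does not require
    # python-weka-wrapper3 to be importable.
    from weka.classifiers import Classifier

    cls = Classifier(classname=classname, options=options)
    cls.build_classifier(train)

    results = []
    for inst in test:
        # regression value or 0-based label index
        prediction = cls.classify_instance(inst)
        # class distribution for the same instance
        distribution = cls.distribution_for_instance(inst)
        results.append((prediction, distribution))
    return cls, results


def save_and_reload(cls, train, model_path):
    """Serialize a trained classifier with its header and load it back."""
    from weka.classifiers import Classifier

    # Storing the header alongside the model is recommended.
    cls.serialize(model_path, header=train)
    model, header = Classifier.deserialize(model_path)
    return model, header
```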

class weka.classifiers.CostMatrix(matrx=None, num_classes=None)¶

Bases: JavaObject

Class for storing and manipulating a misclassification cost matrix. The element at position i,j in the matrix is the penalty for classifying an instance of class j as class i. Cost values can be fixed or computed on a per-instance basis (cost sensitive evaluation only) from the value of an attribute or an expression involving attribute(s).

apply_cost_matrix(data, rnd)¶

Applies the cost matrix to the data.

Parameters:

data (Instances) – the data to apply to
rnd (Random) – the random number generator

expected_costs(class_probs, inst=None)¶

Calculates the expected misclassification cost for each possible class value, given class probability estimates.

Parameters:: class_probs (ndarray) – the class probabilities
Returns:: the calculated costs
Return type:: ndarray

get_cell(row, col)¶

Returns the JPype object at the specified location.

Parameters:

row (int) – the 0-based index of the row
col (int) – the 0-based index of the column

Returns:

the object in that cell

Return type:

JPype object

get_element(row, col, inst=None)¶

Returns the value at the specified location.

Parameters:

row (int) – the 0-based index of the row
col (int) – the 0-based index of the column
inst (Instance) – the Instance

Returns:

the value in that cell

Return type:

float

get_max_cost(class_value, inst=None)¶

Gets the maximum cost for a particular class value.

Parameters:

class_value (int) – the class value to get the maximum cost for
inst (Instance) – the Instance

Returns:

the cost

Return type:

float

initialize()¶: Initializes the matrix.

normalize()¶: Normalizes the matrix.

property num_columns¶

Returns the number of columns.

Returns:: the number of columns
Return type:: int

property num_rows¶

Returns the number of rows.

Returns:: the number of rows
Return type:: int

classmethod parse_matlab(matlab)¶

Parses the cost matrix definition in Matlab format and returns a matrix.

Parameters:: matlab (str) – the Matlab matrix string, e.g., [1 2; 3 4]
Returns:: the generated matrix
Return type:: CostMatrix
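
To make the Matlab string format concrete, here is a small stand-alone sketch in plain Python. It is not the wrapper's or Weka's parser, only an illustration of the layout: rows are separated by semicolons and values by whitespace. The helper names parse_matlab_matrix and to_matlab_string are hypothetical.

```python
def parse_matlab_matrix(text):
    """Parse a string such as '[1 2; 3 4]' into a list of float rows."""
    text = text.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError("matrix must be enclosed in square brackets")
    # split into rows on ';', then into cells on whitespace
    rows = [row.split() for row in text[1:-1].split(";")]
    matrix = [[float(value) for value in row] for row in rows if row]
    if len({len(row) for row in matrix}) > 1:
        raise ValueError("all rows must have the same number of columns")
    return matrix


def to_matlab_string(matrix):
    """Format a list of rows back into the '[a b; c d]' layout."""
    return "[" + "; ".join(" ".join(format(v, "g") for v in row) for row in matrix) + "]"


parsed = parse_matlab_matrix("[1 2; 3 4]")
round_trip = to_matlab_string(parsed)
```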

set_cell(row, col, obj)¶

Sets the JPype object at the specified location. Automatically unwraps JavaObject.

Parameters:

row (int) – the 0-based index of the row
col (int) – the 0-based index of the column
obj (object) – the object for that cell

set_element(row, col, value)¶

Sets the float value at the specified location.

Parameters:

row (int) – the 0-based index of the row
col (int) – the 0-based index of the column
value (float) – the float value for that cell

property size¶

Returns the number of rows/columns.

Returns:: the number of rows/columns
Return type:: int

to_matlab()¶

Returns the matrix in Matlab format.

Returns:: the matrix as Matlab formatted string
Return type:: str

class weka.classifiers.Evaluation(data, cost_matrix=None)¶

Bases: JavaObject

Evaluation class for classifiers.

area_under_prc(class_index)¶

Returns the area under precision recall curve.

Parameters:: class_index (int) – the 0-based index of the class label
Returns:: the area
Return type:: float

area_under_roc(class_index)¶

Returns the area under the receiver operating characteristic (ROC) curve.

Parameters:: class_index (int) – the 0-based index of the class label
Returns:: the area
Return type:: float

property avg_cost¶

Returns the average cost.

Returns:: the cost
Return type:: float

class_details(title=None)¶

Generates the class details.

Parameters:: title (str) – optional title
Returns:: the details
Return type:: str

property class_priors¶

Returns the class priors.

Returns:: the priors
Return type:: ndarray

property confusion_matrix¶

Returns the confusion matrix.

Returns:: the matrix
Return type:: ndarray

property correct¶

Returns the correct count (nominal classes).

Returns:: the count
Return type:: float

property correlation_coefficient¶

Returns the correlation coefficient (numeric classes).

Returns:: the coefficient
Return type:: float

property coverage_of_test_cases_by_predicted_regions¶

Returns the coverage of the test cases by the predicted regions at the confidence level specified when evaluation was performed.

Returns:: the coverage
Return type:: float

crossvalidate_model(classifier, data, num_folds, rnd, output=None)¶

Crossvalidates the model using the specified data, number of folds and random number generator wrapper.

Parameters:

classifier (Classifier) – the classifier to cross-validate
data (Instances) – the data to evaluate on
num_folds (int) – the number of folds
rnd (Random) – the random number generator to use
output (PredictionOutput) – the output generator to use
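
A cross-validation run with collected predictions could be set up as sketched below. It assumes a running JVM, an Instances object with the class attribute set, and the Random wrapper from weka.core.classes; the helper name crossvalidate is hypothetical.

```python
def crossvalidate(data, classname="weka.classifiers.rules.ZeroR", num_folds=10, seed=1):
    """Cross-validate a classifier and return the summary and prediction text."""
    # Imports are local so that defining this function does not require
    # python-weka-wrapper3 to be importable.
    from weka.classifiers import Classifier, Evaluation, PredictionOutput
    from weka.core.classes import Random

    cls = Classifier(classname=classname)
    output = PredictionOutput(
        classname="weka.classifiers.evaluation.output.prediction.PlainText")

    evaluation = Evaluation(data)
    evaluation.crossvalidate_model(cls, data, num_folds, Random(seed), output=output)

    # textual statistics plus the per-instance predictions collected in the buffer
    return evaluation.summary(), output.buffer_content()
```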

cumulative_margin_distribution()¶

Output the cumulative margin distribution as a string suitable for input for gnuplot or similar package.

Returns:: the cumulative margin distribution
Return type:: str

property discard_predictions¶

Returns whether to discard predictions (saves memory).

Returns:: True if to discard
Return type:: bool

property error_rate¶

Returns the error rate (numeric classes).

Returns:: the rate
Return type:: float

classmethod evaluate_model(classifier, args)¶

Evaluates the classifier with the given options.

Parameters:

classifier (Classifier) – the classifier instance to use
args (list) – the command-line arguments to use

Returns:

the evaluation string

Return type:

str

evaluate_train_test_split(classifier, data, percentage, rnd=None, output=None)¶

Splits the data into train and test, builds the classifier with the training data and evaluates it against the test set.

Parameters:

classifier (Classifier) – the classifier to build and evaluate
data (Instances) – the data to evaluate on
percentage (double) – the percentage split to use (amount to use for training)
rnd (Random) – the random number generator to use, if None the order gets preserved
output (PredictionOutput) – the output generator to use

f_measure(class_index)¶

Returns the f measure.

Parameters:: class_index (int) – the 0-based index of the class label
Returns:: the measure
Return type:: float

false_negative_rate(class_index)¶

Returns the false negative rate.

Parameters:: class_index (int) – the 0-based index of the class label
Returns:: the rate
Return type:: float

false_positive_rate(class_index)¶

Returns the false positive rate.

Parameters:: class_index (int) – the 0-based index of the class label
Returns:: the rate
Return type:: float

property header¶

Returns the header format.

Returns:: the header format
Return type:: Instances

property incorrect¶

Returns the incorrect count (nominal classes).

Returns:: the count
Return type:: float

property kappa¶

Returns kappa.

Returns:: kappa
Return type:: float
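
For readers unfamiliar with the statistic, the following plain-Python sketch computes Cohen's kappa from a confusion matrix. It is an illustration of the standard formula, not the code Weka runs; returning 1.0 when chance agreement reaches 1 is a choice made for this sketch, and the function name cohen_kappa is hypothetical.

```python
def cohen_kappa(matrix):
    """Compute Cohen's kappa from a square confusion matrix (list of rows)."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    # observed agreement: share of instances on the diagonal
    observed = sum(matrix[i][i] for i in range(n)) / total
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    # agreement expected by chance from the marginal totals
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (total * total)
    if expected >= 1.0:
        return 1.0
    return (observed - expected) / (1.0 - expected)


kappa = cohen_kappa([[5, 1], [2, 4]])
```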

property kb_information¶

Returns KB information.

Returns:: the information
Return type:: float

property kb_mean_information¶

Returns KB mean information.

Returns:: the information
Return type:: float

property kb_relative_information¶

Returns KB relative information.

Returns:: the information
Return type:: float

matrix(title=None)¶

Generates the confusion matrix.

Parameters:: title (str) – optional title
Returns:: the matrix
Return type:: str

matthews_correlation_coefficient(class_index)¶

Returns the Matthews correlation coefficient (nominal classes).

Parameters:: class_index (int) – the 0-based index of the class label
Returns:: the coefficient
Return type:: float

property mean_absolute_error¶

Returns the mean absolute error.

Returns:: the error
Return type:: float

property mean_prior_absolute_error¶

Returns the mean prior absolute error.

Returns:: the error
Return type:: float

num_false_negatives(class_index)¶

Returns the number of false negatives.

Parameters:: class_index (int) – the 0-based index of the class label
Returns:: the count
Return type:: float

num_false_positives(class_index)¶

Returns the number of false positives.

Parameters:: class_index (int) – the 0-based index of the class label
Returns:: the count
Return type:: float

property num_instances¶

Returns the number of instances that had a known class value.

Returns:: the number of instances
Return type:: float

num_true_negatives(class_index)¶

Returns the number of true negatives.

Parameters:: class_index (int) – the 0-based index of the class label
Returns:: the count
Return type:: float

num_true_positives(class_index)¶

Returns the number of true positives.

Parameters:: class_index (int) – the 0-based index of the class label
Returns:: the count
Return type:: float

property percent_correct¶

Returns the percent correct (nominal classes).

Returns:: the percentage
Return type:: float

property percent_incorrect¶

Returns the percent incorrect (nominal classes).

Returns:: the percentage
Return type:: float

property percent_unclassified¶

Returns the percent unclassified.

Returns:: the percentage
Return type:: float

precision(class_index)¶

Returns the precision.

Parameters:: class_index (int) – the 0-based index of the class label
Returns:: the precision
Return type:: float

property predictions¶

Returns the predictions.

Returns:: the predictions, None if not available
Return type:: list

recall(class_index)¶

Returns the recall.

Parameters:: class_index (int) – the 0-based index of the class label
Returns:: the recall
Return type:: float
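
The per-class statistics all derive from the same four counts. The plain-Python sketch below shows how precision, recall and F-measure follow from a confusion matrix for a given class_index. It assumes rows hold actual classes and columns hold predicted classes, and returns 0.0 where a denominator is zero (a choice made for this sketch); all helper names are hypothetical.

```python
def class_counts(matrix, class_index):
    """Return (tp, fp, tn, fn) for one class of a square confusion matrix."""
    n = len(matrix)
    tp = matrix[class_index][class_index]
    # actual class, but predicted as something else
    fn = sum(matrix[class_index][j] for j in range(n) if j != class_index)
    # predicted as class, but actually something else
    fp = sum(matrix[i][class_index] for i in range(n) if i != class_index)
    tn = sum(matrix[i][j] for i in range(n) for j in range(n)
             if i != class_index and j != class_index)
    return tp, fp, tn, fn


def precision_recall_f(matrix, class_index):
    """Return (precision, recall, f_measure) for one class."""
    tp, fp, tn, fn = class_counts(matrix, class_index)
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    denominator = 2 * tp + fp + fn
    f_measure = 2 * tp / denominator if denominator > 0 else 0.0
    return precision, recall, f_measure


p, r, f = precision_recall_f([[5, 1], [2, 4]], class_index=0)
```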

property relative_absolute_error¶

Returns the relative absolute error.

Returns:: the error
Return type:: float

property root_mean_prior_squared_error¶

Returns the root mean prior squared error.

Returns:: the error
Return type:: float

property root_mean_squared_error¶

Returns the root mean squared error.

Returns:: the error
Return type:: float

property root_relative_squared_error¶

Returns the root relative squared error.

Returns:: the error
Return type:: float

property sf_entropy_gain¶

Returns the total SF, which is the null model entropy minus the scheme entropy.

Returns:: the gain
Return type:: float

property sf_mean_entropy_gain¶

Returns the SF per instance, which is the null model entropy minus the scheme entropy, per instance.

Returns:: the gain
Return type:: float

property sf_mean_prior_entropy¶

Returns the entropy per instance for the null model.

Returns:: the entropy
Return type:: float

property sf_mean_scheme_entropy¶

Returns the entropy per instance for the scheme.

Returns:: the entropy
Return type:: float

property sf_prior_entropy¶

Returns the total entropy for the null model.

Returns:: the entropy
Return type:: float

property sf_scheme_entropy¶

Returns the total entropy for the scheme.

Returns:: the entropy
Return type:: float

property size_of_predicted_regions¶

Returns the average size of the predicted regions, relative to the range of the target in the training data, at the confidence level specified when evaluation was performed.

Returns:: the size of the regions
Return type:: float

summary(title=None, complexity=False)¶

Generates a summary.

Parameters:

title (str) – optional title
complexity (bool) – whether to print the complexity information as well

Returns:

the summary

Return type:

str

test_model(classifier, data, output=None)¶

Evaluates the built model using the specified test data and returns the classifications.

Parameters:

classifier (Classifier) – the trained classifier to evaluate
data (Instances) – the data to evaluate on
output (PredictionOutput) – the output generator to use

Returns:

the classifications

Return type:

ndarray

test_model_once(classifier, inst, store=False)¶

Evaluates the built model using the specified test instance and returns the classification.

Parameters:

classifier (Classifier) – the trained classifier to evaluate
inst (Instance) – the Instance to evaluate on
store (bool) – whether to store the predictions (some statistics in class_details() like AUC require that)

Returns:

the classification

Return type:

float

property total_cost¶

Returns the total cost.

Returns:: the cost
Return type:: float

true_negative_rate(class_index)¶

Returns the true negative rate.

Parameters:: class_index (int) – the 0-based index of the class label
Returns:: the rate
Return type:: float

true_positive_rate(class_index)¶

Returns the true positive rate.

Parameters:: class_index (int) – the 0-based index of the class label
Returns:: the rate
Return type:: float

property unclassified¶

Returns the unclassified count.

Returns:: the count
Return type:: float

property unweighted_macro_f_measure¶

Returns the unweighted macro-averaged F-measure.

Returns:: the measure
Return type:: float

property unweighted_micro_f_measure¶

Returns the unweighted micro-averaged F-measure.

Returns:: the measure
Return type:: float

property weighted_area_under_prc¶

Returns the weighted area under precision recall curve.

Returns:: the weighted area
Return type:: float

property weighted_area_under_roc¶

Returns the weighted area under the receiver operating characteristic (ROC) curve.

Returns:: the weighted area
Return type:: float

property weighted_f_measure¶

Returns the weighted f measure.

Returns:: the measure
Return type:: float

property weighted_false_negative_rate¶

Returns the weighted false negative rate.

Returns:: the rate
Return type:: float

property weighted_false_positive_rate¶

Returns the weighted false positive rate.

Returns:: the rate
Return type:: float

property weighted_matthews_correlation¶

Returns the weighted Matthews correlation (nominal classes).

Returns:: the correlation
Return type:: float

property weighted_precision¶

Returns the weighted precision.

Returns:: the precision
Return type:: float

property weighted_recall¶

Returns the weighted recall.

Returns:: the recall
Return type:: float

property weighted_true_negative_rate¶

Returns the weighted true negative rate.

Returns:: the rate
Return type:: float

property weighted_true_positive_rate¶

Returns the weighted true positive rate.

Returns:: the rate
Return type:: float

class weka.classifiers.FilteredClassifier(jobject=None, options=None)¶

Bases: SingleClassifierEnhancer

Wrapper class for the filtered classifier.

check_for_modified_class_attribute(check)¶

Sets whether to check for class attribute modifications.

Parameters:: check (bool) – True if checking for modifications

property filter¶

Returns the filter.

Returns:: the filter in use
Return type:: weka.filters.Filter
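
A FilteredClassifier is typically assembled from a filter and a base classifier before training. The sketch below assumes a running JVM, that the filter and classifier properties can be assigned as well as read, and that the Remove filter and J48 are suitable for the data; the helper name build_filtered_classifier is hypothetical.

```python
def build_filtered_classifier(train, remove_range="1"):
    """Train J48 on data with the given attribute range removed on the fly."""
    # Imports are local so that defining this function does not require
    # python-weka-wrapper3 to be importable.
    from weka.classifiers import Classifier, FilteredClassifier
    from weka.filters import Filter

    remove = Filter(
        classname="weka.filters.unsupervised.attribute.Remove",
        options=["-R", remove_range])

    fc = FilteredClassifier()
    fc.filter = remove  # assumption: the property has a setter
    fc.classifier = Classifier(classname="weka.classifiers.trees.J48")
    fc.build_classifier(train)
    return fc
```

Because the filter is applied inside the meta-classifier, the same transformation is applied to training data and to instances passed in later for prediction.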

class weka.classifiers.GridSearch(jobject=None, options=None)¶

Bases: SingleClassifierEnhancer

Wrapper class for the GridSearch meta-classifier.

property best¶

Returns the best classifier setup found during the search.

Returns:: the best classifier setup
Return type:: Classifier

property evaluation¶

Returns the currently set statistic used for evaluation.

Returns:: the statistic
Return type:: SelectedTag

property x¶

Returns a dictionary with all the current values for the X of the grid. Keys for the dictionary: property, min, max, step, base, expression. Types: property=str, min=float, max=float, step=float, base=float, expression=str.

Returns:: the dictionary with the parameters
Return type:: dict

property y¶

Returns a dictionary with all the current values for the Y of the grid. Keys for the dictionary: property, min, max, step, base, expression. Types: property=str, min=float, max=float, step=float, base=float, expression=str.

Returns:: the dictionary with the parameters
Return type:: dict

class weka.classifiers.Kernel(classname=None, jobject=None, options=None)¶

Bases: OptionHandler

Wrapper class for kernels.

build_kernel(data)¶

Builds the kernel with the data.

Parameters:: data (Instances) – the data to train the kernel with

capabilities()¶

Returns the capabilities of the kernel.

Returns:: the capabilities
Return type:: Capabilities

property checks_turned_off¶

Returns whether checks are turned off.

Returns:: True if checks turned off
Return type:: bool

clean()¶: Frees the memory used by the kernel.

eval(id1, id2, inst1)¶

Computes the result of the kernel function for two instances. If id1 == -1, eval uses inst1 instead of an instance in the dataset.

Parameters:

id1 (int) – the index of the first instance in the dataset
id2 (int) – the index of the second instance in the dataset
inst1 (Instance) – the instance corresponding to id1 (used if id1 == -1)

classmethod make_copy(kernel)¶

Creates a copy of the kernel.

Parameters:: kernel (Kernel) – the kernel to copy
Returns:: the copy of the kernel
Return type:: Kernel

class weka.classifiers.KernelClassifier(classname=None, jobject=None, options=None)¶

Bases: Classifier

Wrapper class for classifiers that have a kernel property, like SMO.

property kernel¶

Returns the current kernel.

Returns:: the kernel or None if none found
Return type:: Kernel

class weka.classifiers.MultiSearch(jobject=None, options=None)¶

Bases: SingleClassifierEnhancer

Wrapper class for the MultiSearch meta-classifier. NB: ‘multi-search-weka-package’ must be installed (https://github.com/fracpete/multisearch-weka-package), version 2016.1.15 or later.

property best¶

Returns the best classifier setup found during the search.

Returns:: the best classifier setup
Return type:: Classifier

property evaluation¶

Returns the currently set statistic used for evaluation.

Returns:: the statistic
Return type:: SelectedTag

property parameters¶

Returns the list of currently set search parameters.

Returns:: the list of AbstractSearchParameter objects
Return type:: list

class weka.classifiers.MultipleClassifiersCombiner(classname=None, jobject=None, options=None)¶

Bases: Classifier

Wrapper class for classifiers that use multiple base classifiers.

append(classifier)¶

Appends the classifier to the current list of classifiers.

Parameters:: classifier (Classifier) – the classifier to add

property classifiers¶

Returns the list of base classifiers.

Returns:: the classifier list
Return type:: list

clear()¶: Removes all classifiers.

class weka.classifiers.NominalPrediction(jobject)¶

Bases: Prediction

Wrapper class for a nominal prediction.

property distribution¶

Returns the class distribution.

Returns:: the class distribution list
Return type:: ndarray

property margin¶

Returns the margin.

Returns:: the margin
Return type:: float
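This page does not define the margin. A common definition, and the one assumed in the sketch below, is the probability assigned to the actual class minus the highest probability assigned to any other class. The function name margin and its arguments are illustrative and not part of the wrapper.

```python
def margin(distribution, actual):
    """Probability of the actual class minus the largest probability among the others.

    distribution: class probabilities, one per class label
    actual: 0-based index of the actual class (assumed convention)
    """
    prob_actual = distribution[actual]
    prob_next = max(
        (p for i, p in enumerate(distribution) if i != actual),
        default=0.0,
    )
    return prob_actual - prob_next


confident = margin([0.7, 0.2, 0.1], 0)  # actual class has the highest probability
wrong = margin([0.7, 0.2, 0.1], 1)      # actual class was not the favoured one
```

Under this definition a positive margin means the actual class received the highest probability, and a negative margin means another class was preferred.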

class weka.classifiers.NumericPrediction(jobject)¶

Bases: Prediction

Wrapper class for a numeric prediction.

property error¶

Returns the error.

Returns:: the error
Return type:: float

property prediction_intervals¶

Returns the prediction intervals.

Returns:: the intervals
Return type:: ndarray

class weka.classifiers.Prediction(jobject)¶

Bases: JavaObject

Wrapper class for a prediction.

property actual¶

Returns the actual value.

Returns:: the actual value (internal representation)
Return type:: float

property predicted¶

Returns the predicted value.

Returns:: the predicted value (internal representation)
Return type:: float

property weight¶

Returns the weight.

Returns:: the weight of the Instance that was used
Return type:: float

class weka.classifiers.PredictionOutput(classname='weka.classifiers.evaluation.output.prediction.PlainText', jobject=None, options=None)¶

Bases: OptionHandler

For collecting predictions and generating output from them. The wrapped Java class must be derived from weka.classifiers.evaluation.output.prediction.AbstractOutput.

buffer_content()¶

Returns the content of the buffer as string.

Returns:: The buffer content
Return type:: str

property header¶

Returns the header format.

Returns:: The dataset format
Return type:: Instances

print_all(cls, data)¶

Prints the header, classifications and footer to the buffer.

Parameters:

cls (Classifier) – the classifier
data (Instances) – the test data

print_classification(cls, inst, index)¶

Prints the classification to the buffer.

Parameters:

cls (Classifier) – the classifier
inst (Instance) – the test instance
index (int) – the 0-based index of the test instance

print_classifications(cls, data)¶

Prints the classifications to the buffer.

Parameters:

cls (Classifier) – the classifier
data (Instances) – the test data

print_footer()¶: Prints the footer to the buffer.

print_header()¶: Prints the header to the buffer.

class weka.classifiers.SingleClassifierEnhancer(classname=None, jobject=None, options=None)¶

Bases: Classifier

Wrapper class for classifiers that use a single base classifier.

property classifier¶

Returns the base classifier.

Returns:: the base classifier
Return type:: Classifier

weka.classifiers.main(args=None)¶

Runs a classifier from the command-line. Calls JVM start/stop automatically. Use -h to see all options.

Parameters:: args (list) – the command-line arguments to use, uses sys.argv if None

weka.classifiers.predictions_to_instances(data, preds)¶

Turns the predictions into an Instances object.

Parameters:

data (Instances) – the original dataset format
preds (list) – the predictions to convert

Returns:

the predictions, None if no predictions present

Return type:

Instances

weka.classifiers.sys_main()¶

Runs the main function using the system cli arguments, and returns a system error code.

Returns:: 0 for success, 1 for failure.
Return type:: int
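The 0/1 return convention of sys_main() can be sketched as a thin wrapper around a main-style function. The helper run_main below is hypothetical and only illustrates the pattern; the wrapper's actual implementation may differ, for example in how it reports the error.

```python
import traceback


def run_main(main_func, args=None):
    """Call main_func(args) and translate the outcome into an exit code."""
    try:
        main_func(args)
        return 0  # success
    except Exception:
        # Report the failure on stderr and signal it through the exit code.
        traceback.print_exc()
        return 1


def _ok(args):
    pass


def _boom(args):
    raise ValueError("simulated failure")


ok_code = run_main(_ok)
fail_code = run_main(_boom)
```

A script entry point would typically pass the resulting code to sys.exit().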

weka.clusterers module¶

class weka.clusterers.ClusterEvaluation¶

Bases: JavaObject

Evaluation class for clusterers.

property classes_to_clusters¶

Return the array (ordered by cluster number) of minimum error class to cluster mappings.

Returns:: the mappings
Return type:: ndarray
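To make "minimum error class to cluster mapping" concrete, the sketch below brute-forces the assignment of one distinct class to each cluster that leaves the fewest misassigned instances. It assumes there are no more clusters than classes, and it is not necessarily the search strategy Weka uses. The function name and the counts layout are illustrative.

```python
from itertools import permutations


def classes_to_clusters(counts):
    """Find the class-per-cluster mapping with the fewest errors.

    counts[c][k] = number of instances in cluster c that carry class k.
    Returns (mapping, errors), where mapping[c] is the class index for cluster c.
    """
    num_clusters = len(counts)
    num_classes = len(counts[0])
    best_mapping, best_correct = None, -1
    # Try every way of giving each cluster its own distinct class.
    for mapping in permutations(range(num_classes), num_clusters):
        correct = sum(counts[c][k] for c, k in enumerate(mapping))
        if correct > best_correct:
            best_mapping, best_correct = list(mapping), correct
    total = sum(sum(row) for row in counts)
    return best_mapping, total - best_correct


mapping, errors = classes_to_clusters([[1, 9], [8, 2]])
```

The returned list is ordered by cluster number, matching the description of the property above.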

property cluster_assignments¶

Return an array of cluster assignments corresponding to the most recent set of instances clustered.

Returns:: the cluster assignments
Return type:: ndarray

property cluster_results¶

The cluster results as string.

Returns:: the results string
Return type:: str

classmethod crossvalidate_model(clusterer, data, num_folds, rnd)¶

Cross-validates the clusterer and returns the loglikelihood.

Parameters:

clusterer (Clusterer) – the clusterer instance to evaluate
data (Instances) – the data to evaluate on
num_folds (int) – the number of folds
rnd (Random) – the random number generator to use

Returns:

the cross-validated loglikelihood

Return type:

float

classmethod evaluate_clusterer(clusterer, args)¶

Evaluates the clusterer with the given options.

Parameters:

clusterer (Clusterer) – the clusterer instance to evaluate
args (list) – the command-line arguments

Returns:

the evaluation result

Return type:

str

property log_likelihood¶

Returns the log likelihood.

Returns:: the log likelihood
Return type:: float

property num_clusters¶

Returns the number of clusters.

Returns:: the number of clusters
Return type:: int

set_model(clusterer)¶

Sets the built clusterer to evaluate.

Parameters:: clusterer (Clusterer) – the clusterer to evaluate

test_model(test)¶

Evaluates the currently set clusterer on the test set.

Parameters:: test (Instances) – the test set to use for evaluating

class weka.clusterers.Clusterer(classname='weka.clusterers.SimpleKMeans', jobject=None, options=None)¶

Bases: OptionHandler

Wrapper class for clusterers.

build_clusterer(data)¶

Builds the clusterer with the data.

Parameters:: data (Instances) – the data to use for training the clusterer

property capabilities¶

Returns the capabilities of the clusterer.

Returns:: the capabilities
Return type:: Capabilities

cluster_instance(inst)¶

Performs a prediction.

Parameters:: inst (Instance) – the instance to determine the cluster for
Returns:: the clustering result
Return type:: float

classmethod deserialize(ser_file)¶

Deserializes a clusterer from a file.

Parameters:: ser_file (str) – the model file to deserialize
Returns:: model and, if available, the dataset header
Return type:: tuple

distribution_for_instance(inst)¶

Performs a prediction, returning the cluster distribution.

Parameters:: inst (Instance) – the Instance to get the cluster distribution for
Returns:: the cluster distribution
Return type:: np.ndarray

property graph¶

Returns the graph if the clusterer implements weka.core.Drawable, otherwise None.

Returns:: the graph or None if not available
Return type:: str

property graph_type¶

Returns the graph type if the clusterer implements weka.core.Drawable, otherwise -1.

Returns:: the type
Return type:: int

property header¶

Returns the header of the training data.

Returns:: the structure of the training data, None if not available
Return type:: Instances

classmethod make_copy(clusterer)¶

Creates a copy of the clusterer.

Parameters:: clusterer (Clusterer) – the clusterer to copy
Returns:: the copy of the clusterer
Return type:: Clusterer

property number_of_clusters¶

Returns the number of clusters found.

Returns:: the number of clusters
Return type:: int

serialize(ser_file, header=None)¶

Serializes the clusterer to the specified file.

Parameters:

ser_file (str) – the file to save the model to
header (Instances) – the (optional) dataset header to store alongside; recommended

update_clusterer(inst)¶

Updates the clusterer with the instance.

Parameters:: inst (Instance) – the Instance to update the clusterer with

update_finished()¶: Signals the clusterer that updating with new data has finished.

class weka.clusterers.FilteredClusterer(jobject=None, options=None)¶

Bases: SingleClustererEnhancer

Wrapper class for the filtered clusterer.

property filter¶

Returns the filter.

Returns:: the filter
Return type:: weka.filters.Filter

class weka.clusterers.SingleClustererEnhancer(classname=None, jobject=None, options=None)¶

Bases: Clusterer

Wrapper class for clusterers that use a single base clusterer.

property clusterer¶

Returns the base clusterer.

Returns:: the clusterer
Return type:: Clusterer

weka.clusterers.avg_silhouette_coefficient(clusterer, dist_func, data)¶

Computes the average silhouette coefficient for a clusterer. Based on Eibe Frank’s Groovy code: https://weka.8497.n7.nabble.com/Silhouette-Measures-and-Dunn-Index-DI-in-Weka-td44072.html

Parameters:

clusterer (Clusterer) – the trained clusterer model to evaluate
dist_func (DistanceFunction) – the distance function to use; if Euclidean, make sure that normalization is turned off
data (Instances) – the standardized data

Returns:

the average silhouette coefficient

Return type:

float
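For readers unfamiliar with the measure, here is a minimal pure-Python sketch of the average silhouette coefficient. It works on plain lists of points and cluster labels rather than on Clusterer, DistanceFunction and Instances objects, so all names are illustrative. It assumes at least two clusters and scores singleton clusters as 0.

```python
import math


def euclidean(a, b):
    """Plain Euclidean distance without any normalization."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def avg_silhouette(points, labels, dist=euclidean):
    """Average of (b - a) / max(a, b) over all points.

    a: mean distance to the other members of the point's own cluster
    b: smallest mean distance to the members of any other cluster
    """
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            scores.append(0.0)  # singleton cluster (assumed convention)
            continue
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


score = avg_silhouette([[0.0], [1.0], [10.0], [11.0]], [0, 0, 1, 1])
```

Values close to 1 indicate compact, well-separated clusters. The remark about normalization in the parameter list matters because a distance function that rescales attributes internally would be applied on top of already standardized data.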

weka.clusterers.main(args=None)¶

Runs a clusterer from the command-line. Calls JVM start/stop automatically. Use -h to see all options.

Parameters:: args (list) – the command-line arguments to use, uses sys.argv if None

weka.clusterers.sys_main()¶

Runs the main function using the system cli arguments, and returns a system error code.

Returns:: 0 for success, 1 for failure.
Return type:: int

weka.datagenerators module¶

class weka.datagenerators.DataGenerator(classname='weka.datagenerators.classifiers.classification.Agrawal', jobject=None, options=None)¶

Bases: OptionHandler

Wrapper class for datagenerators.

property dataset_format¶

Returns the dataset format.

Returns:: the format
Return type:: Instances

define_data_format()¶

Returns the data format.

Returns:: the data format
Return type:: Instances

generate_example()¶

Returns a single Instance.

Returns:: the next example
Return type:: Instance

generate_examples()¶

Returns complete dataset.

Returns:: the generated dataset
Return type:: Instances

generate_finish()¶

Returns a “finish” string.

Returns:: a finish comment
Return type:: str

generate_start()¶

Returns a “start” string.

Returns:: the start comment
Return type:: str

classmethod make_copy(generator)¶

Creates a copy of the generator.

Parameters:: generator (DataGenerator) – the generator to copy
Returns:: the copy of the generator
Return type:: DataGenerator

classmethod make_data(generator, args)¶

Generates data using the generator and commandline arguments.

Parameters:

generator (DataGenerator) – the generator instance to use
args (list) – the command-line arguments

property num_examples_act¶

Returns the actual number of examples to generate.

Returns:: the number of examples
Return type:: int

property single_mode_flag¶

Returns whether data is generated row by row (True) or in one go (False).

Returns:: whether incremental
Return type:: bool

weka.datagenerators.main(args=None)¶

Runs a datagenerator from the command-line. Calls JVM start/stop automatically. Use -h to see all options.

Parameters:: args (list) – the command-line arguments to use, uses sys.argv if None

weka.datagenerators.sys_main()¶

Runs the main function using the system cli arguments, and returns a system error code.

Returns:: 0 for success, 1 for failure.
Return type:: int

weka.experiments module¶

class weka.experiments.Experiment(classname='weka.experiment.Experiment', jobject=None, options=None)¶

Bases: OptionHandler

Wrapper class for an experiment.

class weka.experiments.ResultMatrix(classname='weka.experiment.ResultMatrixPlainText', jobject=None, options=None)¶

Bases: OptionHandler

For generating results from an Experiment run.

average(col)¶

Returns the average mean at this location (if valid location).

Parameters:: col (int) – the 0-based column index
Returns:: the mean
Return type:: float

property columns¶

Returns the column count.

Returns:: the count
Return type:: int

get_col_name(index)¶

Returns the column name.

Parameters:: index (int) – the 0-based column index
Returns:: the column name, None if invalid index
Return type:: str

get_mean(col, row)¶

Returns the mean at this location (if valid location).

Parameters:

col (int) – the 0-based column index
row (int) – the 0-based row index

Returns:

the mean

Return type:

float

get_row_name(index)¶

Returns the row name.

Parameters:: index (int) – the 0-based row index
Returns:: the row name, None if invalid index
Return type:: str

get_stdev(col, row)¶

Returns the standard deviation at this location (if valid location).

Parameters:

col (int) – the 0-based column index
row (int) – the 0-based row index

Returns:

the standard deviation

Return type:

float

hide_col(index)¶

Hides the column.

Parameters:: index (int) – the 0-based column index

hide_row(index)¶

Hides the row.

Parameters:: index (int) – the 0-based row index

is_col_hidden(index)¶

Returns whether the column is hidden.

Parameters:: index (int) – the 0-based column index
Returns:: true if hidden
Return type:: bool

is_row_hidden(index)¶

Returns whether the row is hidden.

Parameters:: index (int) – the 0-based row index
Returns:: true if hidden
Return type:: bool

property rows¶

Returns the row count.

Returns:: the count
Return type:: int

set_col_name(index, name)¶

Sets the column name.

Parameters:

index (int) – the 0-based column index
name (str) – the name of the column

set_mean(col, row, mean)¶

Sets the mean at this location (if valid location).

Parameters:

col (int) – the 0-based column index
row (int) – the 0-based row index
mean (float) – the mean to set

set_row_name(index, name)¶

Sets the row name.

Parameters:

index (int) – the 0-based row index
name (str) – the name of the row

set_stdev(col, row, stdev)¶

Sets the standard deviation at this location (if valid location).

Parameters:

col (int) – the 0-based column index
row (int) – the 0-based row index
stdev (float) – the standard deviation to set

show_col(index)¶

Shows the column.

Parameters:: index (int) – the 0-based column index

show_row(index)¶

Shows the row.

Parameters:: index (int) – the 0-based row index

to_string_header()¶

Returns the header of the matrix as a string.

Returns:: the header
Return type:: str

to_string_key()¶

Returns a key for all the col names, for better readability if the names got cut off.

Returns:: the key
Return type:: str

to_string_matrix()¶

Returns the matrix as a string.

Returns:: the generated output
Return type:: str

to_string_ranking()¶

Returns the ranking in a string representation.

Returns:: the ranking
Return type:: str

to_string_summary()¶

Returns the summary as string.

Returns:: the summary
Return type:: str

class weka.experiments.SimpleCrossValidationExperiment(datasets, classifiers, classification=True, runs=10, folds=10, result=None, class_for_ir_statistics=0, attribute_id=-1, pred_target_column=False)¶

Bases: SimpleExperiment

Performs a simple cross-validation experiment. Can output the results either in ARFF or CSV.

configure_resultproducer()¶

Configures and returns the ResultProducer and PropertyPath as tuple.

Returns:: producer and property path
Return type:: tuple

class weka.experiments.SimpleExperiment(datasets, classifiers, jobject=None, classification=True, runs=10, result=None, class_for_ir_statistics=0, attribute_id=-1, pred_target_column=False)¶

Bases: OptionHandler

Ancestor for simple experiments.

See following URL for how to use the Experiment API: http://weka.wikispaces.com/Using+the+Experiment+API

configure_resultproducer()¶

Configures and returns the ResultProducer and PropertyPath as tuple.

Returns:: producer and property path
Return type:: tuple

configure_splitevaluator()¶

Configures and returns the SplitEvaluator and Classifier instance as tuple.

Returns:: evaluator and classifier
Return type:: tuple

experiment()¶

Returns the internal experiment, if set up, otherwise None.

Returns:: the internal experiment
Return type:: Experiment

classmethod load(filename)¶

Loads the experiment from disk.

Parameters:: filename (str) – the filename of the experiment to load
Returns:: the experiment
Return type:: Experiment

run()¶: Executes the experiment.

classmethod save(filename, experiment)¶

Saves the experiment to disk.

Parameters:

filename (str) – the filename to save the experiment to
experiment (Experiment) – the Experiment to save

setup()¶: Initializes the experiment.

class weka.experiments.SimpleRandomSplitExperiment(datasets, classifiers, classification=True, runs=10, percentage=66.6, preserve_order=False, result=None, class_for_ir_statistics=0, attribute_id=-1, pred_target_column=False)¶

Bases: SimpleExperiment

Performs a simple random split experiment. Can output the results either in ARFF or CSV.
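The percentage and preserve_order arguments in the signature above suggest a train/test split along the following lines. This is a sketch under assumptions: the rounding rule, the seed handling and the function name random_split are all hypothetical and may not match what the experiment does internally.

```python
import random


def random_split(rows, percentage=66.6, preserve_order=False, seed=1):
    """Split rows into (train, test), with `percentage` percent going to train."""
    rows = list(rows)
    if not preserve_order:
        # Shuffle with a fixed seed so the split is reproducible.
        random.Random(seed).shuffle(rows)
    cut = int(round(len(rows) * percentage / 100.0))
    return rows[:cut], rows[cut:]


ordered_train, ordered_test = random_split(range(10), preserve_order=True)
shuffled_train, shuffled_test = random_split(range(10))
```

With preserve_order=True the first rows form the training set, which is what one would want for data with a temporal ordering.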

configure_resultproducer()¶

Configures and returns the ResultProducer and PropertyPath as tuple.

Returns:: producer and property path
Return type:: tuple

class weka.experiments.Tester(classname='weka.experiment.PairedCorrectedTTester', jobject=None, options=None, swap_rows_and_cols=False)¶

Bases: OptionHandler

For generating statistical results from an experiment.

property dataset_columns¶

Returns the list of column names that uniquely identify a dataset.

Returns:: the list of attribute names
Return type:: list

property fold_column¶

Returns the column name that holds the Fold number.

Returns:: the attribute name
Return type:: str

header(comparison_column)¶

Creates a “header” string describing the current resultsets.

Parameters:: comparison_column (int) – the index of the column to compare against
Returns:: the header
Return type:: str

init_columns()¶: Sets the column indices based on the supplied names if necessary.

property instances¶

Returns the data used in the analysis.

Returns:: the data in use
Return type:: Instances

multi_resultset_full(base_resultset, comparison_column)¶

Creates a comparison table where a base resultset is compared to the other resultsets.

Parameters:

base_resultset (int) – the 0-based index of the base resultset (eg classifier to compare against)
comparison_column (int) – the 0-based index of the column to compare against

Returns:

the comparison

Return type:

str

multi_resultset_ranking(comparison_column)¶

Creates a ranking.

Parameters:: comparison_column (int) – the 0-based index of the column to compare against
Returns:: the ranking
Return type:: str

multi_resultset_summary(comparison_column)¶

Carries out a comparison between all resultsets, counting the number of datasets where one resultset outperforms the other.

Parameters:: comparison_column (int) – the 0-based index of the column to compare against
Returns:: the summary
Return type:: str
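The counting idea behind the summary can be illustrated with a plain pairwise win count. Note the simplification: the sketch compares raw scores directly, whereas a tester such as the default PairedCorrectedTTester decides wins via a significance test. The function name and the input layout are hypothetical.

```python
def win_counts(results):
    """Count, for each pair (a, b), on how many datasets a scores higher than b.

    results[name] = list of per-dataset scores, same dataset order for every name.
    """
    names = list(results)
    wins = {a: {b: 0 for b in names if b != a} for a in names}
    for a in names:
        for b in names:
            if a == b:
                continue
            wins[a][b] = sum(
                1 for x, y in zip(results[a], results[b]) if x > y
            )
    return wins


wins = win_counts({"J48": [90, 80, 70], "ZeroR": [60, 85, 50]})
```

Each resultset here stands for one classifier setup, and each list position for one dataset.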

property result_columns¶

Returns the list of column names that uniquely identify a result (e.g. classifier + options + ID).

Returns:: the list of attribute names
Return type:: list

property resultmatrix¶

Returns the ResultMatrix instance in use.

Returns:: the matrix in use
Return type:: ResultMatrix

property run_column¶

Returns the column name that holds the Run number.

Returns:: the attribute name
Return type:: str

property swap_rows_and_cols¶

Returns whether to swap rows/cols.

Returns:: whether to swap
Return type:: bool

weka.filters module¶

class weka.filters.AttributeSelection(jobject=None, options=None)¶

Bases: Filter

Wrapper class for weka.filters.supervised.attribute.AttributeSelection.

property evaluator¶

Returns the evaluator.

Returns:: the evaluator in use
Return type:: ASEvaluation

property search¶

Returns the search.

Returns:: the search in use
Return type:: ASSearch

class weka.filters.Filter(classname='weka.filters.AllFilter', jobject=None, options=None)¶

Bases: OptionHandler

Wrapper class for filters.

batch_finished()¶

Signals the filter that the batch of data has finished.

Returns:: True if instances can be collected from the output
Return type:: bool

capabilities()¶

Returns the capabilities of the filter.

Returns:: the capabilities
Return type:: Capabilities

classmethod deserialize(ser_file)¶

Deserializes a filter from a file.

Parameters:: ser_file (str) – the file to deserialize from
Returns:: model
Return type:: Filter

filter(data)¶

Filters the dataset(s). When providing a list, this can be used to create compatible train/test sets, since the filter only gets initialized with the first dataset and all subsequent datasets get transformed using the same setup.

NB: inputformat(Instances) must have been called beforehand.

Parameters:: data (Instances or list of Instances) – the Instances to filter
Returns:: the filtered Instances object(s)
Return type:: Instances or list of Instances
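Why initializing on the first dataset only yields compatible train/test sets can be seen with a toy standardizing filter. ToyStandardize is a hypothetical pure-Python stand-in that operates on lists of floats. It does not use the wrapper's Filter class and only mirrors the behaviour described above.

```python
import statistics


class ToyStandardize:
    """Hypothetical filter that learns mean and stdev from the first dataset only."""

    def __init__(self):
        self.mean = None
        self.stdev = None

    def filter(self, datasets):
        """datasets: list of datasets, each a list of floats."""
        if self.mean is None:
            first = datasets[0]
            self.mean = statistics.mean(first)
            # Fall back to 1.0 to avoid dividing by zero for constant data.
            self.stdev = statistics.pstdev(first) or 1.0
        # Every dataset, including later ones, is transformed with the same setup.
        return [[(v - self.mean) / self.stdev for v in ds] for ds in datasets]


train, test = ToyStandardize().filter([[1.0, 2.0, 3.0], [4.0]])
```

The single test value is scaled with the training mean and standard deviation. Filtering the test set on its own would have derived different statistics and produced values on an incompatible scale.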

input(inst)¶

Inputs the Instance.

Parameters:: inst (Instance) – the instance to filter
Returns:: True if the filtered instance can be collected from the output
Return type:: bool

inputformat(data)¶

Sets the input format.

Parameters:: data (Instances) – the data to use as input

classmethod make_copy(flter)¶

Creates a copy of the filter.

Parameters:: flter (Filter) – the filter to copy
Returns:: the copy of the filter
Return type:: Filter

output()¶

Outputs the filtered Instance.

Returns:: the filtered instance
Return type:: an Instance object

outputformat()¶

Returns the output format.

Returns:: the output format
Return type:: Instances

serialize(ser_file)¶

Serializes the filter to the specified file.

Parameters:: ser_file (str) – the file to save the filter to

to_source(classname, data)¶

Returns the model as Java source code if the filter implements weka.filters.Sourcable.

Parameters:

classname (str) – the classname for the generated Java code
data (Instances) – the dataset used for initializing the filter

Returns:

the model as source code string

Return type:

str

class weka.filters.MultiFilter(jobject=None, options=None)¶

Bases: Filter

Wrapper class for weka.filters.MultiFilter.

append(filter)¶

Appends the filter to the current list of filters.

Parameters:: filter (Filter) – the filter to add

clear()¶: Removes all filters.

property filters¶

Returns the list of base filters.

Returns:: the filter list
Return type:: list

class weka.filters.StringToWordVector(jobject=None, options=None)¶

Bases: Filter

Wrapper class for weka.filters.unsupervised.attribute.StringToWordVector.

property stemmer¶

Returns the stemmer.

Returns:: the stemmer
Return type:: Stemmer

property stopwords¶

Returns the stopwords handler.

Returns:: the stopwords handler
Return type:: Stopwords

property tokenizer¶

Returns the tokenizer.

Returns:: the tokenizer
Return type:: Tokenizer

weka.filters.main(args=None)¶

Runs a filter from the command-line. Calls JVM start/stop automatically. Use -h to see all options.

Parameters:: args (list) – the command-line arguments to use, uses sys.argv if None

weka.filters.sys_main()¶

Runs the main function using the system cli arguments, and returns a system error code.

Returns:: 0 for success, 1 for failure.
Return type:: int

weka.timeseries module¶

class weka.timeseries.ConfidenceIntervalForecaster(jobject)¶

Bases: JavaObject

Wrapper class for ConfidenceIntervalForecaster objects.

property calculate_conf_intervals_for_forecasts¶

Returns the number of steps for which confidence intervals will be computed.

Returns:: the steps
Return type:: int

property confidence_level¶

Returns the confidence level in use for computing confidence intervals.

Returns:: the level
Return type:: float

property is_producing_confidence_intervals¶

Returns true if this forecaster is computing confidence limits for some or all of its future forecasts (i.e. getCalculateConfIntervalsForForecasts() > 0).

Returns:: true if confidence intervals are produced
Return type:: bool

class weka.timeseries.CustomPeriodicTest(jobject=None, test=None)¶

Bases: JavaObject

Class that evaluates a supplied date against user-specified date constant fields. Fields that can be tested against include year, month, week of year, week of month, day of year, day of month, day of week, hour of day, minute of hour and second. Wildcard “*” matches any value for a particular field. Each CustomPeriodicTest is made up of one or two test parts. If the first test part’s operator is “=”, then no second part is necessary. Otherwise the first test part may use > or >= operators and the second test part < or <= operators. Taken together, the two parts define an interval. An optional label may be associated with the interval.
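The one-part and two-part logic described above can be sketched for a single date field. Since this page does not show the syntax of the test string, the sketch uses plain (operator, bound) tuples instead. The names OPS and in_interval are hypothetical, and wildcards and labels are left out.

```python
import operator
from datetime import date

OPS = {
    "=": operator.eq,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def in_interval(value, lower, upper=None):
    """Check a field value against one or two test parts.

    lower, upper: (operator, bound) pairs; upper is not needed when lower uses "=".
    """
    low_op, low_bound = lower
    if not OPS[low_op](value, low_bound):
        return False
    if low_op == "=" or upper is None:
        return True
    up_op, up_bound = upper
    return OPS[up_op](value, up_bound)


# Month field of a date tested against the interval June..August.
in_summer = in_interval(date(2024, 7, 4).month, (">=", 6), ("<=", 8))
in_winter = in_interval(date(2024, 12, 24).month, (">=", 6), ("<=", 8))
is_march = in_interval(3, ("=", 3))
```

A full test would combine such checks across several fields (year, month, day of week and so on).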

evaluate(date)¶

Evaluate the supplied date with respect to this custom periodic test interval.

Parameters:: date (Date) – the date to test
Returns:: true if the date lies within the interval.
Return type:: bool

property label¶

Returns the label.

Returns:: the label
Return type:: str

lower_test()¶

Returns the lower bound test.

Returns:: the test
Return type:: TestPart

test(test)¶

Sets the test as string.

Parameters:: test (str) – the test to use

upper_test()¶

Returns the upper bound test.

Returns:: the test
Return type:: TestPart

class weka.timeseries.ErrorModule(jobject)¶

Bases: TSEvalModule

Wrapper for ErrorModule objects.

counts_for_targets()¶

Returns the number of predicted, actual pairs for each target. Only entries that are non-missing for both actual and predicted contribute to the overall count.

Returns:: the number of predicted, actual pairs for each target.
Return type:: ndarray

errors_for_target(target)¶

Returns the list of the errors for the supplied target.

Parameters:: target (str) – the target to get the errors for
Returns:: the errors
Return type:: list

predictions_for_all_targets()¶

Returns the list of predictions for all targets.

Returns:: list of list of NumericPrediction
Return type:: list

predictions_for_target(target)¶

Returns the list of predictions for the target.

Parameters:: target (str) – the target to get the predictions for
Returns:: list of NumericPrediction
Return type:: list

class weka.timeseries.IncrementallyPrimeable(jobject)¶

Bases: JavaObject

Wrapper class for IncrementallyPrimeable objects.

prime_forecaster_incremental(inst)¶

Primes the forecaster using the provided data.

Parameters:: inst (Instance) – the instance to prime with

class weka.timeseries.OverlayForecaster(jobject)¶

Bases: JavaObject

Wrapper class for OverlayForecaster objects.

forecast_with_overlays(steps, overlays)¶

Produce a forecast for the target field(s). Assumes that the model has been built and/or primed so that a forecast can be generated. Also assumes that the forecaster has been told which attributes are to be considered “overlay” attributes in the data. Overlay data is data that the forecaster will be provided with when making a forecast into the future - i.e. it will be given the values of these attributes for future instances. The overlay data provided to this method should have the same structure as the original data used to train the forecaster - i.e. all original fields should be present, including the targets and time stamp field (if supplied). The values of targets will of course be missing (‘?’) since we want to forecast those. The time stamp values (if a time stamp is in use) may be provided, in which case the forecaster will use the time stamp values in the overlay instances. If the time stamp values are missing, then date arithmetic (for date time stamps) will be used to advance the time value beyond the last seen training value; similarly, for artificial time stamps or non-date time stamps, the computed time delta will be used to increment beyond the last seen training value.

The number of instances in the overlay data should typically match the number of steps that have been requested for forecasting. If these differ, then overlay.numInstances() will be the number of steps forecasted.

Parameters:

steps (int) – number of forecasted values to produce for each target. E.g. a value of 5 would produce a prediction for t+1, t+2, …, t+5.
overlays (Instances) – the overlay data to use

Returns:

a List of Lists (one for each step) of forecasted values for each target (NumericPrediction objects)

Return type:

list

property is_using_overlay_data¶

Returns true if overlay data has been used to train this forecaster, and thus is expected to be supplied for future time steps when making a forecast.

Returns:: true if overlay data was used in training
Return type:: bool

property overlay_fields¶

Returns the overlay fields as string.

Returns:: the overlay fields
Return type:: str

class weka.timeseries.Periodicity(jobject=None, periodicity=None)¶

Bases: Enum

Defines periodicity.

class weka.timeseries.PeriodicityHandler(jobject)¶

Bases: JavaObject

Helper class to manage time stamp manipulation with respect to various periodicities. Has a routine to remap the time stamp, which is useful for date time stamps. Dates are manipulated internally as the number of milliseconds elapsed since the epoch, so any global trend modelling in regression functions results in enormous coefficients for this variable; remapping to a more reasonable scale prevents this. It also makes it easier to handle the case where there are time periods that shouldn’t be considered as a time unit increment, e.g. weekends and public holidays for financial trading data. These “holes” in the data can be accommodated by accumulating a negative offset for the remapped date when a particular date/time occurs in a user-specified “skip” list.
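The negative-offset idea can be sketched for daily data with weekends as the skip list. This is an illustration in plain Python with hypothetical names (remap_days, is_weekend) and is not the handler's actual remapping routine.

```python
from datetime import date, timedelta


def remap_days(start, end, skip):
    """Map each non-skipped day in [start, end] to a consecutive index."""
    mapping = {}
    offset = 0
    step = 0
    day = start
    while day <= end:
        if skip(day):
            offset -= 1  # accumulate a negative offset for every skipped day
        else:
            mapping[day] = step + offset
        step += 1
        day += timedelta(days=1)
    return mapping


def is_weekend(day):
    return day.weekday() >= 5  # Saturday or Sunday


# Friday 2024-01-05 through Monday 2024-01-08
remapped = remap_days(date(2024, 1, 5), date(2024, 1, 8), is_weekend)
```

Friday and Monday end up one unit apart, so a trading series shows no artificial gap over the weekend, and the indices stay small compared to epoch milliseconds.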

property delta_time¶

Returns the delta time.

Returns:: the delta time
Return type:: float

class weka.timeseries.TSEvalModule(jobject)¶

Bases: JavaObject

Wrapper for TSEvalModule objects.

calculate_measure()¶

Calculate the measure that this module represents.

Returns:: the value of the measure for this module for each of the target(s).
Return type:: ndarray

property definition¶: Returns the description.

property description¶: Returns the description.

property eval_name¶: Returns the name.

evaluate_for_instance(pred, inst)¶

Evaluate the given forecast(s) with respect to the given test instance. Targets with missing values are ignored.

Parameters:

pred (NumericPrediction) – the numeric prediction
inst (Instance) – the instance

classmethod module(name)¶

Returns the module with the specified name.

Parameters:: name (str) – the name of the module to return
Returns:: the TSEvalModule object
Return type:: TSEvalModule

classmethod module_list()¶

Returns list of available modules.

Returns:: the list of modules (TSEvalModule objects)
Return type:: list

reset()¶: Resets the module.

property summary¶: Returns the description.

property target_fields¶

Returns the list of target fields.

Returns:: the list of target fields
Return type:: list

class weka.timeseries.TSEvaluation(train, test_split_size=0.3, test=None)¶

Bases: JavaObject

Evaluation class for timeseries forecasters.
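The test_split_size argument presumably holds out the trailing fraction of the training data as a test set, since time series cannot be shuffled without destroying their order. A minimal sketch of such a split, with the rounding rule and the function name being assumptions:

```python
def holdout_split(series, test_split_size=0.3):
    """Keep the leading part for training and hold out the trailing fraction for testing."""
    n_test = int(round(len(series) * test_split_size))
    cut = len(series) - n_test
    return series[:cut], series[cut:]


train_part, test_part = holdout_split(list(range(10)))
```

Supplying an explicit test set through the test argument would make such a split unnecessary.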

evaluate(forecaster, build_model=True)¶

Evaluates the forecaster.

Parameters:

forecaster (TSForecaster) – the forecaster to evaluate
build_model (bool) – whether to build the model as well

classmethod evaluate_forecaster(forecaster, args)¶

Evaluates the forecaster with the given options.

Parameters:

forecaster (TSForecaster) – the forecaster instance to use
args (list) – the command-line arguments to use

property evaluate_on_test_data¶

Returns whether to evaluate on the test data.

Returns:: whether to evaluate
Return type:: bool

property evaluate_on_training_data¶

Returns whether to evaluate on the training data.

Returns:: whether to evaluate
Return type:: bool

property evaluation_modules¶

Returns the list of evaluation modules in use.

Returns:: list of TSEvalModule objects
Return type:: list

property forecast_future¶

Returns whether we should generate a future forecast beyond the end of the training and/or test data.

Returns:: whether to forecast beyond the end of the data
Return type:: bool

property horizon¶

Returns the number of steps to predict into the future.

Returns:: the number of steps
Return type:: int

predictions_for_test_data(step_number)¶

Predictions for all targets for the specified step number on the test data.

Parameters:: step_number (int) – number of the step into the future to return predictions for

predictions_for_training_data(step_number)¶

Predictions for all targets for the specified step number on the training data.

Parameters:: step_number (int) – number of the step into the future to return predictions for

property prime_for_test_data_with_test_data¶

Returns whether evaluation for test data should begin by priming with the first x test data instances and then forecasting from step x + 1. This is the only option if there is no training data and a model has been deserialized from disk. If we have training data, and it occurs immediately before the test data in time, then we can prime with the last x instances from the training data.

Returns:: whether to prime
Return type:: bool
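
The two priming strategies described above can be sketched in plain Python. This is only an illustration of the idea, not the wrapper's implementation: the function name split_for_priming is hypothetical, and plain lists stand in for Instances objects.

```python
def split_for_priming(train, test, window, prime_with_test):
    """Return (priming_rows, rows_to_forecast) for a test-set evaluation.

    train and test are plain lists standing in for Instances objects
    (an assumption made so the sketch runs without a JVM).
    """
    if prime_with_test or not train:
        # Prime with the first `window` test rows and start forecasting
        # from row window + 1. This is the only choice without training data.
        return test[:window], test[window:]
    # Otherwise prime with the last `window` training rows and
    # forecast the whole test set.
    start = max(0, len(train) - window)
    return train[start:], test


train = [1, 2, 3, 4, 5]
test = [6, 7, 8, 9]
print(split_for_priming(train, test, 2, prime_with_test=True))
print(split_for_priming(train, test, 2, prime_with_test=False))
```

Priming with training rows only makes sense when the training data ends immediately before the test data starts, as noted above.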

property prime_window_size¶

Returns the size of the priming window, i.e. the number of historical instances to present before making a forecast.

Returns:: the size
Return type:: int

print_future_forecast_on_test_data(forecaster)¶

Print the forecasted values (for all targets) beyond the end of the test data.

Parameters:: forecaster (TSForecaster) – the forecaster to use
Returns:: the forecasted values
Return type:: str

print_future_forecast_on_training_data(forecaster)¶

Print the forecasted values (for all targets) beyond the end of the training data.

Parameters:: forecaster (TSForecaster) – the forecaster to use
Returns:: the forecasted values
Return type:: str

print_predictions_for_test_data(title, target_name, step_ahead, instance_number_offset=0)¶

Print the predictions for a given target at a given step-ahead level on the test data.

Parameters:

title (str) – the title for the output
target_name (str) – the name of the target to print predictions for
step_ahead (int) – the step-ahead level - e.g. 3 would print the 3-step-ahead predictions
instance_number_offset (int) – the offset from the start of the test data from which to print actual and predicted values

Returns:

the predicted/actual values

Return type:

str

print_predictions_for_training_data(title, target_name, step_ahead, instance_number_offset=0)¶

Print the predictions for a given target at a given step-ahead level on the training data.

Parameters:

title (str) – the title for the output
target_name (str) – the name of the target to print predictions for
step_ahead (int) – the step-ahead level - e.g. 3 would print the 3-step-ahead predictions
instance_number_offset (int) – the offset from the start of the training data from which to print actual and predicted values

Returns:

the predicted/actual values

Return type:

str

property rebuild_model_after_each_test_forecast_step¶

Returns whether the forecasting model should be rebuilt after each forecasting step on the test data using both the training data and test data up to the current instance.

Returns:: whether to rebuild
Return type:: bool

summary()¶

Generates a summary.

Returns:: the summary
Return type:: str

property test_data¶

Returns the test data.

Returns:: the test data, None if none available
Return type:: Instances

property training_data¶

Returns the training data.

Returns:: the training data, None if none available
Return type:: Instances

class weka.timeseries.TSForecaster(classname='weka.classifiers.timeseries.WekaForecaster', jobject=None, options=None)¶

Bases: OptionHandler

Wrapper class for timeseries forecasters.

property algorithm_name¶

Returns the name of the algorithm.

Returns:: the name
Return type:: str

property base_model_has_serializer¶

Check whether the base learner requires special serialization.

Returns:: True if base learner requires special serialization, False otherwise
Return type:: bool

build_forecaster(data)¶

Builds the forecaster using the provided data.

Parameters:: data (Instances) – the data to train with

clear_previous_state()¶

Resets the model state.

property fields_to_forecast¶

Returns the fields to forecast.

Returns:: the fields
Return type:: str

forecast(steps)¶

Produce a forecast for the target field(s). Assumes that the model has been built and/or primed so that a forecast can be generated.

Parameters:: steps (int) – number of forecasted values to produce for each target. E.g. a value of 5 would produce a prediction for t+1, t+2, …, t+5.
Returns:: a List of Lists (one for each step) of forecasted values for each target (NumericPrediction objects)
Return type:: list
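
The returned structure is step-major: one inner list per step, holding one prediction per target. A minimal sketch of regrouping it into one series per target follows. The helper name forecasts_by_target is hypothetical, plain floats stand in for the NumericPrediction objects, and the sketch assumes the predictions within a step follow the order of the target names passed in.

```python
def forecasts_by_target(forecast, target_names):
    """Regroup a step-major forecast into one series per target.

    forecast[i][j] is taken to be the prediction for target j at step i + 1.
    """
    by_target = {name: [] for name in target_names}
    for step in forecast:
        for name, prediction in zip(target_names, step):
            by_target[name].append(prediction)
    return by_target


# Plain floats stand in for NumericPrediction objects (assumption).
example = [[10.0, 1.0], [11.0, 1.5], [12.0, 2.0]]  # 3 steps, 2 targets
print(forecasts_by_target(example, ["sales", "returns"]))
```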

property header¶

Returns the header of the training data.

Returns:: the structure of the training data, None if not available
Return type:: Instances

load_base_model(fname)¶

Loads the base model from the given filename.

Parameters:: fname (str) – the file to load the base model from

load_serialized_state(fname)¶

Loads the serialized state from the given filename.

Parameters:: fname (str) – the file to deserialize the state from

property previous_state¶

Returns the previous state.

Returns:: the state as a list of JPype objects
Return type:: list

prime_forecaster(data)¶

Primes the forecaster using the provided data.

Parameters:: data (Instances) – the data to prime with

reset()¶

Resets the algorithm.

run_forecaster(forecaster, options)¶

Runs the supplied forecaster with the supplied options.

save_base_model(fname)¶

Saves the base model under the given filename.

Parameters:: fname (str) – the file to save the base model under

serialize_state(fname)¶

Serializes the state under the given filename.

Parameters:: fname (str) – the file to serialize the state under

property uses_state¶

Check whether the base learner requires operations regarding state.

Returns:: True if base learner uses state-based predictions, False otherwise
Return type:: bool

class weka.timeseries.TSLagMaker(jobject=None, options=None)¶

Bases: Filter

A class for creating lagged versions of target variable(s) for use in time series forecasting. Uses the TimeseriesTranslate filter. Has options for creating averages of consecutive lagged variables (which can be useful for long lagged variables). Some polynomials of time are also created (if there is a time stamp), such as time^2 and time^3. Also creates cross products between time and the lagged and averaged lagged variables. If there is no date time stamp in the data then the user has the option of having an artificial time stamp created. Time stamps, real or otherwise, are used for modeling trends rather than using a differencing-based approach.
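
As an illustration of what "lagged versions" of a target means, the following plain-Python sketch builds lag columns for a single series. It is not the filter's implementation; the function name make_lags and the use of None for unknown lag values are assumptions.

```python
def make_lags(series, min_lag, max_lag, remove_leading_unknown=False):
    """Build rows of [value, lag_min, ..., lag_max] from a list of numbers."""
    rows = []
    for t, value in enumerate(series):
        # A lag of k at time t is the value observed at time t - k.
        lags = [series[t - k] if t - k >= 0 else None
                for k in range(min_lag, max_lag + 1)]
        rows.append([value] + lags)
    if remove_leading_unknown:
        # Drop the leading rows whose lag values are not yet known.
        rows = [row for row in rows if None not in row]
    return rows


for row in make_lags([5, 6, 7, 8], 1, 2):
    print(row)
```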

Also has routines for dealing with a date timestamp - i.e. it can detect a monthly time period (because months are different lengths) and maps date time stamps to equally spaced time intervals. For example, in general, a date time stamp is remapped by subtracting the first observed value and adding this value divided by the constant delta (difference between consecutive steps) to the result. In the case of a detected monthly time period, the remapping involves subtracting the base year and then adding to this the number of the month within the current year plus twelve times the number of intervening years since the base year.
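
One reading of the remapping described above, sketched in plain Python. Both function names are hypothetical and the formulas are an interpretation of the description, not code taken from the library: a regular time stamp becomes a step count relative to the first observation, and a monthly time stamp becomes a month index counted from the base year.

```python
from datetime import date


def remap_regular(value, first_value, delta):
    """Map a numeric time stamp to a step index, given a constant delta."""
    return (value - first_value) / delta


def remap_monthly(d, base_year):
    """Map a date to a month index counted from the base year."""
    return 12 * (d.year - base_year) + d.month


print(remap_regular(1030.0, 1000.0, 10.0))
# Months of different lengths still map to consecutive integers.
print(remap_monthly(date(2011, 1, 31), 2010))
print(remap_monthly(date(2011, 2, 28), 2010))
```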

Also has routines for adding new attributes derived from a date time stamp to the data - e.g. AM indicator, day of the week, month, quarter etc. In the case where there is no real date time stamp, the user may specify a nominal periodic variable (if one exists in the data). For example, month might be coded as a nominal value. In this case it can be specified as the primary periodic variable. The point is that in all these cases (nominal periodic and date-derived periodics), we are able to determine what the value of these variables will be in future instances (as computed from the last known historic instance).
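
The date-derived attributes can be computed for any time stamp, including future ones, which is why they remain available at forecast time. A minimal sketch with the standard library follows; the function name date_periodics and the dictionary keys are illustrative and do not reflect the attribute names the filter generates.

```python
from datetime import datetime, timedelta


def date_periodics(ts):
    """Derive periodic attributes from a single time stamp."""
    return {
        "am": ts.hour < 12,
        "day_of_week": ts.weekday(),   # 0 = Monday ... 6 = Sunday
        "weekend": ts.weekday() >= 5,
        "month": ts.month,
        "quarter": (ts.month - 1) // 3 + 1,
    }


last_known = datetime(2011, 7, 4, 9, 30)
print(date_periodics(last_known))
# The same attributes for the next (future) daily instance.
print(date_periodics(last_known + timedelta(days=1)))
```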

property add_am_indicator¶

Returns whether to add an AM indicator.

Returns:: true if to add
Return type:: bool

add_custom_periodic(periodic)¶

Adds the custom periodic.

Parameters:: periodic (str) – the periodic to add

property add_day_of_month¶

Returns whether to add day of month attribute.

Returns:: true if to add
Return type:: bool

property add_day_of_week¶

Returns whether to add day of week attribute.

Returns:: true if to add
Return type:: bool

property add_month_of_year¶

Returns whether to add month of year attribute.

Returns:: true if to add
Return type:: bool

property add_num_days_in_month¶

Returns whether to add # of days in month attribute.

Returns:: true if to add
Return type:: bool

property add_quarter_of_year¶

Returns whether to add quarter of year attribute.

Returns:: true if to add
Return type:: bool

property add_weekend_indicator¶

Returns whether to add a weekend indicator.

Returns:: true if to add
Return type:: bool

property adjust_for_trends¶

Returns true if we are adjusting for trends via a real or artificial time stamp.

Returns:: true if to adjust
Return type:: bool

property adjust_for_variance¶

Returns true if we are adjusting for variance by taking the log of the target(s).

Returns:: true if to adjust
Return type:: bool
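
A small sketch of the idea behind adjusting for variance: targets are modelled on the log scale and forecasts are mapped back afterwards. The function names are hypothetical, and the use of the natural log is an assumption, since the description above only says "the log".

```python
import math


def log_targets(values):
    """Take the natural log of each (strictly positive) target value."""
    return [math.log(v) for v in values]


def unlog_forecasts(values):
    """Map forecasts made on the log scale back to the original scale."""
    return [math.exp(v) for v in values]


targets = [100.0, 200.0, 400.0]
logged = log_targets(targets)
# On the log scale, a constant growth factor becomes a constant difference.
print([round(b - a, 4) for a, b in zip(logged, logged[1:])])
print([round(v, 6) for v in unlog_forecasts(logged)])
```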

property artificial_time_start_value¶

Returns the current value of the artificial time stamp. After training, after priming, and prior to forecasting, this will be equal to the number of training instances seen.

Returns:: the start
Return type:: float

property average_consecutive_long_lags¶

Returns true if consecutive long lagged variables are to be averaged.

Returns:: true if to average
Return type:: bool

property average_lags_after¶

Returns the point after which long lagged variables will be averaged.

Returns:: the lag
Return type:: int

clear_custom_periodics()¶

Clears the custom periodics.

clear_lag_histories()¶

Clears any history accumulated in the lag creating filters.

create_time_lag_cross_products(data)¶

Creates the cross-products.

Parameters:: data (Instances) – the data to create the cross-products for
Returns:: the cross-products
Return type:: Instances

property current_timestamp_value¶

Returns the current (i.e. most recent) time stamp value. Unlike an artificial time stamp, this value, after training and priming and before forecasting, equals the time stamp of the most recent priming instance.

Returns:: the timestamp value
Return type:: float

property delta_time¶

Returns the difference between time values. This may be only approximate for periods based on dates. It is best to use date-based arithmetic in this case for incrementing/decrementing time stamps.

Returns:: the delta
Return type:: float

property fields_to_lag¶

Returns the fields to lag as list.

Returns:: the fields to lag
Return type:: list

property fields_to_lag_as_string¶

Returns the fields to lag as string.

Returns:: the fields to lag
Return type:: str

property include_powers_of_time¶

Returns whether to include powers of time in the transformed data.

Returns:: true if to include
Return type:: bool

property include_timelag_products¶

Returns whether to include products between time and the lagged variables.

Returns:: true if to include
Return type:: bool
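
To make the powers of time and the time-lag products concrete, here is a plain-Python sketch of the extra columns for one row. The function name time_features and the column naming are illustrative assumptions, not the names the filter generates.

```python
def time_features(time_value, lags):
    """Return powers of time and time-by-lag cross products for one row.

    lags maps a lag column name to its value in this row.
    """
    features = {
        "time": time_value,
        "time^2": time_value ** 2,
        "time^3": time_value ** 3,
    }
    for name, value in lags.items():
        # Cross product between time and a lagged variable.
        features["time*" + name] = time_value * value
    return features


print(time_features(2.0, {"lag1": 5.0, "lag2": 4.0}))
```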

increment_artificial_time_value(increment)¶

Increment the artificial time value with the supplied increment value.

Parameters:: increment (int) – the increment

property is_using_artificial_time_index¶

Returns whether an artificial time index is used.

Returns:: true if an artificial time index is used
Return type:: bool

property lag_range¶

Returns the lag range to create.

Returns:: the lag range
Return type:: str

property max_lag¶

Returns the maximum lag to create.

Returns:: the lag
Return type:: int

property min_lag¶

Returns the minimum lag to create.

Returns:: the lag
Return type:: int

property num_consecutive_long_lags_to_average¶

Returns the number of consecutive long lagged variables to average.

Returns:: the number of lags
Return type:: int
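
The interaction between average_lags_after and num_consecutive_long_lags_to_average can be pictured as follows. This sketch is an interpretation, not the filter's code: the function name, the treatment of the boundary (lags strictly greater than the threshold are averaged) and the handling of a short trailing group are all assumptions.

```python
def average_long_lags(lag_values, average_lags_after, num_to_average):
    """Split lags into short lags kept as-is and averaged long lags.

    lag_values maps a lag number to its value for one row.
    """
    short = {k: v for k, v in lag_values.items() if k <= average_lags_after}
    long_lags = sorted(k for k in lag_values if k > average_lags_after)
    averaged = {}
    for i in range(0, len(long_lags), num_to_average):
        group = long_lags[i:i + num_to_average]
        key = "avg({}-{})".format(group[0], group[-1])
        averaged[key] = sum(lag_values[k] for k in group) / len(group)
    return short, averaged


row = {1: 10.0, 2: 12.0, 3: 14.0, 4: 16.0, 5: 18.0, 6: 20.0}
print(average_long_lags(row, average_lags_after=2, num_to_average=2))
```

Averaging reduces the number of columns contributed by long lags, which is the use case mentioned in the class description.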

property overlay_fields¶

Returns the overlay fields as list.

Returns:: the overlay fields
Return type:: list

property periodicity¶

Returns the Periodicity representing the time stamp in use for this lag maker. If the lag maker is not adjusting for trends, or an artificial time stamp is being used, then null is returned.

Returns:: the periodicity
Return type:: Periodicity

property primary_periodic_field_name¶

Returns the name of the primary periodic attribute or null if one hasn’t been specified.

Returns:: the name
Return type:: str

property remove_leading_instances_with_unknown_lag_values¶

Returns whether to remove instances with unknown lag values.

Returns:: true if to remove
Return type:: bool

property skip_entries¶

Returns the time units to be ‘skipped’, i.e. not considered as an increment. E.g. financial markets don’t trade on the weekend, so the difference between Friday closing and the following Monday closing is one time unit (and not three). Can accept strings such as “sat”, “sunday”, “jan”, “august”, or explicit dates (with optional formatting string) such as “2011-07-04@yyyy-MM-dd”, or integers. Integers are interpreted with respect to the periodicity - e.g. for daily data they are interpreted as day of the year; for hourly data, hour of the day; for weekly data, week of the year.

Returns:: the skip entries
Return type:: str
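
The effect of skipping time units can be illustrated with daily data and weekends. The sketch below counts increments between two dates while ignoring skipped weekdays; the function name and the weekday-number encoding are assumptions made for the example and are unrelated to the string format the property accepts.

```python
from datetime import date, timedelta


def time_units_between(start, end, skipped_weekdays=(5, 6)):
    """Count daily increments from start to end, ignoring skipped weekdays.

    Weekdays use datetime's numbering: 0 = Monday ... 6 = Sunday.
    """
    units = 0
    current = start
    while current < end:
        current += timedelta(days=1)
        if current.weekday() not in skipped_weekdays:
            units += 1
    return units


friday = date(2011, 7, 1)
monday = date(2011, 7, 4)
print(time_units_between(friday, monday))      # weekend skipped
print(time_units_between(friday, monday, ()))  # nothing skipped
```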

property timestamp_field¶

Returns the name of the time stamp field.

Returns:: the name of the time stamp field
Return type:: str

transformed_data(data)¶

Returns the transformed data.

Parameters:: data (Instances) – the data to transform
Returns:: the transformed data
Return type:: Instances

class weka.timeseries.TSLagUser(jobject)¶

Bases: JavaObject

Wrapper class for TSLagUser objects.

property tslag_maker¶

Returns the lag maker.

Returns:: the lag maker
Return type:: TSLagMaker

class weka.timeseries.TestPart(jobject)¶

Bases: JavaObject

Inner class defining one boundary of an interval.

day()¶

Returns the day string.

Returns:: the day string
Return type:: str

day_of_month(s)¶

Sets the day of the month.

Parameters:: s (str) – the dom to use

day_of_week(s)¶

Sets the day of the week.

Parameters:: s (str) – the dow to use

day_of_year(s)¶

Sets the day of year.

Parameters:: s (str) – the doy to use

eval(date, other)¶

Evaluate the supplied date against this bound. Handles date fields that are cyclic (such as month, day of week etc.) so that intervals such as oct < date < mar evaluate correctly.

Parameters:

date (Date) – the date to test
other (TestPart) – the other bound

Returns:

true if the supplied date is within this bound

Return type:

bool
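
The cyclic handling mentioned above (e.g. oct < date < mar) can be sketched for a single numeric field such as the month. This is an illustration of the wrap-around logic only; the function name is hypothetical and the real method works on Date objects and a second TestPart.

```python
def in_cyclic_interval(value, lower, upper):
    """True if lower < value < upper on a cyclic field such as month."""
    if lower <= upper:
        return lower < value < upper
    # The interval wraps around the end of the cycle, e.g. oct < m < mar.
    return value > lower or value < upper


print(in_cyclic_interval(12, 10, 3))  # December against oct < date < mar
print(in_cyclic_interval(6, 10, 3))   # June against oct < date < mar
print(in_cyclic_interval(6, 3, 10))   # June against mar < date < oct
```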

hour_of_day(s)¶

Sets the hour of the day.

Parameters:: s (str) – the hod to use

property is_upper¶

Returns true if this is the upper bound.

Returns:: true if upper bound
Return type:: bool

minute_of_hour(s)¶

Sets the minute of the hour.

Parameters:: s (str) – the moh to use

property month¶

Returns the month string.

Returns:: the month string
Return type:: str

operator(s)¶

Sets the operator.

Parameters:: s (str) – the operator to use

second(s)¶

Sets the second.

Parameters:: s (str) – the second to use

week_of_month(s)¶

Sets the week of the month.

Parameters:: s (str) – the wom to use

week_of_year(s)¶

Sets the week of the year.

Parameters:: s (str) – the woy to use

year(s)¶

Sets the year.

Parameters:: s (str) – the year to use

class weka.timeseries.WekaForecaster(jobject=None, options=None)¶

Bases: TSForecaster, TSLagUser, ConfidenceIntervalForecaster, OverlayForecaster, IncrementallyPrimeable

Wrapper class for Weka timeseries forecasters.

add_custom_periodic(periodic)¶

Adds the custom periodic.

Parameters:: periodic (str) – the periodic to add

property base_forecaster¶

Returns the base forecaster.

Returns:: the base forecaster
Return type:: Classifier

clear_custom_periodics()¶

Clears the custom periodics.
