weka package

Submodules

weka.associations module

class weka.associations.AssociationRule(jobject)

Bases: weka.core.classes.JavaObject

Wrapper for weka.associations.AssociationRule class.

consequence

Get the consequence.

Returns:the consequence, list of Item objects
Return type:list
consequence_support

Get the support for the consequence.

Returns:the support
Return type:int
metric_names

Returns the metric names for the rule.

Returns:the metric names
Return type:list
metric_value(name)

Returns the named metric value for the rule.

Parameters:name (str) – the name of the metric
Returns:the metric value
Return type:float
metric_values

Returns the metric values for the rule.

Returns:the metric values
Return type:ndarray
premise

Get the premise.

Returns:the premise, list of Item objects
Return type:list
premise_support

Get the support for the premise.

Returns:the support
Return type:int
primary_metric_name

Returns the primary metric name for the rule.

Returns:the metric name
Return type:str
primary_metric_value

Returns the primary metric value for the rule.

Returns:the metric value
Return type:float
total_support

Get the total support.

Returns:the support
Return type:int
total_transactions

Get the total transactions.

Returns:the transactions
Return type:int
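
To show how the four support counts above relate to common rule metrics, here is a minimal sketch that derives support, confidence, and lift from them. It assumes that total_support counts the transactions containing both premise and consequence; the function name rule_metrics and the example counts are hypothetical, and Weka's own metric implementations may differ in detail.

```python
def rule_metrics(premise_support, consequence_support, total_support, total_transactions):
    """Derive textbook rule metrics from raw support counts (illustrative only)."""
    # Fraction of all transactions that contain premise and consequence together.
    support = total_support / float(total_transactions)
    # Of the transactions matching the premise, the share that also match the consequence.
    confidence = total_support / float(premise_support)
    # Confidence relative to how often the consequence occurs overall.
    lift = confidence / (consequence_support / float(total_transactions))
    return {"support": support, "confidence": confidence, "lift": lift}


# Hypothetical counts: 100 transactions, premise in 40, consequence in 50, both in 30.
metrics = rule_metrics(40, 50, 30, 100)
print(metrics)
```
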
class weka.associations.AssociationRules(jobject)

Bases: weka.core.classes.JavaObject

Wrapper for weka.associations.AssociationRules class.

producer

Returns a string describing the producer that generated these rules.

Returns:the producer
Return type:str
class weka.associations.AssociationRulesIterator(rules)

Bases: object

Iterator for weka.associations.AssociationRules class.

next()

Returns the next rule.

Returns:the next rule object
Return type:AssociationRule
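
The iterator exposes a next() method, which is the Python 2 iterator protocol. The sketch below shows the general pattern on a plain list so that it also works in a Python 3 for loop; the class ListIterator is a hypothetical stand-in, not the wrapper's actual implementation.

```python
class ListIterator(object):
    """Hypothetical iterator over a list, mirroring the next() style used above."""

    def __init__(self, items):
        self.items = items
        self.index = 0

    def __iter__(self):
        return self

    def next(self):
        # Signal the end of the iteration once all items were returned.
        if self.index >= len(self.items):
            raise StopIteration()
        item = self.items[self.index]
        self.index += 1
        return item

    # Python 3 looks for __next__, so alias it to the same method.
    __next__ = next


collected = [rule for rule in ListIterator(["rule A", "rule B"])]
print(collected)
```
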
class weka.associations.Associator(classname=None, jobject=None, options=None)

Bases: weka.core.classes.OptionHandler

Wrapper class for associators.

association_rules()

Returns the association rules that were generated. Only available if the scheme implements AssociationRulesProducer.

Returns:the association rules that were generated
Return type:AssociationRules
build_associations(data)

Builds the associator with the data.

Parameters:data (Instances) – the data to train the associator with
can_produce_rules()

Checks whether association rules can be generated.

Returns:whether the scheme implements the AssociationRulesProducer interface and association rules can be generated
Return type:bool

capabilities

Returns the capabilities of the associator.

Returns:the capabilities
Return type:Capabilities
classmethod make_copy(associator)

Creates a copy of the associator.

Parameters:associator (Associator) – the associator to copy
Returns:the copy of the associator
Return type:Associator
rule_metric_names

Returns the rule metric names of the association rules. Only available if the scheme implements AssociationRulesProducer.

Returns:the metric names
Return type:list
class weka.associations.Item(jobject)

Bases: weka.core.classes.JavaObject

Wrapper for weka.associations.Item class.

attribute

Returns the attribute.

Returns:the attribute
Return type:Attribute
comparison

Returns the comparison operator as string.

Returns:the comparison operator
Return type:str
decrease_frequency(frequency=None)

Decreases the frequency.

Parameters:frequency (int) – the frequency to decrease by, 1 if None
frequency

Returns the frequency.

Returns:the frequency
Return type:int
increase_frequency(frequency=None)

Increases the frequency.

Parameters:frequency (int) – the frequency to increase by, 1 if None
item_value

Returns the item value as string.

Returns:the item value
Return type:str
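
Both increase_frequency and decrease_frequency fall back to a step of 1 when frequency is None. A minimal sketch of that default handling, using a hypothetical FrequencyCounter class rather than the wrapped Java Item:

```python
class FrequencyCounter(object):
    """Hypothetical counter illustrating the 'use 1 if None' default."""

    def __init__(self, frequency=0):
        self.frequency = frequency

    def increase_frequency(self, frequency=None):
        # A missing argument means "increase by one".
        self.frequency += 1 if frequency is None else frequency

    def decrease_frequency(self, frequency=None):
        # A missing argument means "decrease by one".
        self.frequency -= 1 if frequency is None else frequency


counter = FrequencyCounter()
counter.increase_frequency()      # adds 1
counter.increase_frequency(4)     # adds 4
counter.decrease_frequency()      # subtracts 1
print(counter.frequency)
```
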
weka.associations.main(args=None)

Runs an associator from the command-line. Calls JVM start/stop automatically. Use -h to see all options.

Parameters:args (list) – the command-line arguments to use, uses sys.argv if None
weka.associations.sys_main()

Runs the main function using the system cli arguments, and returns a system error code.

Returns:0 for success, 1 for failure.
Return type:int
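
The main/sys_main pairing described above follows a common pattern: one function does the work and may raise, the other converts the outcome into an exit code. The sketch below illustrates that pattern only; the names main_sketch and sys_main_sketch and the --fail flag are hypothetical, and no JVM is started.

```python
import sys
import traceback


def main_sketch(args=None):
    """Stand-in for the real work; falls back to sys.argv if args is None."""
    if args is None:
        args = sys.argv[1:]
    if "--fail" in args:
        raise ValueError("simulated failure")


def sys_main_sketch(args=None):
    """Returns 0 for success, 1 for failure, suitable for sys.exit()."""
    try:
        main_sketch(args)
        return 0
    except Exception:
        # Report the problem on stderr, then signal failure to the caller.
        sys.stderr.write(traceback.format_exc())
        return 1


ok_code = sys_main_sketch([])
fail_code = sys_main_sketch(["--fail"])
print(ok_code, fail_code)
```
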

weka.attribute_selection module

class weka.attribute_selection.ASEvaluation(classname='weka.attributeSelection.CfsSubsetEval', jobject=None, options=None)

Bases: weka.core.classes.OptionHandler

Wrapper class for attribute selection evaluation algorithm.

build_evaluator(data)

Builds the evaluator with the data.

Parameters:data (Instances) – the data to use
capabilities

Returns the capabilities of the evaluator.

Returns:the capabilities
Return type:Capabilities
post_process(indices)

Post-processes the evaluator with the selected attribute indices.

Parameters:indices (ndarray) – the attribute indices list to use
Returns:the processed indices
Return type:ndarray
class weka.attribute_selection.ASSearch(classname='weka.attributeSelection.BestFirst', jobject=None, options=None)

Bases: weka.core.classes.OptionHandler

Wrapper class for attribute selection search algorithm.

search(evaluation, data)

Performs the search and returns the indices of the selected attributes.

Parameters:
  • evaluation (ASEvaluation) – the evaluation algorithm to use
  • data (Instances) – the data to use
Returns:

the selected attributes (0-based indices)

Return type:

ndarray

class weka.attribute_selection.AttributeSelection

Bases: weka.core.classes.JavaObject

Performs attribute selection using search and evaluation algorithms.

classmethod attribute_selection(evaluator, args)

Performs attribute selection using the given attribute evaluator and options.

Parameters:
  • evaluator (ASEvaluation) – the evaluator to use
  • args (list) – the command-line args for the attribute selection
Returns:

the results string

Return type:

str

crossvalidation(crossvalidation)

Sets whether to perform cross-validation.

Parameters:crossvalidation (bool) – whether to perform cross-validation
cv_results

Generates a results string from the last cross-validation attribute selection.

Returns:the results string
Return type:str
evaluator(evaluator)

Sets the evaluator to use.

Parameters:evaluator (ASEvaluation) – the evaluator to use.
folds(folds)

Sets the number of folds to use for cross-validation.

Parameters:folds (int) – the number of folds
number_attributes_selected

Returns the number of attributes that were selected.

Returns:the number of attributes
Return type:int
ranked_attributes

Returns the matrix of ranked attributes from the last run.

Returns:the Numpy matrix
Return type:ndarray
ranking(ranking)

Sets whether to perform a ranking, if possible.

Parameters:ranking (bool) – whether to perform a ranking
reduce_dimensionality(data)

Reduces the dimensionality of the provided Instance or Instances object.

Parameters:data (Instances) – the data to process
Returns:the reduced dataset
Return type:Instances
results_string

Generates a results string from the last attribute selection.

Returns:the results string
Return type:str
search(search)

Sets the search algorithm to use.

Parameters:search (ASSearch) – the search algorithm
seed(seed)

Sets the seed for cross-validation.

Parameters:seed (int) – the seed value
select_attributes(instances)

Performs attribute selection on the given dataset.

Parameters:instances (Instances) – the data to process
select_attributes_cv_split(instances)

Performs attribute selection on the given cross-validation split.

Parameters:instances (Instances) – the data to process
selected_attributes

Returns the selected attributes from the last run.

Returns:the Numpy array of 0-based indices
Return type:ndarray
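
Because the selected attributes are reported as 0-based indices, they can be used directly to project rows onto the chosen columns. A minimal sketch on plain Python lists, assuming hypothetical data and a hypothetical helper keep_columns (the wrapper's reduce_dimensionality works on Instances objects instead):

```python
def keep_columns(rows, selected):
    """Keep only the columns whose 0-based indices are listed in `selected`."""
    return [[row[i] for i in selected] for row in rows]


# Hypothetical dataset with three numeric attributes and a class label.
rows = [[5.1, 3.5, 1.4, "a"], [6.2, 2.9, 4.3, "b"]]
# Pretend attribute selection returned indices 0 and 2 plus the class at index 3.
reduced = keep_columns(rows, [0, 2, 3])
print(reduced)
```
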
weka.attribute_selection.main(args=None)

Runs attribute selection from the command-line. Calls JVM start/stop automatically. Use -h to see all options.

Parameters:args (list) – the command-line arguments to use, uses sys.argv if None
weka.attribute_selection.sys_main()

Runs the main function using the system cli arguments, and returns a system error code.

Returns:0 for success, 1 for failure.
Return type:int

weka.classifiers module

class weka.classifiers.Classifier(classname='weka.classifiers.rules.ZeroR', jobject=None, options=None)

Bases: weka.core.classes.OptionHandler

Wrapper class for classifiers.

batch_size

Returns the batch size, in case this classifier is a batch predictor.

Returns:the batch size, None if not a batch predictor
Return type:str
build_classifier(data)

Builds the classifier with the data.

Parameters:data (Instances) – the data to train the classifier with
capabilities

Returns the capabilities of the classifier.

Returns:the capabilities
Return type:Capabilities
classify_instance(inst)

Performs a prediction.

Parameters:inst (Instance) – the Instance to get a prediction for
Returns:the classification (either regression value or 0-based label index)
Return type:float
classmethod deserialize(ser_file)

Deserializes a classifier from a file.

Parameters:ser_file (str) – the model file to deserialize
Returns:model and, if available, the dataset header
Return type:tuple
distribution_for_instance(inst)

Performs a prediction, returning the class distribution.

Parameters:inst (Instance) – the Instance to get the class distribution for
Returns:the class distribution array
Return type:ndarray
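
For a nominal class, classify_instance yields a 0-based label index and distribution_for_instance yields one probability per label. The sketch below shows how the two relate, assuming the predicted label is the one with the highest probability; the helper predicted_label and the label names are hypothetical.

```python
def predicted_label(distribution, labels):
    """Return (0-based index, label) of the most probable class."""
    index = max(range(len(distribution)), key=lambda i: distribution[i])
    return index, labels[index]


# Hypothetical class distribution for a three-class problem.
index, label = predicted_label([0.1, 0.7, 0.2], ["setosa", "versicolor", "virginica"])
print(index, label)
```
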
distributions_for_instances(data)

Performs predictions, returning the class distributions.

Parameters:data (Instances) – the Instances to get the class distributions for
Returns:the class distribution matrix, None if not a batch predictor
Return type:ndarray
graph

Returns the graph if classifier implements weka.core.Drawable, otherwise None.

Returns:the generated graph string
Return type:str
graph_type

Returns the graph type if classifier implements weka.core.Drawable, otherwise -1.

Returns:the type
Return type:int
has_efficient_batch_prediction()

Returns whether the classifier implements a more efficient batch prediction.

Returns:True if a more efficient batch prediction is implemented, always False if not batch predictor
Return type:bool
classmethod make_copy(classifier)

Creates a copy of the classifier.

Parameters:classifier (Classifier) – the classifier to copy
Returns:the copy of the classifier
Return type:Classifier
serialize(ser_file, header=None)

Serializes the classifier to the specified file.

Parameters:
  • ser_file (str) – the file to save the model to
  • header (Instances) – the (optional) dataset header to store alongside; recommended
to_source(classname)

Returns the model as Java source code if the classifier implements weka.classifiers.Sourcable.

Parameters:classname (str) – the classname for the generated Java code
Returns:the model as source code string
Return type:str
update_classifier(inst)

Updates the classifier with the instance.

Parameters:inst (Instance) – the Instance to update the classifier with
class weka.classifiers.CostMatrix(matrx=None, num_classes=None)

Bases: weka.core.classes.JavaObject

Class for storing and manipulating a misclassification cost matrix. The element at position i,j in the matrix is the penalty for classifying an instance of class j as class i. Cost values can be fixed or computed on a per-instance basis (cost sensitive evaluation only) from the value of an attribute or an expression involving attribute(s).

apply_cost_matrix(data, rnd)

Applies the cost matrix to the data.

Parameters:
  • data (Instances) – the data to apply to
  • rnd (Random) – the random number generator
expected_costs(class_probs, inst=None)

Calculates the expected misclassification cost for each possible class value, given class probability estimates.

Parameters:class_probs (ndarray) – the class probabilities
Returns:the calculated costs
Return type:ndarray
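
The expected cost of predicting a class is the probability-weighted sum of the penalties in the matrix. The sketch below takes the orientation stated in the class description at face value (row i, column j holds the penalty for classifying an instance of class j as class i); verify this against your Weka version and transpose the matrix if it uses the other orientation. The function name and numbers are hypothetical.

```python
def expected_costs_sketch(cost_matrix, class_probs):
    """Expected cost of predicting each class i: sum_j P(class j) * cost[i][j]."""
    return [sum(p * cost for p, cost in zip(class_probs, row)) for row in cost_matrix]


# Hypothetical 2-class matrix: mistaking class 1 for class 0 costs 5, the reverse costs 1.
costs = expected_costs_sketch([[0.0, 5.0], [1.0, 0.0]], [0.8, 0.2])
# The cost-sensitive decision picks the class with the lowest expected cost.
cheapest = min(range(len(costs)), key=lambda i: costs[i])
print(costs, cheapest)
```

Although class 0 is the more probable one here, predicting class 1 has the lower expected cost under these assumed penalties.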
get_cell(row, col)

Returns the JB_Object at the specified location.

Parameters:
  • row (int) – the 0-based index of the row
  • col (int) – the 0-based index of the column
Returns:

the object in that cell

Return type:

JB_Object

get_element(row, col, inst=None)

Returns the value at the specified location.

Parameters:
  • row (int) – the 0-based index of the row
  • col (int) – the 0-based index of the column
  • inst (Instance) – the Instance
Returns:

the value in that cell

Return type:

float

get_max_cost(class_value, inst=None)

Gets the maximum cost for a particular class value.

Parameters:
  • class_value (int) – the class value to get the maximum cost for
  • inst (Instance) – the Instance
Returns:

the cost

Return type:

float

initialize()

Initializes the matrix.

normalize()

Normalizes the matrix.

num_columns

Returns the number of columns.

Returns:the number of columns
Return type:int
num_rows

Returns the number of rows.

Returns:the number of rows
Return type:int
classmethod parse_matlab(matlab)

Parses the cost matrix definition in Matlab format and returns a matrix.

Parameters:matlab (str) – the Matlab matrix string, e.g., [1 2; 3 4]
Returns:the generated matrix
Return type:CostMatrix
set_cell(row, col, obj)

Sets the JB_Object at the specified location. Automatically unwraps JavaObject.

Parameters:
  • row (int) – the 0-based index of the row
  • col (int) – the 0-based index of the column
  • obj (object) – the object for that cell
set_element(row, col, value)

Sets the float value at the specified location.

Parameters:
  • row (int) – the 0-based index of the row
  • col (int) – the 0-based index of the column
  • value (float) – the float value for that cell
size

Returns the number of rows/columns.

Returns:the number of rows/columns
Return type:int
to_matlab()

Returns the matrix in Matlab format.

Returns:the matrix as Matlab formatted string
Return type:str
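
In the Matlab format used by parse_matlab and to_matlab, rows are separated by semicolons and cells by whitespace, as in [1 2; 3 4]. A minimal pure-Python sketch of that notation, using hypothetical helper names and nested lists instead of a CostMatrix (the exact number formatting of the real to_matlab output may differ):

```python
def parse_matlab_sketch(text):
    """Parse a string like '[1 2; 3 4]' into a list of rows of floats."""
    body = text.strip()
    if not (body.startswith("[") and body.endswith("]")):
        raise ValueError("expected a string like '[1 2; 3 4]'")
    return [[float(cell) for cell in row.split()] for row in body[1:-1].split(";")]


def to_matlab_sketch(matrix):
    """Format a list of rows as a Matlab-style matrix string."""
    rows = [" ".join(repr(value) for value in row) for row in matrix]
    return "[" + "; ".join(rows) + "]"


matrix = parse_matlab_sketch("[1 2; 3 4]")
text = to_matlab_sketch(matrix)
print(matrix, text)
```
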
class weka.classifiers.Evaluation(data, cost_matrix=None)

Bases: weka.core.classes.JavaObject

Evaluation class for classifiers.

area_under_prc(class_index)

Returns the area under the precision-recall curve.

Parameters:class_index (int) – the 0-based index of the class label
Returns:the area
Return type:float
area_under_roc(class_index)

Returns the area under the receiver operating characteristic (ROC) curve.

Parameters:class_index (int) – the 0-based index of the class label
Returns:the area
Return type:float
avg_cost

Returns the average cost.

Returns:the cost
Return type:float
class_details(title=None)

Generates the class details.

Parameters:title (str) – optional title
Returns:the details
Return type:str
class_priors

Returns the class priors.

Returns:the priors
Return type:ndarray
confusion_matrix

Returns the confusion matrix.

Returns:the matrix
Return type:ndarray
correct

Returns the correct count (nominal classes).

Returns:the count
Return type:float
correlation_coefficient

Returns the correlation coefficient (numeric classes).

Returns:the coefficient
Return type:float
coverage_of_test_cases_by_predicted_regions

Returns the coverage of the test cases by the predicted regions at the confidence level specified when evaluation was performed.

Returns:the coverage
Return type:float
crossvalidate_model(classifier, data, num_folds, rnd, output=None)

Crossvalidates the model using the specified data, number of folds and random number generator wrapper.

Parameters:
  • classifier (Classifier) – the classifier to cross-validate
  • data (Instances) – the data to evaluate on
  • num_folds (int) – the number of folds
  • rnd (Random) – the random number generator to use
  • output (PredictionOutput) – the output generator to use
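
Conceptually, cross-validation partitions the data into num_folds parts and uses each part once for testing while training on the rest. The sketch below shows only that partitioning on row indices; it is a simplification with a hypothetical helper name and does not reproduce Weka's randomization or stratification.

```python
def k_fold_indices(n, num_folds):
    """Return a list of (train_indices, test_indices) pairs, one per fold."""
    folds = []
    for fold in range(num_folds):
        # Every num_folds-th row, starting at `fold`, goes into the test part.
        test = [i for i in range(n) if i % num_folds == fold]
        train = [i for i in range(n) if i % num_folds != fold]
        folds.append((train, test))
    return folds


folds = k_fold_indices(6, 3)
for train, test in folds:
    print(train, test)
```
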
cumulative_margin_distribution()

Output the cumulative margin distribution as a string suitable for input for gnuplot or similar package.

Returns:the cumulative margin distribution
Return type:str
discard_predictions

Returns whether to discard predictions (saves memory).

Returns:True if to discard
Return type:bool
error_rate

Returns the error rate (numeric classes).

Returns:the rate
Return type:float
classmethod evaluate_model(classifier, args)

Evaluates the classifier with the given options.

Parameters:
  • classifier (Classifier) – the classifier instance to use
  • args (list) – the command-line arguments to use
Returns:

the evaluation string

Return type:

str

evaluate_train_test_split(classifier, data, percentage, rnd=None, output=None)

Splits the data into train and test, builds the classifier with the training data and evaluates it against the test set.

Parameters:
  • classifier (Classifier) – the classifier to build and evaluate
  • data (Instances) – the data to split into train and test sets
  • percentage (float) – the percentage split to use (amount to use for training)
  • rnd (Random) – the random number generator to use, if None the order gets preserved
  • output (PredictionOutput) – the output generator to use
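
The percentage split can be pictured as follows: the first percentage percent of the rows become the training set and the remainder the test set, with the order preserved unless a random number generator is supplied. This is a minimal sketch on a plain list; the helper name, the seed parameter, and the rounding rule are assumptions, not the wrapper's exact behaviour.

```python
import random


def train_test_split_sketch(rows, percentage, seed=None):
    """Split rows into (train, test); shuffle first only if a seed is given."""
    rows = list(rows)
    if seed is not None:
        random.Random(seed).shuffle(rows)
    # Assumed rounding rule for the size of the training set.
    train_size = int(round(len(rows) * percentage / 100.0))
    return rows[:train_size], rows[train_size:]


train, test = train_test_split_sketch(range(10), 66.0)
print(train, test)
```
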
f_measure(class_index)

Returns the f measure.

Parameters:class_index (int) – the 0-based index of the class label
Returns:the measure
Return type:float
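
Per-class measures such as precision, recall, and the F-measure can all be derived from the confusion matrix. The sketch below assumes rows hold the actual class and columns the predicted class; the helper name and the counts are hypothetical.

```python
def per_class_scores(cm, class_index):
    """Return (precision, recall, f_measure) for one class of a confusion matrix."""
    tp = cm[class_index][class_index]
    fn = sum(cm[class_index]) - tp                  # actual class, predicted as another
    fp = sum(row[class_index] for row in cm) - tp   # other class, predicted as this one
    precision = tp / float(tp + fp) if tp + fp else 0.0
    recall = tp / float(tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical 2-class confusion matrix (rows: actual, columns: predicted).
cm = [[50, 10], [5, 35]]
precision, recall, f_measure = per_class_scores(cm, 0)
print(precision, recall, f_measure)
```
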
false_negative_rate(class_index)

Returns the false negative rate.

Parameters:class_index (int) – the 0-based index of the class label
Returns:the rate
Return type:float
false_positive_rate(class_index)

Returns the false positive rate.

Parameters:class_index (int) – the 0-based index of the class label
Returns:the rate
Return type:float
header

Returns the header format.

Returns:the header format
Return type:Instances
incorrect

Returns the incorrect count (nominal classes).

Returns:the count
Return type:float
kappa

Returns kappa.

Returns:kappa
Return type:float
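
Kappa compares the observed agreement between actual and predicted classes with the agreement expected by chance. A minimal sketch of the standard Cohen's kappa formula on a confusion matrix, assuming rows hold the actual class and columns the predicted class; the helper name and counts are hypothetical.

```python
def cohen_kappa(cm):
    """Cohen's kappa: (observed - expected agreement) / (1 - expected agreement)."""
    size = len(cm)
    total = float(sum(sum(row) for row in cm))
    observed = sum(cm[i][i] for i in range(size)) / total
    # Chance agreement: product of row and column marginals per class.
    expected = sum(
        sum(cm[i]) * sum(row[i] for row in cm) for i in range(size)
    ) / (total * total)
    return (observed - expected) / (1.0 - expected)


kappa = cohen_kappa([[50, 10], [5, 35]])
print(kappa)
```
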
kb_information

Returns KB information.

Returns:the information
Return type:float
kb_mean_information

Returns KB mean information.

Returns:the information
Return type:float
kb_relative_information

Returns KB relative information.

Returns:the information
Return type:float
matrix(title=None)

Generates the confusion matrix.

Parameters:title (str) – optional title
Returns:the matrix
Return type:str
matthews_correlation_coefficient(class_index)

Returns the Matthews correlation coefficient (nominal classes).

Parameters:class_index (int) – the 0-based index of the class label
Returns:the coefficient
Return type:float
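
For a single class, the Matthews correlation coefficient is computed from the four one-vs-rest counts. The sketch below uses the standard formula on a confusion matrix with rows as the actual class and columns as the predicted class; the helper name and counts are hypothetical.

```python
import math


def matthews_sketch(cm, class_index):
    """One-vs-rest Matthews correlation coefficient for the given class."""
    total = sum(sum(row) for row in cm)
    tp = cm[class_index][class_index]
    fn = sum(cm[class_index]) - tp
    fp = sum(row[class_index] for row in cm) - tp
    tn = total - tp - fn - fp
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention assumed here: return 0 when the coefficient is undefined.
    return (tp * tn - fp * fn) / denominator if denominator else 0.0


mcc = matthews_sketch([[50, 10], [5, 35]], 0)
print(mcc)
```
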
mean_absolute_error

Returns the mean absolute error.

Returns:the error
Return type:float
mean_prior_absolute_error

Returns the mean prior absolute error.

Returns:the error
Return type:float
num_false_negatives(class_index)

Returns the number of false negatives.

Parameters:class_index (int) – the 0-based index of the class label
Returns:the count
Return type:float
num_false_positives(class_index)

Returns the number of false positives.

Parameters:class_index (int) – the 0-based index of the class label
Returns:the count
Return type:float
num_instances

Returns the number of instances that had a known class value.

Returns:the number of instances
Return type:float
num_true_negatives(class_index)

Returns the number of true negatives.

Parameters:class_index (int) – the 0-based index of the class label
Returns:the count
Return type:float
num_true_positives(class_index)

Returns the number of true positives.

Parameters:class_index (int) – the 0-based index of the class label
Returns:the count
Return type:float
percent_correct

Returns the percent correct (nominal classes).

Returns:the percentage
Return type:float
percent_incorrect

Returns the percent incorrect (nominal classes).

Returns:the percentage
Return type:float
percent_unclassified

Returns the percent unclassified.

Returns:the percentage
Return type:float
precision(class_index)

Returns the precision.

Parameters:class_index (int) – the 0-based index of the class label
Returns:the precision
Return type:float
predictions

Returns the predictions.

Returns:the predictions. None if not available
Return type:list
recall(class_index)

Returns the recall.

Parameters:class_index (int) – the 0-based index of the class label
Returns:the recall
Return type:float
relative_absolute_error

Returns the relative absolute error.

Returns:the error
Return type:float
root_mean_prior_squared_error

Returns the root mean prior squared error.

Returns:the error
Return type:float
root_mean_squared_error

Returns the root mean squared error.

Returns:the error
Return type:float
root_relative_squared_error

Returns the root relative squared error.

Returns:the error
Return type:float
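
The error measures for numeric classes are related as shown in the sketch below. It makes one simplifying assumption: the "prior" baseline predicts the mean of the actual test values, whereas Weka derives the prior from the training data. The helper name and the numbers are hypothetical.

```python
import math


def regression_errors(actual, predicted):
    """Return MAE, RMSE and relative absolute error (in percent)."""
    n = float(len(actual))
    mae = sum(abs(p - a) for a, p in zip(actual, predicted)) / n
    rmse = math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / n)
    # Baseline: always predict the mean (assumption, see lead-in).
    mean = sum(actual) / n
    prior_mae = sum(abs(mean - a) for a in actual) / n
    relative_absolute = 100.0 * mae / prior_mae
    return {"mae": mae, "rmse": rmse, "rae": relative_absolute}


errors = regression_errors([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 5.0])
print(errors)
```
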
sf_entropy_gain

Returns the total SF, which is the null model entropy minus the scheme entropy.

Returns:the gain
Return type:float
sf_mean_entropy_gain

Returns the SF per instance, which is the null model entropy minus the scheme entropy, per instance.

Returns:the gain
Return type:float
sf_mean_prior_entropy

Returns the entropy per instance for the null model.

Returns:the entropy
Return type:float
sf_mean_scheme_entropy

Returns the entropy per instance for the scheme.

Returns:the entropy
Return type:float
sf_prior_entropy

Returns the total entropy for the null model.

Returns:the entropy
Return type:float
sf_scheme_entropy

Returns the total entropy for the scheme.

Returns:the entropy
Return type:float
size_of_predicted_regions

Returns the average size of the predicted regions, relative to the range of the target in the training data, at the confidence level specified when evaluation was performed.

Returns:the size of the regions
Return type:float

summary(title=None, complexity=False)

Generates a summary.

Parameters:
  • title (str) – optional title
  • complexity (bool) – whether to print the complexity information as well
Returns:

the summary

Return type:

str

test_model(classifier, data, output=None)

Evaluates the built model using the specified test data and returns the classifications.

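Parameters:
  • classifier (Classifier) – the built classifier to evaluate
  • data (Instances) – the test data to evaluate on
  • output (PredictionOutput) – the output generator to use
Returns:

the classifications

Return type:

ndarray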

test_model_once(classifier, inst)

Evaluates the built model using the specified test instance and returns the classification.

Parameters:
  • classifier (Classifier) – the built classifier to evaluate
  • inst (Instance) – the Instance to evaluate on
Returns:

the classification

Return type:

float

total_cost

Returns the total cost.

Returns:the cost
Return type:float
true_negative_rate(class_index)

Returns the true negative rate.

Parameters:class_index (int) – the 0-based index of the class label
Returns:the rate
Return type:float
true_positive_rate(class_index)

Returns the true positive rate.

Parameters:class_index (int) – the 0-based index of the class label
Returns:the rate
Return type:float
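
The four rates for a class are ratios of the one-vs-rest counts in the confusion matrix. A minimal sketch, assuming rows hold the actual class and columns the predicted class; the helper name and counts are hypothetical.

```python
def class_rates(cm, class_index):
    """Return true/false positive/negative rates for one class."""
    total = sum(sum(row) for row in cm)
    tp = cm[class_index][class_index]
    fn = sum(cm[class_index]) - tp
    fp = sum(row[class_index] for row in cm) - tp
    tn = total - tp - fn - fp
    return {
        "tpr": tp / float(tp + fn),   # share of actual positives that were found
        "fnr": fn / float(tp + fn),   # share of actual positives that were missed
        "fpr": fp / float(fp + tn),   # share of actual negatives flagged as positive
        "tnr": tn / float(fp + tn),   # share of actual negatives correctly rejected
    }


rates = class_rates([[50, 10], [5, 35]], 0)
print(rates)
```
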
unclassified

Returns the unclassified count.

Returns:the count
Return type:float
unweighted_macro_f_measure

Returns the unweighted macro-averaged F-measure.

Returns:the measure
Return type:float
unweighted_micro_f_measure

Returns the unweighted micro-averaged F-measure.

Returns:the measure
Return type:float
weighted_area_under_prc

Returns the weighted area under the precision-recall curve.

Returns:the weighted area
Return type:float
weighted_area_under_roc

Returns the weighted area under the receiver operating characteristic (ROC) curve.

Returns:the weighted area
Return type:float
weighted_f_measure

Returns the weighted f measure.

Returns:the measure
Return type:float
weighted_false_negative_rate

Returns the weighted false negative rate.

Returns:the rate
Return type:float
weighted_false_positive_rate

Returns the weighted false positive rate.

Returns:the rate
Return type:float
weighted_matthews_correlation

Returns the weighted Matthews correlation (nominal classes).

Returns:the correlation
Return type:float
weighted_precision

Returns the weighted precision.

Returns:the precision
Return type:float
weighted_recall

Returns the weighted recall.

Returns:the recall
Return type:float
weighted_true_negative_rate

Returns the weighted true negative rate.

Returns:the rate
Return type:float
weighted_true_positive_rate

Returns the weighted true positive rate.

Returns:the rate
Return type:float
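
The weighted variants average a per-class measure over all classes, weighting each class by its share of the instances. The sketch below does this for recall, assuming the weights come from the actual class counts (row sums of the confusion matrix); the helper name and counts are hypothetical.

```python
def weighted_recall_sketch(cm):
    """Average per-class recall, weighted by the number of actual instances per class."""
    total = float(sum(sum(row) for row in cm))
    result = 0.0
    for class_index, row in enumerate(cm):
        class_count = sum(row)
        if class_count == 0:
            continue  # a class without instances contributes nothing
        recall = row[class_index] / float(class_count)
        result += (class_count / total) * recall
    return result


weighted = weighted_recall_sketch([[50, 10], [5, 35]])
print(weighted)
```

With this weighting, the weighted recall coincides with the overall fraction of correctly classified instances (85 of 100 in the example).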
class weka.classifiers.FilteredClassifier(jobject=None, options=None)

Bases: weka.classifiers.SingleClassifierEnhancer

Wrapper class for the filtered classifier.

check_for_modified_class_attribute(check)

Sets whether to check for class attribute modifications.

Parameters:check (bool) – True if checking for modifications
filter

Returns the filter.

Returns:the filter in use
Return type:Filter
class weka.classifiers.GridSearch(jobject=None, options=None)

Bases: weka.classifiers.SingleClassifierEnhancer

Wrapper class for the GridSearch meta-classifier.

best

Returns the best classifier setup found during the search.

Returns:the best classifier setup
Return type:Classifier
evaluation

Returns the currently set statistic used for evaluation.

Returns:the statistic
Return type:SelectedTag
x

Returns a dictionary with all the current values for the X of the grid. Keys for the dictionary: property, min, max, step, base, expression Types: property=str, min=float, max=float, step=float, base=float, expression=str

Returns:the dictionary with the parameters
Return type:dict
y

Returns a dictionary with all the current values for the Y of the grid. Keys for the dictionary: property, min, max, step, base, expression Types: property=str, min=float, max=float, step=float, base=float, expression=str

Returns:the dictionary with the parameters
Return type:dict
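
The dictionaries returned by x and y describe one grid axis each. The sketch below builds such a dictionary with the documented keys and expands it into concrete values. It assumes that I runs from min to max in increments of step and that only the two expressions "I" and "pow(BASE,I)" occur; the property name and numbers are hypothetical, and the real GridSearch evaluates arbitrary expressions in Java.

```python
def axis_values(axis):
    """Expand a grid axis dictionary into the list of values it describes."""
    count = int(round((axis["max"] - axis["min"]) / axis["step"]))
    values = []
    for k in range(count + 1):
        i = axis["min"] + k * axis["step"]
        if axis["expression"] == "I":
            values.append(i)
        elif axis["expression"] == "pow(BASE,I)":
            values.append(axis["base"] ** i)
        else:
            raise ValueError("expression not supported by this sketch")
    return values


# Hypothetical Y axis: ridge values 10^-3 ... 10^1.
y_axis = {"property": "classifier.ridge", "min": -3.0, "max": 1.0,
          "step": 1.0, "base": 10.0, "expression": "pow(BASE,I)"}
ridge_values = axis_values(y_axis)
print(ridge_values)
```
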
class weka.classifiers.Kernel(classname=None, jobject=None, options=None)

Bases: weka.core.classes.OptionHandler

Wrapper class for kernels.

build_kernel(data)

Builds the kernel with the data.

Parameters:data (Instances) – the data to build the kernel with
capabilities()

Returns the capabilities of the kernel.

Returns:the capabilities
Return type:Capabilities
checks_turned_off

Returns whether checks are turned off.

Returns:True if checks turned off
Return type:bool
clean()

Frees the memory used by the kernel.

eval(id1, id2, inst1)

Computes the result of the kernel function for two instances. If id1 == -1, eval uses inst1 instead of an instance in the dataset.

Parameters:
  • id1 (int) – the index of the first instance in the dataset
  • id2 (int) – the index of the second instance in the dataset
  • inst1 (Instance) – the instance corresponding to id1 (used if id1 == -1)
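
The id1 == -1 convention lets the same call handle both instances stored in the dataset and a new, external instance. The sketch below illustrates it with a plain linear kernel (dot product) on lists of floats; the function name and data are hypothetical, and real Weka kernels operate on Instances.

```python
def linear_kernel_eval(dataset, id1, id2, inst1=None):
    """Dot product of two rows; inst1 replaces dataset[id1] when id1 == -1."""
    first = inst1 if id1 == -1 else dataset[id1]
    second = dataset[id2]
    return sum(a * b for a, b in zip(first, second))


dataset = [[1.0, 2.0], [3.0, 4.0]]
inside = linear_kernel_eval(dataset, 0, 1)                 # both rows from the dataset
outside = linear_kernel_eval(dataset, -1, 1, [0.5, 0.5])   # external first instance
print(inside, outside)
```
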
classmethod make_copy(kernel)

Creates a copy of the kernel.

Parameters:kernel (Kernel) – the kernel to copy
Returns:the copy of the kernel
Return type:Kernel
class weka.classifiers.KernelClassifier(classname=None, jobject=None, options=None)

Bases: weka.classifiers.Classifier

Wrapper class for classifiers that have a kernel property, like SMO.

kernel

Returns the current kernel.

Returns:the kernel or None if none found
Return type:Kernel
class weka.classifiers.MultiSearch(jobject=None, options=None)

Bases: weka.classifiers.SingleClassifierEnhancer

Wrapper class for the MultiSearch meta-classifier. NB: ‘multi-search-weka-package’ must be installed (https://github.com/fracpete/multisearch-weka-package), version 2016.1.15 or later.

best

Returns the best classifier setup found during the search.

Returns:the best classifier setup
Return type:Classifier
evaluation

Returns the currently set statistic used for evaluation.

Returns:the statistic
Return type:SelectedTag
parameters

Returns the list of currently set search parameters.

Returns:the list of AbstractSearchParameter objects
Return type:list
class weka.classifiers.MultipleClassifiersCombiner(classname=None, jobject=None, options=None)

Bases: weka.classifiers.Classifier

Wrapper class for classifiers that use multiple base classifiers.

classifiers

Returns the list of base classifiers.

Returns:the classifier list
Return type:list
class weka.classifiers.NominalPrediction(jobject)

Bases: weka.classifiers.Prediction

Wrapper class for a nominal prediction.

distribution

Returns the class distribution.

Returns:the class distribution list
Return type:ndarray
margin

Returns the margin.

Returns:the margin
Return type:float
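
The margin relates the two properties above: it is commonly defined as the probability predicted for the actual class minus the highest probability predicted for any other class. A minimal sketch under that definition, with a hypothetical helper name and distribution:

```python
def prediction_margin(distribution, actual):
    """Probability of the actual class minus the best probability among the others."""
    others = [p for i, p in enumerate(distribution) if i != actual]
    return distribution[actual] - max(others)


distribution = [0.7, 0.2, 0.1]
margin_correct = prediction_margin(distribution, 0)   # actual class is the favourite
margin_wrong = prediction_margin(distribution, 1)     # actual class is not the favourite
print(margin_correct, margin_wrong)
```

A positive margin means the actual class received the highest probability; a negative margin means another class was preferred.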
class weka.classifiers.NumericPrediction(jobject)

Bases: weka.classifiers.Prediction

Wrapper class for a numeric prediction.

error

Returns the error.

Returns:the error
Return type:float
prediction_intervals

Returns the prediction intervals.

Returns:the intervals
Return type:ndarray
class weka.classifiers.Prediction(jobject)

Bases: weka.core.classes.JavaObject

Wrapper class for a prediction.

actual

Returns the actual value.

Returns:the actual value (internal representation)
Return type:float
predicted

Returns the predicted value.

Returns:the predicted value (internal representation)
Return type:float
weight

Returns the weight.

Returns:the weight of the Instance that was used
Return type:float
class weka.classifiers.PredictionOutput(classname='weka.classifiers.evaluation.output.prediction.PlainText', jobject=None, options=None)

Bases: weka.core.classes.OptionHandler

Collects predictions and generates output from them. The underlying Java class must be derived from weka.classifiers.evaluation.output.prediction.AbstractOutput.

buffer_content()

Returns the content of the buffer as string.

Returns:The buffer content
Return type:str
header

Returns the header format.

Returns:The dataset format
Return type:Instances
print_all(cls, data)

Prints the header, classifications and footer to the buffer.

Parameters:
  • cls (Classifier) – the classifier
  • data (Instances) – the test data
print_classification(cls, inst, index)

Prints the classification to the buffer.

Parameters:
  • cls (Classifier) – the classifier
  • inst (Instance) – the test instance
  • index (int) – the 0-based index of the test instance
print_classifications(cls, data)

Prints the classifications to the buffer.

Parameters:
  • cls (Classifier) – the classifier
  • data (Instances) – the test data
print_footer()

Prints the footer to the buffer.

print_header()

Prints the header to the buffer.

class weka.classifiers.SingleClassifierEnhancer(classname=None, jobject=None, options=None)

Bases: weka.classifiers.Classifier

Wrapper class for classifiers that use a single base classifier.

classifier

Returns the base classifier.

Returns:the base classifier
Return type:Classifier

weka.classifiers.main(args=None)

Runs a classifier from the command-line. Calls JVM start/stop automatically. Use -h to see all options.

Parameters:args (list) – the command-line arguments to use, uses sys.argv if None
weka.classifiers.predictions_to_instances(data, preds)

Turns the predictions into an Instances object.

Parameters:
  • data (Instances) – the original dataset format
  • preds (list) – the predictions to convert
Returns:

the predictions, None if no predictions present

Return type:

Instances

weka.classifiers.sys_main()

Runs the main function using the system cli arguments, and returns a system error code.

Returns:0 for success, 1 for failure.
Return type:int

weka.clusterers module

class weka.clusterers.ClusterEvaluation

Bases: weka.core.classes.JavaObject

Evaluation class for clusterers.

classes_to_clusters

Return the array (ordered by cluster number) of minimum error class to cluster mappings.

Returns:the mappings
Return type:ndarray
cluster_assignments

Return an array of cluster assignments corresponding to the most recent set of instances clustered.

Returns:the cluster assignments
Return type:ndarray
cluster_results

The cluster results as string.

Returns:the results string
Return type:str
classmethod crossvalidate_model(clusterer, data, num_folds, rnd)

Cross-validates the clusterer and returns the loglikelihood.

Parameters:
  • clusterer (Clusterer) – the clusterer instance to evaluate
  • data (Instances) – the data to evaluate on
  • num_folds (int) – the number of folds
  • rnd (Random) – the random number generator to use
Returns:

the cross-validated loglikelihood

Return type:

float

classmethod evaluate_clusterer(clusterer, args)

Evaluates the clusterer with the given options.

Parameters:
  • clusterer (Clusterer) – the clusterer instance to evaluate
  • args (list) – the command-line arguments
Returns:

the evaluation result

Return type:

str

log_likelihood

Returns the log likelihood.

Returns:the log likelihood
Return type:float
num_clusters

Returns the number of clusters.

Returns:the number of clusters
Return type:int
set_model(clusterer)

Sets the built clusterer to evaluate.

Parameters:clusterer (Clusterer) – the clusterer to evaluate
test_model(test)

Evaluates the currently set clusterer on the test set.

Parameters:test (Instances) – the test set to use for evaluating
class weka.clusterers.Clusterer(classname='weka.clusterers.SimpleKMeans', jobject=None, options=None)

Bases: weka.core.classes.OptionHandler

Wrapper class for clusterers.

build_clusterer(data)

Builds the clusterer with the data.

Parameters:data (Instances) – the data to use for training the clusterer
capabilities

Returns the capabilities of the clusterer.

Returns:the capabilities
Return type:Capabilities
cluster_instance(inst)

Performs a prediction.

Parameters:inst (Instance) – the instance to determine the cluster for
Returns:the clustering result
Return type:float
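
Because cluster_instance returns the cluster as a float, a caller usually casts it before using it as an index. The sketch below counts cluster sizes; cluster_sizes is a hypothetical helper name, the clusterer is assumed to have been built already, and the dataset is assumed to be iterable over its Instance objects.

```python
from collections import Counter


def cluster_sizes(clusterer, instances):
    """Count how many instances the built clusterer assigns to each cluster.

    Hypothetical helper: `instances` is assumed to be iterable and to
    yield the Instance objects that cluster_instance expects.
    """
    counts = Counter()
    for inst in instances:
        # cluster_instance is documented to return a float; cast it to an
        # int so that it can serve as a cluster index.
        counts[int(clusterer.cluster_instance(inst))] += 1
    return dict(counts)
```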
classmethod deserialize(ser_file)

Deserializes a clusterer from a file.

Parameters:ser_file (str) – the model file to deserialize
Returns:model and, if available, the dataset header
Return type:tuple
distribution_for_instance(inst)

Performs a prediction, returning the cluster distribution.

Parameters:inst (Instance) – the Instance to get the cluster distribution for
Returns:the cluster distribution
Return type:float[]
graph

Returns the graph if the clusterer implements weka.core.Drawable, otherwise None.

Returns:the graph or None if not available
Return type:str
graph_type

Returns the graph type if the clusterer implements weka.core.Drawable, otherwise -1.

Returns:the type
Return type:int
classmethod make_copy(clusterer)

Creates a copy of the clusterer.

Parameters:clusterer (Clusterer) – the clusterer to copy
Returns:the copy of the clusterer
Return type:Clusterer
number_of_clusters

Returns the number of clusters found.

Returns:the number of clusters
Return type:int
serialize(ser_file, header=None)

Serializes the clusterer to the specified file.

Parameters:
  • ser_file (str) – the file to save the model to
  • header (Instances) – the (optional) dataset header to store alongside; recommended
update_clusterer(inst)

Updates the clusterer with the instance.

Parameters:inst (Instance) – the Instance to update the clusterer with
update_finished()

Signals the clusterer that updating with new data has finished.

class weka.clusterers.FilteredClusterer(jobject=None, options=None)

Bases: weka.clusterers.SingleClustererEnhancer

Wrapper class for the filtered clusterer.

filter

Returns the filter.

Returns:the filter
Return type:Filter
class weka.clusterers.SingleClustererEnhancer(classname=None, jobject=None, options=None)

Bases: weka.clusterers.Clusterer

Wrapper class for clusterers that use a single base clusterer.

clusterer

Returns the base clusterer.

Returns:the clusterer
Return type:Clusterer
weka.clusterers.main(args=None)

Runs a clusterer from the command-line. Calls JVM start/stop automatically. Use -h to see all options.

Parameters:args (list) – the command-line arguments to use, uses sys.argv if None
weka.clusterers.sys_main()

Runs the main function using the system cli arguments, and returns a system error code.

Returns:0 for success, 1 for failure.
Return type:int

weka.datagenerators module

class weka.datagenerators.DataGenerator(classname='weka.datagenerators.classifiers.classification.Agrawal', jobject=None, options=None)

Bases: weka.core.classes.OptionHandler

Wrapper class for datagenerators.

dataset_format

Returns the dataset format.

Returns:the format
Return type:Instances
define_data_format()

Returns the data format.

Returns:the data format
Return type:Instances
generate_example()

Returns a single Instance.

Returns:the next example
Return type:Instance
generate_examples()

Returns the complete dataset.

Returns:the generated dataset
Return type:Instances
generate_finish()

Returns a “finish” string.

Returns:a finish comment
Return type:str
generate_start()

Returns a “start” string.

Returns:the start comment
Return type:str
classmethod make_copy(generator)

Creates a copy of the generator.

Parameters:generator (DataGenerator) – the generator to copy
Returns:the copy of the generator
Return type:DataGenerator
classmethod make_data(generator, args)

Generates data using the generator and command-line arguments.

Parameters:
  • generator (DataGenerator) – the generator instance to use
  • args (list) – the command-line arguments
num_examples_act

Returns the actual number of examples to generate.

Returns:the number of examples
Return type:int
single_mode_flag

Returns whether data is generated row by row (True) or in one go (False).

Returns:whether incremental
Return type:bool
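
The single_mode_flag property decides which generation call applies. The sketch below shows one way to branch on it; render_generated_data is a hypothetical helper, and it assumes that dataset_format can also be assigned (only the getter is documented here) and that str() of the returned objects gives a printable representation.

```python
def render_generated_data(generator):
    """Return the generated data as a list of printable strings.

    Hypothetical helper: assumes `dataset_format` is writable, which is
    not stated in the reference above.
    """
    # Initialise the generator with its own data format first.
    generator.dataset_format = generator.define_data_format()
    chunks = [str(generator.dataset_format)]
    if generator.single_mode_flag:
        # Row-by-row mode: request one Instance per example.
        for _ in range(generator.num_examples_act):
            chunks.append(str(generator.generate_example()))
    else:
        # Batch mode: the whole dataset is produced in one go.
        chunks.append(str(generator.generate_examples()))
    return chunks
```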
weka.datagenerators.main(args=None)

Runs a datagenerator from the command-line. Calls JVM start/stop automatically. Use -h to see all options.

Parameters:args (list) – the command-line arguments to use, uses sys.argv if None
weka.datagenerators.sys_main()

Runs the main function using the system cli arguments, and returns a system error code.

Returns:0 for success, 1 for failure.
Return type:int

weka.experiments module

class weka.experiments.Experiment(classname='weka.experiment.Experiment', jobject=None, options=None)

Bases: weka.core.classes.OptionHandler

Wrapper class for an experiment.

class weka.experiments.ResultMatrix(classname='weka.experiment.ResultMatrixPlainText', jobject=None, options=None)

Bases: weka.core.classes.OptionHandler

For generating results from an Experiment run.

average(col)

Returns the average of the means in this column (if the column index is valid).

Parameters:col (int) – the 0-based column index
Returns:the mean
Return type:float
columns

Returns the column count.

Returns:the count
Return type:int
get_col_name(index)

Returns the column name.

Parameters:index (int) – the 0-based column index
Returns:the column name, None if invalid index
Return type:str
get_mean(col, row)

Returns the mean at this location (if valid location).

Parameters:
  • col (int) – the 0-based column index
  • row (int) – the 0-based row index
Returns:the mean
Return type:float

get_row_name(index)

Returns the row name.

Parameters:index (int) – the 0-based row index
Returns:the row name, None if invalid index
Return type:str
get_stdev(col, row)

Returns the standard deviation at this location (if valid location).

Parameters:
  • col (int) – the 0-based column index
  • row (int) – the 0-based row index
Returns:the standard deviation
Return type:float

hide_col(index)

Hides the column.

Parameters:index (int) – the 0-based column index
hide_row(index)

Hides the row.

Parameters:index (int) – the 0-based row index
is_col_hidden(index)

Returns whether the column is hidden.

Parameters:index (int) – the 0-based column index
Returns:true if hidden
Return type:bool
is_row_hidden(index)

Returns whether the row is hidden.

Parameters:index (int) – the 0-based row index
Returns:true if hidden
Return type:bool
rows

Returns the row count.

Returns:the count
Return type:int
set_col_name(index, name)

Sets the column name.

Parameters:
  • index (int) – the 0-based column index
  • name (str) – the name of the column
set_mean(col, row, mean)

Sets the mean at this location (if valid location).

Parameters:
  • col (int) – the 0-based column index
  • row (int) – the 0-based row index
  • mean (float) – the mean to set
set_row_name(index, name)

Sets the row name.

Parameters:
  • index (int) – the 0-based row index
  • name (str) – the name of the row
set_stdev(col, row, stdev)

Sets the standard deviation at this location (if valid location).

Parameters:
  • col (int) – the 0-based column index
  • row (int) – the 0-based row index
  • stdev (float) – the standard deviation to set
show_col(index)

Shows the column.

Parameters:index (int) – the 0-based column index
show_row(index)

Shows the row.

Parameters:index (int) – the 0-based row index
to_string_header()

Returns the header of the matrix as a string.

Returns:the header
Return type:str
to_string_key()

Returns a key for all the column names, for better readability if the names were cut off.

Returns:the key
Return type:str
to_string_matrix()

Returns the matrix as a string.

Returns:the generated output
Return type:str
to_string_ranking()

Returns the ranking in a string representation.

Returns:the ranking
Return type:str
to_string_summary()

Returns the summary as a string.

Returns:the summary
Return type:str
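
Note that get_mean and get_stdev take the column index before the row index. As a small illustration, the hypothetical helper visible_means below walks the matrix and collects the means of all cells that are not hidden; it uses only the members documented above.

```python
def visible_means(matrix):
    """Collect the means of all visible cells, keyed by row and column name.

    Hypothetical helper built on the documented ResultMatrix members.
    """
    table = {}
    for row in range(matrix.rows):
        if matrix.is_row_hidden(row):
            continue
        cells = {}
        for col in range(matrix.columns):
            if matrix.is_col_hidden(col):
                continue
            # Argument order is (col, row), as in the reference above.
            cells[matrix.get_col_name(col)] = matrix.get_mean(col, row)
        table[matrix.get_row_name(row)] = cells
    return table
```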
class weka.experiments.SimpleCrossValidationExperiment(datasets, classifiers, classification=True, runs=10, folds=10, result=None)

Bases: weka.experiments.SimpleExperiment

Performs a simple cross-validation experiment. Can output the results either in ARFF or CSV.

configure_resultproducer()

Configures and returns the ResultProducer and PropertyPath as tuple.

Returns:producer and property path
Return type:tuple
class weka.experiments.SimpleExperiment(datasets, classifiers, jobject=None, classification=True, runs=10, result=None)

Bases: weka.core.classes.OptionHandler

Ancestor for simple experiments.

See the following URL for how to use the Experiment API: http://weka.wikispaces.com/Using+the+Experiment+API

configure_resultproducer()

Configures and returns the ResultProducer and PropertyPath as tuple.

Returns:producer and property path
Return type:tuple
configure_splitevaluator()

Configures and returns the SplitEvaluator and Classifier instance as tuple.

Returns:evaluator and classifier
Return type:tuple
experiment()

Returns the internal experiment, if set up, otherwise None.

Returns:the internal experiment
Return type:Experiment
classmethod load(filename)

Loads the experiment from disk.

Parameters:filename (str) – the filename of the experiment to load
Returns:the experiment
Return type:Experiment
run()

Executes the experiment.

classmethod save(filename, experiment)

Saves the experiment to disk.

Parameters:
  • filename (str) – the filename to save the experiment to
  • experiment (Experiment) – the Experiment to save
setup()

Initializes the experiment.
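
In practice the two lifecycle methods are called in sequence: setup() first, then run(). The helper below is only a sketch of that ordering; run_experiment is a hypothetical name, and the experiment object is assumed to be one of the SimpleExperiment subclasses constructed elsewhere with its datasets, classifiers and result file.

```python
def run_experiment(experiment):
    """Initialise and execute an experiment, returning it for further use.

    Hypothetical helper: `experiment` is assumed to offer the documented
    setup() and run() methods.
    """
    experiment.setup()  # initialise the internal experiment
    experiment.run()    # execute it; results go to the configured output
    return experiment
```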

class weka.experiments.SimpleRandomSplitExperiment(datasets, classifiers, classification=True, runs=10, percentage=66.6, preserve_order=False, result=None)

Bases: weka.experiments.SimpleExperiment

Performs a simple random split experiment. Can output the results either in ARFF or CSV.

configure_resultproducer()

Configures and returns the ResultProducer and PropertyPath as tuple.

Returns:producer and property path
Return type:tuple
class weka.experiments.Tester(classname='weka.experiment.PairedCorrectedTTester', jobject=None, options=None)

Bases: weka.core.classes.OptionHandler

For generating statistical results from an experiment.

dataset_columns

Returns the list of column names that uniquely identify a dataset.

Returns:the list of attribute names
Return type:list
fold_column

Returns the column name that holds the Fold number.

Returns:the attribute name
Return type:str
header(comparison_column)

Creates a “header” string describing the current resultsets.

Parameters:comparison_column (int) – the index of the column to compare against
Returns:the header
Return type:str
init_columns()

Sets the column indices based on the supplied names if necessary.

instances

Returns the data used in the analysis.

Returns:the data in use
Return type:Instances
multi_resultset_full(base_resultset, comparison_column)

Creates a comparison table where a base resultset is compared to the other resultsets.

Parameters:
  • base_resultset (int) – the 0-based index of the base resultset (e.g. the classifier to compare against)
  • comparison_column (int) – the 0-based index of the column to compare against
Returns:the comparison
Return type:str

multi_resultset_ranking(comparison_column)

Creates a ranking.

Parameters:comparison_column (int) – the 0-based index of the column to compare against
Returns:the ranking
Return type:str
multi_resultset_summary(comparison_column)

Carries out a comparison between all resultsets, counting the number of datasets where one resultset outperforms the other.

Parameters:comparison_column (int) – the 0-based index of the column to compare against
Returns:the summary
Return type:str
result_columns

Returns the list of column names that uniquely identify a result (e.g. classifier + options + ID).

Returns:the list of attribute names
Return type:list
resultmatrix

Returns the ResultMatrix instance in use.

Returns:the matrix in use
Return type:ResultMatrix
run_column

Returns the column name that holds the Run number.

Returns:the attribute name
Return type:str
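
A typical analysis hands the experiment output to a Tester and asks for a header plus a comparison table. The sketch below assumes that the instances property can be assigned as well as read (only the getter is documented here); compare_against_base is a hypothetical name, and the data and column index are expected to come from the caller.

```python
def compare_against_base(tester, data, comparison_column, base_resultset=0):
    """Return the header and full comparison table as one string.

    Hypothetical helper: assumes `tester.instances` is writable, which
    the reference above does not state.
    """
    tester.instances = data  # the experiment results to analyse
    header = tester.header(comparison_column)
    table = tester.multi_resultset_full(base_resultset, comparison_column)
    return header + "\n" + table
```

Both indices are 0-based, matching the parameter descriptions of multi_resultset_full.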

weka.filters module

class weka.filters.Filter(classname='weka.filters.AllFilter', jobject=None, options=None)

Bases: weka.core.classes.OptionHandler

Wrapper class for filters.

batch_finished()

Signals the filter that the batch of data has finished.

Returns:True if instances can be collected from the output
Return type:bool
capabilities()

Returns the capabilities of the filter.

Returns:the capabilities
Return type:Capabilities
classmethod deserialize(ser_file)

Deserializes a filter from a file.

Parameters:ser_file (str) – the file to deserialize from
Returns:model
Return type:Filter
filter(data)

Filters the dataset(s). When providing a list, this can be used to create compatible train/test sets, since the filter only gets initialized with the first dataset and all subsequent datasets get transformed using the same setup.

NB: inputformat(Instances) must have been called beforehand.

Parameters:data (Instances or list of Instances) – the Instances to filter
Returns:the filtered Instances object(s)
Return type:Instances or list of Instances
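
The list form of filter(data) is what makes compatible train/test sets possible. A minimal sketch, using only the documented inputformat and filter calls; filter_train_test is a hypothetical helper name and both datasets are assumed to be Instances objects with the same structure.

```python
def filter_train_test(flter, train, test):
    """Filter a train/test pair so that both share the same filter setup.

    Hypothetical helper: the filter is initialised on `train` only, and
    `test` is transformed with that same setup.
    """
    flter.inputformat(train)  # must be called before filter()
    filtered_train, filtered_test = flter.filter([train, test])
    return filtered_train, filtered_test
```

Initialising on the training data alone keeps statistics from the test set out of the filter's setup.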
input(inst)

Inputs the Instance.

Parameters:inst (Instance) – the instance to filter
Returns:True if the filtered instance can be collected from the output
Return type:bool
inputformat(data)

Sets the input format.

Parameters:data (Instances) – the data to use as input
classmethod make_copy(flter)

Creates a copy of the filter.

Parameters:flter (Filter) – the filter to copy
Returns:the copy of the filter
Return type:Filter
output()

Outputs the filtered Instance.

Returns:the filtered instance
Return type:an Instance object
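
The input, batch_finished and output methods together support instance-by-instance filtering. The sketch below shows one possible loop; stream_through_filter is a hypothetical name, header is assumed to be an Instances object describing the stream, and the sketch deliberately does not drain output that only becomes available after batch_finished, since how to do that is not covered by this reference.

```python
def stream_through_filter(flter, header, stream):
    """Push instances through a filter one at a time.

    Hypothetical helper: returns the instances that became available
    immediately, plus the flag reported by batch_finished().
    """
    flter.inputformat(header)  # declare the structure of the stream
    collected = []
    for inst in stream:
        # input() returns True when a filtered instance is ready.
        if flter.input(inst):
            collected.append(flter.output())
    # True means further instances can still be collected from the output.
    more_pending = flter.batch_finished()
    return collected, more_pending
```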
outputformat()

Returns the output format.

Returns:the output format
Return type:Instances
serialize(ser_file)

Serializes the filter to the specified file.

Parameters:ser_file (str) – the file to save the filter to
to_source(classname, data)

Returns the model as Java source code if the filter implements weka.filters.Sourcable.

Parameters:
  • classname (str) – the classname for the generated Java code
  • data (Instances) – the dataset used for initializing the filter
Returns:the model as source code string
Return type:str

class weka.filters.MultiFilter(jobject=None, options=None)

Bases: weka.filters.Filter

Wrapper class for weka.filters.MultiFilter.

filters

Returns the list of base filters.

Returns:the filter list
Return type:list
class weka.filters.StringToWordVector(jobject=None, options=None)

Bases: weka.filters.Filter

Wrapper class for weka.filters.unsupervised.attribute.StringToWordVector.

stemmer

Returns the stemmer.

Returns:the stemmer
Return type:Stemmer
stopwords

Returns the stopwords handler.

Returns:the stopwords handler
Return type:Stopwords
tokenizer

Returns the tokenizer.

Returns:the tokenizer
Return type:Tokenizer
weka.filters.main(args=None)

Runs a filter from the command-line. Calls JVM start/stop automatically. Use -h to see all options.

Parameters:args (list) – the command-line arguments to use, uses sys.argv if None
weka.filters.sys_main()

Runs the main function using the system cli arguments, and returns a system error code.

Returns:0 for success, 1 for failure.
Return type:int

Module contents