weka package¶

Submodules¶

weka.associations module¶

class weka.associations.AssociationRule(jobject)¶

Bases: weka.core.classes.JavaObject

Wrapper for weka.associations.AssociationRule class.

consequence¶

Gets the consequence.

Returns:	the consequence, list of Item objects
Return type:	list

consequence_support¶

Get the support for the consequence.

Returns:	the support
Return type:	int

metric_names¶

Returns the metric names for the rule.

Returns:	the metric names
Return type:	list

metric_value(name)¶

Returns the named metric value for the rule.

Parameters:	name (str) – the name of the metric
Returns:	the metric value
Return type:	float

metric_values¶

Returns the metric values for the rule.

Returns:	the metric values
Return type:	ndarray

premise¶

Gets the premise.

Returns:	the premise, list of Item objects
Return type:	list

premise_support¶

Get the support for the premise.

Returns:	the support
Return type:	int

primary_metric_name¶

Returns the primary metric name for the rule.

Returns:	the metric name
Return type:	str

primary_metric_value¶

Returns the primary metric value for the rule.

Returns:	the metric value
Return type:	float

total_support¶

Get the total support.

Returns:	the support
Return type:	int

total_transactions¶

Get the total transactions.

Returns:	the transactions
Return type:	int
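
The support counts exposed by AssociationRule are the inputs to the usual rule metrics. The following is a minimal pure-Python sketch of how confidence and lift can be derived from them; the function names and the example counts are hypothetical, and the metrics reported by metric_value(name) come from Weka itself and may be defined differently.

```python
def confidence(total_support, premise_support):
    """Fraction of transactions with the premise that also contain the consequence."""
    return total_support / float(premise_support)


def lift(total_support, premise_support, consequence_support, total_transactions):
    """Confidence divided by the overall frequency of the consequence."""
    consequence_freq = consequence_support / float(total_transactions)
    return confidence(total_support, premise_support) / consequence_freq


# Hypothetical counts standing in for the values of the properties above.
conf = confidence(total_support=30, premise_support=40)
lft = lift(total_support=30, premise_support=40,
           consequence_support=50, total_transactions=100)
print(conf, lft)
```

A lift above 1 indicates that the premise and the consequence occur together more often than would be expected if they were independent.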

class weka.associations.AssociationRules(jobject)¶

Bases: weka.core.classes.JavaObject

Wrapper for weka.associations.AssociationRules class.

producer¶

Returns a string describing the producer that generated these rules.

Returns:	the producer
Return type:	str

class weka.associations.AssociationRulesIterator(rules)¶

Bases: object

Iterator for weka.associations.AssociationRules class.

next()¶

Returns the next rule.

Returns:	the next rule object
Return type:	AssociationRule

class weka.associations.Associator(classname=None, jobject=None, options=None)¶

Bases: weka.core.classes.OptionHandler

Wrapper class for associators.

association_rules()¶

Returns the association rules that were generated. Only available if the scheme implements AssociationRulesProducer.

Returns:	the association rules that were generated
Return type:	AssociationRules

build_associations(data)¶

Builds the associator with the data.

Parameters:	data (Instances) – the data to train the associator with

can_produce_rules()¶

Checks whether association rules can be generated.

Returns:	whether the scheme implements the AssociationRulesProducer interface and association rules can be generated
Return type:	bool

capabilities¶

Returns the capabilities of the associator.

Returns:	the capabilities
Return type:	Capabilities

classmethod make_copy(associator)¶

Creates a copy of the associator.

Parameters:	associator (Associator) – the associator to copy
Returns:	the copy of the associator
Return type:	Associator

rule_metric_names¶

Returns the rule metric names of the association rules. Only available if the scheme implements AssociationRulesProducer.

Returns:	the metric names
Return type:	list

class weka.associations.Item(jobject)¶

Bases: weka.core.classes.JavaObject

Wrapper for weka.associations.Item class.

attribute¶

Returns the attribute.

Returns:	the attribute
Return type:	Attribute

comparison¶

Returns the comparison operator as string.

Returns:	the comparison operator
Return type:	str

decrease_frequency(frequency=None)¶

Decreases the frequency.

Parameters:	frequency (int) – the frequency to decrease by, 1 if None

frequency¶

Returns the frequency.

Returns:	the frequency
Return type:	int

increase_frequency(frequency=None)¶

Increases the frequency.

Parameters:	frequency (int) – the frequency to increase by, 1 if None

item_value¶

Returns the item value as string.

Returns:	the item value
Return type:	str

weka.associations.main(args=None)¶

Runs an associator from the command-line. Calls JVM start/stop automatically. Use -h to see all options.

Parameters:	args (list) – the command-line arguments to use, uses sys.argv if None

weka.associations.sys_main()¶

Runs the main function using the system cli arguments, and returns a system error code.

Returns:	0 for success, 1 for failure.
Return type:	int

weka.attribute_selection module¶

class weka.attribute_selection.ASEvaluation(classname='weka.attributeSelection.CfsSubsetEval', jobject=None, options=None)¶

Bases: weka.core.classes.OptionHandler

Wrapper class for attribute selection evaluation algorithm.

build_evaluator(data)¶

Builds the evaluator with the data.

Parameters:	data (Instances) – the data to use

capabilities¶

Returns the capabilities of the evaluator.

Returns:	the capabilities
Return type:	Capabilities

post_process(indices)¶

Post-processes the evaluator with the selected attribute indices.

Parameters:	indices (ndarray) – the attribute indices list to use
Returns:	the processed indices
Return type:	ndarray

class weka.attribute_selection.ASSearch(classname='weka.attributeSelection.BestFirst', jobject=None, options=None)¶

Bases: weka.core.classes.OptionHandler

Wrapper class for attribute selection search algorithm.

search(evaluation, data)¶

Performs the search and returns the indices of the selected attributes.

Parameters:	evaluation (ASEvaluation) – the evaluation algorithm to use
	data (Instances) – the data to use
Returns:	the selected attributes (0-based indices)
Return type:	ndarray

class weka.attribute_selection.AttributeSelection¶

Bases: weka.core.classes.JavaObject

Performs attribute selection using search and evaluation algorithms.

classmethod attribute_selection(evaluator, args)¶

Performs attribute selection using the given attribute evaluator and options.

Parameters:	evaluator (ASEvaluation) – the evaluator to use
	args (list) – the command-line args for the attribute selection
Returns:	the results string
Return type:	str

crossvalidation(crossvalidation)¶

Sets whether to perform cross-validation.

Parameters:	crossvalidation (bool) – whether to perform cross-validation

cv_results¶

Generates a results string from the last cross-validation attribute selection.

Returns:	the results string
Return type:	str

evaluator(evaluator)¶

Sets the evaluator to use.

Parameters:	evaluator (ASEvaluation) – the evaluator to use.

folds(folds)¶

Sets the number of folds to use for cross-validation.

Parameters:	folds (int) – the number of folds

number_attributes_selected¶

Returns the number of attributes that were selected.

Returns:	the number of attributes
Return type:	int

ranked_attributes¶

Returns the matrix of ranked attributes from the last run.

Returns:	the Numpy matrix
Return type:	ndarray

ranking(ranking)¶

Sets whether to perform a ranking, if possible.

Parameters:	ranking (bool) – whether to perform a ranking

reduce_dimensionality(data)¶

Reduces the dimensionality of the provided Instance or Instances object.

Parameters:	data (Instances) – the data to process
Returns:	the reduced dataset
Return type:	Instances

results_string¶

Generates a results string from the last attribute selection.

Returns:	the results string
Return type:	str

search(search)¶

Sets the search algorithm to use.

Parameters:	search (ASSearch) – the search algorithm

seed(seed)¶

Sets the seed for cross-validation.

Parameters:	seed (int) – the seed value

select_attributes(instances)¶

Performs attribute selection on the given dataset.

Parameters:	instances (Instances) – the data to process

select_attributes_cv_split(instances)¶

Performs attribute selection on the given cross-validation split.

Parameters:	instances (Instances) – the data to process

selected_attributes¶

Returns the selected attributes from the last run.

Returns:	the Numpy array of 0-based indices
Return type:	ndarray

weka.attribute_selection.main(args=None)¶

Runs attribute selection from the command-line. Calls JVM start/stop automatically. Use -h to see all options.

Parameters:	args (list) – the command-line arguments to use, uses sys.argv if None

weka.attribute_selection.sys_main()¶

Runs the main function using the system cli arguments, and returns a system error code.

Returns:	0 for success, 1 for failure.
Return type:	int

weka.classifiers module¶

class weka.classifiers.Classifier(classname='weka.classifiers.rules.ZeroR', jobject=None, options=None)¶

Bases: weka.core.classes.OptionHandler

Wrapper class for classifiers.

batch_size¶

Returns the batch size, in case this classifier is a batch predictor.

Returns:	the batch size, None if not a batch predictor
Return type:	str

build_classifier(data)¶

Builds the classifier with the data.

Parameters:	data (Instances) – the data to train the classifier with

capabilities¶

Returns the capabilities of the classifier.

Returns:	the capabilities
Return type:	Capabilities

classify_instance(inst)¶

Performs a prediction.

Parameters:	inst (Instance) – the Instance to get a prediction for
Returns:	the classification (either regression value or 0-based label index)
Return type:	float

classmethod deserialize(ser_file)¶

Deserializes a classifier from a file.

Parameters:	ser_file (str) – the model file to deserialize
Returns:	model and, if available, the dataset header
Return type:	tuple

distribution_for_instance(inst)¶

Performs a prediction, returning the class distribution.

Parameters:	inst (Instance) – the Instance to get the class distribution for
Returns:	the class distribution array
Return type:	ndarray
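
For a nominal class, the label index returned by classify_instance usually corresponds to the largest entry of the class distribution. The sketch below illustrates that relationship in plain Python; the helper name is hypothetical and tie-breaking in an actual classifier may differ.

```python
def label_index_from_distribution(dist):
    """Returns the 0-based index of the largest entry (the first one on ties)."""
    best = 0
    for i, p in enumerate(dist):
        if p > dist[best]:
            best = i
    return best


# Hypothetical class distribution for a three-class problem.
dist = [0.1, 0.7, 0.2]
predicted = label_index_from_distribution(dist)
print(predicted)
```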

distributions_for_instances(data)¶

Performs predictions, returning the class distributions.

Parameters:	data (Instances) – the Instances to get the class distributions for
Returns:	the class distribution matrix, None if not a batch predictor
Return type:	ndarray

graph¶

Returns the graph if the classifier implements weka.core.Drawable, otherwise None.

Returns:	the generated graph string
Return type:	str

graph_type¶

Returns the graph type if the classifier implements weka.core.Drawable, otherwise -1.

Returns:	the type
Return type:	int

has_efficient_batch_prediction()¶

Returns whether the classifier implements a more efficient batch prediction.

Returns:	True if a more efficient batch prediction is implemented, always False if not batch predictor
Return type:	bool

classmethod make_copy(classifier)¶

Creates a copy of the classifier.

Parameters:	classifier (Classifier) – the classifier to copy
Returns:	the copy of the classifier
Return type:	Classifier

serialize(ser_file, header=None)¶

Serializes the classifier to the specified file.

Parameters:	ser_file (str) – the file to save the model to
	header (Instances) – the (optional) dataset header to store alongside; recommended

to_source(classname)¶

Returns the model as Java source code if the classifier implements weka.classifiers.Sourcable.

Parameters:	classname (str) – the classname for the generated Java code
Returns:	the model as source code string
Return type:	str

update_classifier(inst)¶

Updates the classifier with the instance.

Parameters:	inst (Instance) – the Instance to update the classifier with

class weka.classifiers.CostMatrix(matrx=None, num_classes=None)¶

Bases: weka.core.classes.JavaObject

Class for storing and manipulating a misclassification cost matrix. The element at position i,j in the matrix is the penalty for classifying an instance of class j as class i. Cost values can be fixed or computed on a per-instance basis (cost sensitive evaluation only) from the value of an attribute or an expression involving attribute(s).

apply_cost_matrix(data, rnd)¶

Applies the cost matrix to the data.

Parameters:	data (Instances) – the data to apply to
	rnd (Random) – the random number generator

expected_costs(class_probs, inst=None)¶

Calculates the expected misclassification cost for each possible class value, given class probability estimates.

Parameters:	class_probs (ndarray) – the class probabilities
	inst (Instance) – the Instance, optional (defaults to None)
Returns:	the calculated costs
Return type:	ndarray
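
As an illustration of what an expected-cost calculation involves, here is a minimal pure-Python sketch for a fixed cost matrix. It follows the indexing convention stated in the class description (element i,j is the penalty for classifying an instance of class j as class i); that orientation, the function name and the example values are assumptions of this sketch, so the orientation used by the wrapped Weka class should be verified before relying on it.

```python
def expected_costs(cost_matrix, class_probs):
    """Expected cost of predicting each class i, given probabilities of the true class j.

    cost_matrix[i][j] is taken to be the penalty for classifying an instance
    of class j as class i (assumed orientation, see the lead-in).
    """
    return [sum(p * row[j] for j, p in enumerate(class_probs))
            for row in cost_matrix]


# Hypothetical two-class cost matrix and probability estimates.
costs = [[0.0, 5.0],
         [1.0, 0.0]]
probs = [0.8, 0.2]
result = expected_costs(costs, probs)
print(result)
```

Under these assumptions, predicting class 1 is cheaper than predicting class 0 even though class 0 is more probable, because misclassifying a class 1 instance as class 0 is penalised heavily.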

get_cell(row, col)¶

Returns the JB_Object at the specified location.

Parameters:	row (int) – the 0-based index of the row
	col (int) – the 0-based index of the column
Returns:	the object in that cell
Return type:	JB_Object

get_element(row, col, inst=None)¶

Returns the value at the specified location.

Parameters:	row (int) – the 0-based index of the row
	col (int) – the 0-based index of the column
	inst (Instance) – the Instance
Returns:	the value in that cell
Return type:	float

get_max_cost(class_value, inst=None)¶

Gets the maximum cost for a particular class value.

Parameters:	class_value (int) – the class value to get the maximum cost for
	inst (Instance) – the Instance
Returns:	the cost
Return type:	float

initialize()¶

Initializes the matrix.

normalize()¶

Normalizes the matrix.

num_columns¶

Returns the number of columns.

Returns:	the number of columns
Return type:	int

num_rows¶

Returns the number of rows.

Returns:	the number of rows
Return type:	int

classmethod parse_matlab(matlab)¶

Parses the cost matrix definition in Matlab format and returns a matrix.

Parameters:	matlab (str) – the Matlab matrix string, e.g. [1 2; 3 4]
Returns:	the generated matrix
Return type:	CostMatrix
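
The Matlab matrix notation separates cells with whitespace and rows with semicolons. The following pure-Python sketch shows how such a string can be read into nested lists; it is an illustration of the format only, not the parser used by Weka, and the function name is hypothetical.

```python
def parse_matlab_matrix(text):
    """Parses a string such as '[1 2; 3 4]' into a list of rows of floats."""
    body = text.strip()
    if not (body.startswith("[") and body.endswith("]")):
        raise ValueError("expected a matrix enclosed in square brackets")
    rows = [[float(cell) for cell in row.split()]
            for row in body[1:-1].split(";") if row.strip()]
    if len(set(len(row) for row in rows)) > 1:
        raise ValueError("all rows must have the same number of columns")
    return rows


matrix = parse_matlab_matrix("[1 2; 3 4]")
print(matrix)
```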

set_cell(row, col, obj)¶

Sets the JB_Object at the specified location. Automatically unwraps JavaObject.

Parameters:	row (int) – the 0-based index of the row
	col (int) – the 0-based index of the column
	obj (object) – the object for that cell

set_element(row, col, value)¶

Sets the float value at the specified location.

Parameters:	row (int) – the 0-based index of the row
	col (int) – the 0-based index of the column
	value (float) – the float value for that cell

size¶

Returns the number of rows/columns.

Returns:	the number of rows/columns
Return type:	int

to_matlab()¶

Returns the matrix in Matlab format.

Returns:	the matrix as Matlab formatted string
Return type:	str

class weka.classifiers.Evaluation(data, cost_matrix=None)¶

Bases: weka.core.classes.JavaObject

Evaluation class for classifiers.

area_under_prc(class_index)¶

Returns the area under the precision-recall curve.

Parameters:	class_index (int) – the 0-based index of the class label
Returns:	the area
Return type:	float

area_under_roc(class_index)¶

Returns the area under the receiver operating characteristic (ROC) curve.

Parameters:	class_index (int) – the 0-based index of the class label
Returns:	the area
Return type:	float

avg_cost¶

Returns the average cost.

Returns:	the cost
Return type:	float

class_details(title=None)¶

Generates the class details.

Parameters:	title (str) – optional title
Returns:	the details
Return type:	str

class_priors¶

Returns the class priors.

Returns:	the priors
Return type:	ndarray

confusion_matrix¶

Returns the confusion matrix.

Returns:	the matrix
Return type:	ndarray
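
Most of the per-class statistics of this class can be read off a confusion matrix. The sketch below shows the arithmetic in plain Python, assuming rows hold the actual classes and columns the predicted classes; the function names and the example matrix are hypothetical, and returning 0.0 for undefined ratios is a choice made for this sketch, not necessarily what Weka reports.

```python
def per_class_counts(matrix, class_index):
    """Returns (tp, fp, fn, tn) for one class; rows = actual, columns = predicted."""
    n = len(matrix)
    tp = matrix[class_index][class_index]
    fn = sum(matrix[class_index]) - tp
    fp = sum(matrix[r][class_index] for r in range(n)) - tp
    total = sum(sum(row) for row in matrix)
    tn = total - tp - fn - fp
    return tp, fp, fn, tn


def precision_recall_f(matrix, class_index):
    """Returns (precision, recall, F-measure) for one class."""
    tp, fp, fn, _ = per_class_counts(matrix, class_index)
    precision = tp / float(tp + fp) if tp + fp else 0.0
    recall = tp / float(tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Hypothetical two-class confusion matrix.
cm = [[8, 2],
      [1, 9]]
counts = per_class_counts(cm, 0)
prf = precision_recall_f(cm, 0)
print(counts, prf)
```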

correct¶

Returns the correct count (nominal classes).

Returns:	the count
Return type:	float

correlation_coefficient¶

Returns the correlation coefficient (numeric classes).

Returns:	the coefficient
Return type:	float

coverage_of_test_cases_by_predicted_regions¶

Returns the coverage of the test cases by the predicted regions at the confidence level specified when evaluation was performed.

Returns:	the coverage
Return type:	float

crossvalidate_model(classifier, data, num_folds, rnd, output=None)¶

Cross-validates the model using the specified data, number of folds and random number generator wrapper.

Parameters:	classifier (Classifier) – the classifier to cross-validate
	data (Instances) – the data to evaluate on
	num_folds (int) – the number of folds
	rnd (Random) – the random number generator to use
	output (PredictionOutput) – the output generator to use
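
A typical cross-validation run might look like the following sketch. Only Classifier, Evaluation, crossvalidate_model and summary are documented in this section; weka.core.jvm, weka.core.converters.Loader, load_file, class_is_last and weka.core.classes.Random are assumed to be provided by other modules of the package, and the function name, the J48 default and the ARFF path are hypothetical.

```python
def crossvalidate(arff_path, classname="weka.classifiers.trees.J48", folds=10, seed=1):
    """Cross-validates a classifier on an ARFF file and returns the summary string."""
    # Imported lazily so that defining this function does not require a JVM.
    import weka.core.jvm as jvm
    from weka.core.converters import Loader
    from weka.core.classes import Random
    from weka.classifiers import Classifier, Evaluation

    jvm.start()
    try:
        loader = Loader(classname="weka.core.converters.ArffLoader")
        data = loader.load_file(arff_path)
        data.class_is_last()  # assumes the class attribute is the last one
        classifier = Classifier(classname=classname)
        evaluation = Evaluation(data)
        evaluation.crossvalidate_model(classifier, data, folds, Random(seed))
        return evaluation.summary()
    finally:
        jvm.stop()
```

Calling crossvalidate("/some/where/data.arff") would print nothing by itself; the returned string holds the summary. Starting and stopping the JVM inside the function keeps the sketch self-contained, but in a longer script it is usually better to start the JVM once at the beginning and stop it at the end.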

cumulative_margin_distribution()¶

Outputs the cumulative margin distribution as a string suitable as input for gnuplot or a similar package.

Returns:	the cumulative margin distribution
Return type:	str

discard_predictions¶

Returns whether to discard predictions (saves memory).

Returns:	True if predictions are discarded
Return type:	bool

error_rate¶

Returns the error rate (numeric classes).

Returns:	the rate
Return type:	float

classmethod evaluate_model(classifier, args)¶

Evaluates the classifier with the given options.

Parameters:	classifier (Classifier) – the classifier instance to use
	args (list) – the command-line arguments to use
Returns:	the evaluation string
Return type:	str

evaluate_train_test_split(classifier, data, percentage, rnd=None, output=None)¶

Splits the data into train and test, builds the classifier with the training data and evaluates it against the test set.

Parameters:	classifier (Classifier) – the classifier to build and evaluate
	data (Instances) – the data to evaluate on
	percentage (double) – the percentage split to use (amount to use for training)
	rnd (Random) – the random number generator to use, if None the order gets preserved
	output (PredictionOutput) – the output generator to use
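
The effect of a percentage split with the order preserved (rnd set to None) can be pictured with a plain Python list. This is a minimal sketch; the function name is hypothetical and rounding the training size to the nearest integer is an assumption, not a documented guarantee.

```python
def split_by_percentage(rows, percentage):
    """Splits rows into (train, test); the first `percentage` percent go to train."""
    if not 0 < percentage < 100:
        raise ValueError("percentage must be between 0 and 100 (exclusive)")
    # Assumed rounding rule: nearest integer number of training rows.
    train_size = int(round(len(rows) * percentage / 100.0))
    return rows[:train_size], rows[train_size:]


# Ten hypothetical rows with a 66% split.
train, test = split_by_percentage(list(range(10)), 66.0)
print(len(train), len(test))
```

When a random number generator is supplied, the data would be shuffled before the split, so the two subsets no longer follow the original order.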

f_measure(class_index)¶

Returns the F-measure.

Parameters:	class_index (int) – the 0-based index of the class label
Returns:	the measure
Return type:	float

false_negative_rate(class_index)¶

Returns the false negative rate.

Parameters:	class_index (int) – the 0-based index of the class label
Returns:	the rate
Return type:	float

false_positive_rate(class_index)¶

Returns the false positive rate.

Parameters:	class_index (int) – the 0-based index of the class label
Returns:	the rate
Return type:	float

header¶

Returns the header format.

Returns:	the header format
Return type:	Instances

incorrect¶

Returns the incorrect count (nominal classes).

Returns:	the count
Return type:	float

kappa¶

Returns kappa.

Returns:	kappa
Return type:	float
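
Kappa relates the observed agreement between actual and predicted labels to the agreement expected by chance. The following pure-Python sketch computes Cohen's kappa from a confusion matrix, assuming rows hold the actual classes; the function name and the example matrix are hypothetical, and the value may differ from Weka's in edge cases such as weighted instances.

```python
def cohen_kappa(matrix):
    """Computes Cohen's kappa from a square, non-empty confusion matrix."""
    n = len(matrix)
    total = float(sum(sum(row) for row in matrix))
    observed = sum(matrix[i][i] for i in range(n)) / total
    # Chance agreement: product of row and column marginals per class.
    expected = sum(
        sum(matrix[i]) * sum(matrix[r][i] for r in range(n))
        for i in range(n)
    ) / (total * total)
    if expected == 1.0:
        return 1.0  # degenerate single-class case; a convention of this sketch
    return (observed - expected) / (1.0 - expected)


kappa = cohen_kappa([[8, 2], [1, 9]])
print(kappa)
```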

kb_information¶

Returns KB information.

Returns:	the information
Return type:	float

kb_mean_information¶

Returns KB mean information.

Returns:	the information
Return type:	float

kb_relative_information¶

Returns KB relative information.

Returns:	the information
Return type:	float

matrix(title=None)¶

Generates the confusion matrix.

Parameters:	title (str) – optional title
Returns:	the matrix
Return type:	str

matthews_correlation_coefficient(class_index)¶

Returns the Matthews correlation coefficient (nominal classes).

Parameters:	class_index (int) – the 0-based index of the class label
Returns:	the coefficient
Return type:	float
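
The Matthews correlation coefficient for one class can be computed from the four counts returned by the num_true_positives, num_true_negatives, num_false_positives and num_false_negatives methods. The sketch below shows the standard formula in plain Python; the function name and the example counts are hypothetical, and returning 0.0 for a zero denominator is a convention chosen for this sketch.

```python
import math


def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient from the four confusion counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # undefined case; convention of this sketch
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts for one class label.
mcc = matthews_cc(tp=8, tn=9, fp=1, fn=2)
print(mcc)
```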

mean_absolute_error¶

Returns the mean absolute error.

Returns:	the error
Return type:	float

mean_prior_absolute_error¶

Returns the mean prior absolute error.

Returns:	the error
Return type:	float

num_false_negatives(class_index)¶

Returns the number of false negatives.

Parameters:	class_index (int) – the 0-based index of the class label
Returns:	the count
Return type:	float

num_false_positives(class_index)¶

Returns the number of false positives.

Parameters:	class_index (int) – the 0-based index of the class label
Returns:	the count
Return type:	float

num_instances¶

Returns the number of instances that had a known class value.

Returns:	the number of instances
Return type:	float

num_true_negatives(class_index)¶

Returns the number of true negatives.

Parameters:	class_index (int) – the 0-based index of the class label
Returns:	the count
Return type:	float

num_true_positives(class_index)¶

Returns the number of true positives.

Parameters:	class_index (int) – the 0-based index of the class label
Returns:	the count
Return type:	float

percent_correct¶

Returns the percent correct (nominal classes).

Returns:	the percentage
Return type:	float

percent_incorrect¶

Returns the percent incorrect (nominal classes).

Returns:	the percentage
Return type:	float

percent_unclassified¶

Returns the percent unclassified.

Returns:	the percentage
Return type:	float

precision(class_index)¶

Returns the precision.

Parameters:	class_index (int) – the 0-based index of the class label
Returns:	the precision
Return type:	float

predictions¶

Returns the predictions.

Returns:	the predictions. None if not available
Return type:	list

recall(class_index)¶

Returns the recall.

Parameters:	class_index (int) – the 0-based index of the class label
Returns:	the recall
Return type:	float

relative_absolute_error¶

Returns the relative absolute error.

Returns:	the error
Return type:	float

root_mean_prior_squared_error¶

Returns the root mean prior squared error.

Returns:	the error
Return type:	float

root_mean_squared_error¶

Returns the root mean squared error.

Returns:	the error
Return type:	float

root_relative_squared_error¶

Returns the root relative squared error.

Returns:	the error
Return type:	float

sf_entropy_gain¶

Returns the total SF, which is the null model entropy minus the scheme entropy.

Returns:	the gain
Return type:	float

sf_mean_entropy_gain¶

Returns the SF per instance, which is the null model entropy minus the scheme entropy, per instance.

Returns:	the gain
Return type:	float

sf_mean_prior_entropy¶

Returns the entropy per instance for the null model.

Returns:	the entropy
Return type:	float

sf_mean_scheme_entropy¶

Returns the entropy per instance for the scheme.

Returns:	the entropy
Return type:	float

sf_prior_entropy¶

Returns the total entropy for the null model.

Returns:	the entropy
Return type:	float

sf_scheme_entropy¶

Returns the total entropy for the scheme.

Returns:	the entropy
Return type:	float

size_of_predicted_regions¶

Returns the average size of the predicted regions, relative to the range of the target in the training data, at the confidence level specified when evaluation was performed.

Returns:	the size of the regions
Return type:	float

summary(title=None, complexity=False)¶

Generates a summary.

Parameters:	title (str) – optional title
	complexity (bool) – whether to print the complexity information as well
Returns:	the summary
Return type:	str

test_model(classifier, data, output=None)¶

Evaluates the built model using the specified test data and returns the classifications.

Parameters:	classifier (Classifier) – the trained classifier to evaluate
	data (Instances) – the data to evaluate on
	output (PredictionOutput) – the output generator to use
Returns:	the classifications
Return type:	ndarray

test_model_once(classifier, inst)¶

Evaluates the built model using the specified test instance and returns the classification.

Parameters:	classifier (Classifier) – the trained classifier to evaluate
	inst (Instance) – the Instance to evaluate on
Returns:	the classification
Return type:	float

total_cost¶

Returns the total cost.

Returns:	the cost
Return type:	float

true_negative_rate(class_index)¶

Returns the true negative rate.

Parameters:	class_index (int) – the 0-based index of the class label
Returns:	the rate
Return type:	float

true_positive_rate(class_index)¶

Returns the true positive rate.

Parameters:	class_index (int) – the 0-based index of the class label
Returns:	the rate
Return type:	float

unclassified¶

Returns the unclassified count.

Returns:	the count
Return type:	float

unweighted_macro_f_measure¶

Returns the unweighted macro-averaged F-measure.

Returns:	the measure
Return type:	float

unweighted_micro_f_measure¶

Returns the unweighted micro-averaged F-measure.

Returns:	the measure
Return type:	float

weighted_area_under_prc¶

Returns the weighted area under the precision-recall curve.

Returns:	the weighted area
Return type:	float

weighted_area_under_roc¶

Returns the weighted area under the receiver operating characteristic (ROC) curve.

Returns:	the weighted area
Return type:	float

weighted_f_measure¶

Returns the weighted F-measure.

Returns:	the measure
Return type:	float

weighted_false_negative_rate¶

Returns the weighted false negative rate.

Returns:	the rate
Return type:	float

weighted_false_positive_rate¶

Returns the weighted false positive rate.

Returns:	the rate
Return type:	float

weighted_matthews_correlation¶

Returns the weighted Matthews correlation (nominal classes).

Returns:	the correlation
Return type:	float

weighted_precision¶

Returns the weighted precision.

Returns:	the precision
Return type:	float

weighted_recall¶

Returns the weighted recall.

Returns:	the recall
Return type:	float

weighted_true_negative_rate¶

Returns the weighted true negative rate.

Returns:	the rate
Return type:	float

weighted_true_positive_rate¶

Returns the weighted true positive rate.

Returns:	the rate
Return type:	float

class weka.classifiers.FilteredClassifier(jobject=None, options=None)¶

Bases: weka.classifiers.SingleClassifierEnhancer

Wrapper class for the filtered classifier.

check_for_modified_class_attribute(check)¶

Sets whether to check for class attribute modifications.

Parameters:	check (bool) – True if checking for modifications

filter¶

Returns the filter.

Returns:	the filter in use
Return type:	Filter

class weka.classifiers.GridSearch(jobject=None, options=None)¶

Bases: weka.classifiers.SingleClassifierEnhancer

Wrapper class for the GridSearch meta-classifier.

best¶

Returns the best classifier setup found during the search.

Returns:	the best classifier setup
Return type:	Classifier

evaluation¶

Returns the currently set statistic used for evaluation.

Returns:	the statistic
Return type:	SelectedTag

x¶

Returns a dictionary with all the current values for the X of the grid.

Keys for the dictionary: property, min, max, step, base, expression.

Types: property=str, min=float, max=float, step=float, base=float, expression=str.

Returns:	the dictionary with the parameters
Return type:	dict

y¶

Returns a dictionary with all the current values for the Y of the grid.

Keys for the dictionary: property, min, max, step, base, expression.

Types: property=str, min=float, max=float, step=float, base=float, expression=str.

Returns:	the dictionary with the parameters
Return type:	dict

class weka.classifiers.Kernel(classname=None, jobject=None, options=None)¶

Bases: weka.core.classes.OptionHandler

Wrapper class for kernels.

build_kernel(data)¶

Builds the kernel with the data.

Parameters:	data (Instances) – the data to build the kernel with

capabilities()¶

Returns the capabilities of the kernel.

Returns:	the capabilities
Return type:	Capabilities

checks_turned_off¶

Returns whether checks are turned off.

Returns:	True if checks turned off
Return type:	bool

clean()¶

Frees the memory used by the kernel.

eval(id1, id2, inst1)¶

Computes the result of the kernel function for two instances. If id1 == -1, eval uses inst1 instead of an instance in the dataset.

Parameters:	id1 (int) – the index of the first instance in the dataset
	id2 (int) – the index of the second instance in the dataset
	inst1 (Instance) – the instance corresponding to id1 (used if id1 == -1)

classmethod make_copy(kernel)¶

Creates a copy of the kernel.

Parameters:	kernel (Kernel) – the kernel to copy
Returns:	the copy of the kernel
Return type:	Kernel

class weka.classifiers.KernelClassifier(classname=None, jobject=None, options=None)¶

Bases: weka.classifiers.Classifier

Wrapper class for classifiers that have a kernel property, like SMO.

kernel¶

Returns the current kernel.

Returns:	the kernel or None if none found
Return type:	Kernel

class weka.classifiers.MultiSearch(jobject=None, options=None)¶

Bases: weka.classifiers.SingleClassifierEnhancer

Wrapper class for the MultiSearch meta-classifier. NB: ‘multi-search-weka-package’ must be installed (https://github.com/fracpete/multisearch-weka-package), version 2016.1.15 or later.

best¶

Returns the best classifier setup found during the search.

Returns:	the best classifier setup
Return type:	Classifier

evaluation¶

Returns the currently set statistic used for evaluation.

Returns:	the statistic
Return type:	SelectedTag

parameters¶

Returns the list of currently set search parameters.

Returns:	the list of AbstractSearchParameter objects
Return type:	list

class weka.classifiers.MultipleClassifiersCombiner(classname=None, jobject=None, options=None)¶

Bases: weka.classifiers.Classifier

Wrapper class for classifiers that use multiple base classifiers.

classifiers¶

Returns the list of base classifiers.

Returns:	the classifier list
Return type:	list

class weka.classifiers.NominalPrediction(jobject)¶

Bases: weka.classifiers.Prediction

Wrapper class for a nominal prediction.

distribution¶

Returns the class distribution.

Returns:	the class distribution list
Return type:	ndarray

margin¶

Returns the margin.

Returns:	the margin
Return type:	float
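
The margin of a nominal prediction is commonly defined as the probability assigned to the actual class minus the highest probability assigned to any other class. The sketch below illustrates that definition in plain Python; the function name and the example distribution are hypothetical, and since actual is stored as a float in Prediction, it would need converting with int() before being used as an index.

```python
def prediction_margin(distribution, actual):
    """Probability of the actual class minus the highest probability among the others."""
    others = [p for i, p in enumerate(distribution) if i != actual]
    if not others:
        raise ValueError("need at least two classes to compute a margin")
    return distribution[actual] - max(others)


# Hypothetical distribution; class 1 is the most probable label.
margin_correct = prediction_margin([0.1, 0.7, 0.2], actual=1)
margin_wrong = prediction_margin([0.1, 0.7, 0.2], actual=2)
print(margin_correct, margin_wrong)
```

A positive margin means the actual class received the highest probability; a negative margin means some other class was preferred.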

class weka.classifiers.NumericPrediction(jobject)¶

Bases: weka.classifiers.Prediction

Wrapper class for a numeric prediction.

error¶

Returns the error.

Returns:	the error
Return type:	float

prediction_intervals¶

Returns the prediction intervals.

Returns:	the intervals
Return type:	ndarray

class weka.classifiers.Prediction(jobject)¶

Bases: weka.core.classes.JavaObject

Wrapper class for a prediction.

actual¶

Returns the actual value.

Returns:	the actual value (internal representation)
Return type:	float

predicted¶

Returns the predicted value.

Returns:	the predicted value (internal representation)
Return type:	float

weight¶

Returns the weight.

Returns:	the weight of the Instance that was used
Return type:	float

class weka.classifiers.PredictionOutput(classname='weka.classifiers.evaluation.output.prediction.PlainText', jobject=None, options=None)¶

Bases: weka.core.classes.OptionHandler

Collects predictions and generates output from them. The wrapped class must be derived from weka.classifiers.evaluation.output.prediction.AbstractOutput.

buffer_content()¶

Returns the content of the buffer as string.

Returns:	The buffer content
Return type:	str

header¶

Returns the header format.

Returns:	The dataset format
Return type:	Instances

print_all(cls, data)¶

Prints the header, classifications and footer to the buffer.

Parameters:	cls (Classifier) – the classifier
	data (Instances) – the test data

print_classification(cls, inst, index)¶

Prints the classification to the buffer.

Parameters:	cls (Classifier) – the classifier
	inst (Instance) – the test instance
	index (int) – the 0-based index of the test instance

print_classifications(cls, data)¶

Prints the classifications to the buffer.

Parameters:	cls (Classifier) – the classifier
	data (Instances) – the test data

print_footer()¶

Prints the footer to the buffer.

print_header()¶

Prints the header to the buffer.

class weka.classifiers.SingleClassifierEnhancer(classname=None, jobject=None, options=None)¶

Bases: weka.classifiers.Classifier

Wrapper class for classifiers that use a single base classifier.

classifier¶

Returns the base classifier.

Returns:	the base classifier
Return type:	Classifier

weka.classifiers.main(args=None)¶

Runs a classifier from the command-line. Calls JVM start/stop automatically. Use -h to see all options.

Parameters:	args (list) – the command-line arguments to use, uses sys.argv if None

weka.classifiers.predictions_to_instances(data, preds)¶

Turns the predictions into an Instances object.

Parameters:	data (Instances) – the original dataset format
	preds (list) – the predictions to convert
Returns:	the predictions, None if no predictions present
Return type:	Instances

weka.classifiers.sys_main()¶

Runs the main function using the system cli arguments, and returns a system error code.

Returns:	0 for success, 1 for failure.
Return type:	int

weka.clusterers module¶

class weka.clusterers.ClusterEvaluation¶

Bases: weka.core.classes.JavaObject

Evaluation class for clusterers.

classes_to_clusters¶

Returns the array (ordered by cluster number) of minimum-error class-to-cluster mappings.

Returns:	the mappings
Return type:	ndarray

cluster_assignments¶

Returns an array of cluster assignments corresponding to the most recent set of instances clustered.

Returns:	the cluster assignments
Return type:	ndarray

cluster_results¶

Returns the cluster results as a string.

Returns:	the results string
Return type:	str

classmethod crossvalidate_model(clusterer, data, num_folds, rnd)¶

Cross-validates the clusterer and returns the log-likelihood.

Parameters:	clusterer (Clusterer) – the clusterer instance to evaluate
	data (Instances) – the data to evaluate on
	num_folds (int) – the number of folds
	rnd (Random) – the random number generator to use
Returns:	the cross-validated loglikelihood
Return type:	float

classmethod evaluate_clusterer(clusterer, args)¶

Evaluates the clusterer with the given options.

Parameters:	clusterer (Clusterer) – the clusterer instance to evaluate
	args (list) – the command-line arguments
Returns:	the evaluation result
Return type:	str

log_likelihood¶

Returns the log likelihood.

Returns:	the log likelihood
Return type:	float

num_clusters¶

Returns the number of clusters.

Returns:	the number of clusters
Return type:	int

set_model(clusterer)¶

Sets the built clusterer to evaluate.

Parameters:	clusterer (Clusterer) – the clusterer to evaluate

test_model(test)¶

Evaluates the currently set clusterer on the test set.

Parameters:	test (Instances) – the test set to use for evaluating

class weka.clusterers.Clusterer(classname='weka.clusterers.SimpleKMeans', jobject=None, options=None)¶

Bases: weka.core.classes.OptionHandler

Wrapper class for clusterers.

build_clusterer(data)¶

Builds the clusterer with the data.

Parameters:	data (Instances) – the data to use for training the clusterer

capabilities¶

Returns the capabilities of the clusterer.

Returns:	the capabilities
Return type:	Capabilities

cluster_instance(inst)¶

Performs a prediction.

Parameters:	inst (Instance) – the instance to determine the cluster for
Returns:	the clustering result
Return type:	float

classmethod deserialize(ser_file)¶

Deserializes a clusterer from a file.

Parameters:	ser_file (str) – the model file to deserialize
Returns:	model and, if available, the dataset header
Return type:	tuple

distribution_for_instance(inst)¶

Performs a prediction, returning the cluster distribution.

Parameters:	inst (Instance) – the Instance to get the cluster distribution for
Returns:	the cluster distribution
Return type:	float[]
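
Where cluster_instance() yields a single cluster, distribution_for_instance() yields one membership value per cluster. A small sketch of picking the strongest cluster from such a distribution follows; the helper name most_likely_cluster is hypothetical, and it assumes the distribution is an indexable, non-empty sequence of floats.

```python
def most_likely_cluster(distribution):
    """Return (cluster_index, membership_value) for the largest entry.

    `distribution` is any non-empty sequence of floats, e.g. the result of
    Clusterer.distribution_for_instance(inst). Ties resolve to the lowest
    cluster index.
    """
    if len(distribution) == 0:
        raise ValueError("distribution must not be empty")
    best = max(range(len(distribution)), key=lambda i: distribution[i])
    return best, distribution[best]
```

For example, most_likely_cluster(clusterer.distribution_for_instance(inst)) would return the index of the cluster with the highest membership value together with that value.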

graph¶

Returns the graph if the clusterer implements weka.core.Drawable, otherwise None.

Returns:	the graph or None if not available
Return type:	str

graph_type¶

Returns the graph type if the clusterer implements weka.core.Drawable, otherwise -1.

Returns:	the type
Return type:	int

classmethod make_copy(clusterer)¶

Creates a copy of the clusterer.

Parameters:	clusterer (Clusterer) – the clusterer to copy
Returns:	the copy of the clusterer
Return type:	Clusterer

number_of_clusters¶

Returns the number of clusters found.

Returns:	the number of clusters
Return type:	int

serialize(ser_file, header=None)¶

Serializes the clusterer to the specified file.

Parameters:	ser_file (str) – the file to save the model to
	header (Instances) – the (optional) dataset header to store alongside; recommended
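
serialize() and deserialize() form a round trip. The sketch below shows one way to pair them; the function name save_and_reload is hypothetical, and it assumes that deserialize() always returns a two-element tuple whose second element is None when no header was stored.

```python
def save_and_reload(clusterer, ser_file, header=None):
    """Serialize `clusterer` to `ser_file` and read it back.

    Hypothetical helper. Assumes deserialize() returns exactly
    (model, header), with header set to None if none was stored.
    """
    clusterer.serialize(ser_file, header=header)
    # deserialize() is a classmethod, so it is looked up on the class.
    model, stored_header = type(clusterer).deserialize(ser_file)
    return model, stored_header
```

Storing the header alongside the model makes it possible to check later that new data is compatible with the data the model was built on.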

update_clusterer(inst)¶

Updates the clusterer with the instance.

Parameters:	inst (Instance) – the Instance to update the clusterer with

update_finished()¶: Signals the clusterer that updating with new data has finished.
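
update_clusterer() and update_finished() are meant to be used together for incremental training. A minimal sketch of the call order follows. The function name train_incrementally is hypothetical; it assumes the wrapped scheme supports incremental updates and that build_clusterer() is first called with the dataset structure (header) before any instance is fed in.

```python
def train_incrementally(clusterer, header, instances):
    """Feed `instances` one by one into an updateable clusterer.

    Hypothetical helper. `header` is an Instances object describing the
    data structure, `instances` is any iterable of Instance objects.
    Returns the number of instances processed.
    """
    clusterer.build_clusterer(header)  # initialize with the structure
    count = 0
    for inst in instances:
        clusterer.update_clusterer(inst)
        count += 1
    clusterer.update_finished()  # signal that no more data will follow
    return count
```

Because the instances are consumed from an iterable, the full dataset does not have to be held in memory at once.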

class weka.clusterers.FilteredClusterer(jobject=None, options=None)¶

Bases: weka.clusterers.SingleClustererEnhancer

Wrapper class for the filtered clusterer.

filter¶

Returns the filter.

Returns:	the filter
Return type:	Filter

class weka.clusterers.SingleClustererEnhancer(classname=None, jobject=None, options=None)¶

Bases: weka.clusterers.Clusterer

Wrapper class for clusterers that use a single base clusterer.

clusterer¶

Returns the base clusterer.

Returns:	the clusterer
Return type:	Clusterer

weka.clusterers.main(args=None)¶

Runs a clusterer from the command-line. Calls JVM start/stop automatically. Use -h to see all options.

Parameters:	args (list) – the command-line arguments to use, uses sys.argv if None

weka.clusterers.sys_main()¶

Runs the main function using the system cli arguments, and returns a system error code.

Returns:	0 for success, 1 for failure.
Return type:	int

weka.datagenerators module¶

class weka.datagenerators.DataGenerator(classname='weka.datagenerators.classifiers.classification.Agrawal', jobject=None, options=None)¶

Bases: weka.core.classes.OptionHandler

Wrapper class for datagenerators.

dataset_format¶

Returns the dataset format.

Returns:	the format
Return type:	Instances

define_data_format()¶

Returns the data format.

Returns:	the data format
Return type:	Instances

generate_example()¶

Returns a single Instance.

Returns:	the next example
Return type:	Instance

generate_examples()¶

Returns the complete dataset.

Returns:	the generated dataset
Return type:	Instances

generate_finish()¶

Returns a “finish” string.

Returns:	a finish comment
Return type:	str

generate_start()¶

Returns a “start” string.

Returns:	the start comment
Return type:	str

classmethod make_copy(generator)¶

Creates a copy of the generator.

Parameters:	generator (DataGenerator) – the generator to copy
Returns:	the copy of the generator
Return type:	DataGenerator

classmethod make_data(generator, args)¶

Generates data using the generator and commandline arguments.

Parameters:	generator (DataGenerator) – the generator instance to use
	args (list) – the command-line arguments

num_examples_act¶

Returns the actual number of examples to generate.

Returns:	the number of examples
Return type:	int

single_mode_flag¶

Returns whether data is generated row by row (True) or in one go (False).

Returns:	whether incremental
Return type:	bool
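
single_mode_flag decides whether generate_example() or generate_examples() is the appropriate call. The sketch below shows that branching; the function name generate_rows is hypothetical, and it assumes that dataset_format can be assigned (only the getter is documented here) and that the format must be set before generating.

```python
def generate_rows(generator):
    """Generate data row by row or in one go, depending on the generator.

    Hypothetical helper. Returns a list of Instance objects in incremental
    mode and a single Instances object otherwise.
    """
    # Assumption: dataset_format is assignable; only the getter is documented.
    generator.dataset_format = generator.define_data_format()
    if generator.single_mode_flag:
        # Incremental mode: one call per example.
        return [generator.generate_example()
                for _ in range(generator.num_examples_act)]
    # Batch mode: the whole dataset in a single call.
    return generator.generate_examples()
```

When the data only needs to be written to a file, make_data() with the appropriate command-line arguments is the shorter route.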

weka.datagenerators.main(args=None)¶

Runs a datagenerator from the command-line. Calls JVM start/stop automatically. Use -h to see all options.

Parameters:	args (list) – the command-line arguments to use, uses sys.argv if None

weka.datagenerators.sys_main()¶

Runs the main function using the system cli arguments, and returns a system error code.

Returns:	0 for success, 1 for failure.
Return type:	int

weka.experiments module¶

class weka.experiments.Experiment(classname='weka.experiment.Experiment', jobject=None, options=None)¶

Bases: weka.core.classes.OptionHandler

Wrapper class for an experiment.

class weka.experiments.ResultMatrix(classname='weka.experiment.ResultMatrixPlainText', jobject=None, options=None)¶

Bases: weka.core.classes.OptionHandler

For generating results from an Experiment run.

average(col)¶

Returns the average mean at this location (if valid location).

Parameters:	col (int) – the 0-based column index
Returns:	the mean
Return type:	float

columns¶

Returns the column count.

Returns:	the count
Return type:	int

get_col_name(index)¶

Returns the column name.

Parameters:	index (int) – the 0-based column index
Returns:	the column name, None if invalid index
Return type:	str

get_mean(col, row)¶

Returns the mean at this location (if valid location).

Parameters:	col (int) – the 0-based column index
	row (int) – the 0-based row index
Returns:	the mean
Return type:	float

get_row_name(index)¶

Returns the row name.

Parameters:	index (int) – the 0-based row index
Returns:	the row name, None if invalid index
Return type:	str

get_stdev(col, row)¶

Returns the standard deviation at this location (if valid location).

Parameters:	col (int) – the 0-based column index
	row (int) – the 0-based row index
Returns:	the standard deviation
Return type:	float

hide_col(index)¶

Hides the column.

Parameters:	index (int) – the 0-based column index

hide_row(index)¶

Hides the row.

Parameters:	index (int) – the 0-based row index

is_col_hidden(index)¶

Returns whether the column is hidden.

Parameters:	index (int) – the 0-based column index
Returns:	true if hidden
Return type:	bool

is_row_hidden(index)¶

Returns whether the row is hidden.

Parameters:	index (int) – the 0-based row index
Returns:	true if hidden
Return type:	bool

rows¶

Returns the row count.

Returns:	the count
Return type:	int

set_col_name(index, name)¶

Sets the column name.

Parameters:	index (int) – the 0-based column index
	name (str) – the name of the column

set_mean(col, row, mean)¶

Sets the mean at this location (if valid location).

Parameters:	col (int) – the 0-based column index
	row (int) – the 0-based row index
	mean (float) – the mean to set

set_row_name(index, name)¶

Sets the row name.

Parameters:	index (int) – the 0-based row index
	name (str) – the name of the row

set_stdev(col, row, stdev)¶

Sets the standard deviation at this location (if valid location).

Parameters:	col (int) – the 0-based column index
	row (int) – the 0-based row index
	stdev (float) – the standard deviation to set

show_col(index)¶

Shows the column.

Parameters:	index (int) – the 0-based column index

show_row(index)¶

Shows the row.

Parameters:	index (int) – the 0-based row index

to_string_header()¶

Returns the header of the matrix as a string.

Returns:	the header
Return type:	str

to_string_key()¶

Returns a key for all the col names, for better readability if the names got cut off.

Returns:	the key
Return type:	str

to_string_matrix()¶

Returns the matrix as a string.

Returns:	the generated output
Return type:	str

to_string_ranking()¶

Returns the ranking in a string representation.

Returns:	the ranking
Return type:	str

to_string_summary()¶

Returns the summary as a string.

Returns:	the summary
Return type:	str
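
The accessors of ResultMatrix take the column index first and the row index second, which is easy to get wrong. The following sketch walks all visible cells with that argument order; the function name visible_cells is hypothetical and only members documented for ResultMatrix are used.

```python
def visible_cells(matrix):
    """Collect (row name, column name, mean, stdev) for every visible cell.

    Hypothetical helper; `matrix` is expected to behave like
    weka.experiments.ResultMatrix.
    """
    cells = []
    for row in range(matrix.rows):
        if matrix.is_row_hidden(row):
            continue
        for col in range(matrix.columns):
            if matrix.is_col_hidden(col):
                continue
            cells.append((
                matrix.get_row_name(row),
                matrix.get_col_name(col),
                matrix.get_mean(col, row),   # note: column first, then row
                matrix.get_stdev(col, row),
            ))
    return cells
```

Rows or columns hidden through hide_row() or hide_col() are skipped, so the result matches what the to_string_* methods would be expected to show.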

class weka.experiments.SimpleCrossValidationExperiment(datasets, classifiers, classification=True, runs=10, folds=10, result=None)¶

Bases: weka.experiments.SimpleExperiment

Performs a simple cross-validation experiment. Can output the results either in ARFF or CSV.

configure_resultproducer()¶

Configures and returns the ResultProducer and PropertyPath as tuple.

Returns:	producer and property path
Return type:	tuple

class weka.experiments.SimpleExperiment(datasets, classifiers, jobject=None, classification=True, runs=10, result=None)¶

Bases: weka.core.classes.OptionHandler

Ancestor for simple experiments.

See following URL for how to use the Experiment API: http://weka.wikispaces.com/Using+the+Experiment+API

configure_resultproducer()¶

Configures and returns the ResultProducer and PropertyPath as tuple.

Returns:	producer and property path
Return type:	tuple

configure_splitevaluator()¶

Configures and returns the SplitEvaluator and Classifier instance as tuple.

Returns:	evaluator and classifier
Return type:	tuple

experiment()¶

Returns the internal experiment, if set up, otherwise None.

Returns:	the internal experiment
Return type:	Experiment

classmethod load(filename)¶

Loads the experiment from disk.

Parameters:	filename (str) – the filename of the experiment to load
Returns:	the experiment
Return type:	Experiment

run()¶: Executes the experiment.

classmethod save(filename, experiment)¶

Saves the experiment to disk.

Parameters:	filename (str) – the filename to save the experiment to
	experiment (Experiment) – the Experiment to save

setup()¶: Initializes the experiment.

class weka.experiments.SimpleRandomSplitExperiment(datasets, classifiers, classification=True, runs=10, percentage=66.6, preserve_order=False, result=None)¶

Bases: weka.experiments.SimpleExperiment

Performs a simple random split experiment. Can output the results either in ARFF or CSV.

configure_resultproducer()¶

Configures and returns the ResultProducer and PropertyPath as tuple.

Returns:	producer and property path
Return type:	tuple

class weka.experiments.Tester(classname='weka.experiment.PairedCorrectedTTester', jobject=None, options=None)¶

Bases: weka.core.classes.OptionHandler

For generating statistical results from an experiment.

dataset_columns¶

Returns the list of column names that uniquely identify a dataset.

Returns:	the list of attribute names
Return type:	list

fold_column¶

Returns the column name that holds the Fold number.

Returns:	the attribute name
Return type:	str

header(comparison_column)¶

Creates a “header” string describing the current resultsets.

Parameters:	comparison_column (int) – the index of the column to compare against
Returns:	the header
Return type:	str

init_columns()¶: Sets the column indices based on the supplied names if necessary.

instances¶

Returns the data used in the analysis.

Returns:	the data in use
Return type:	Instances

multi_resultset_full(base_resultset, comparison_column)¶

Creates a comparison table where a base resultset is compared to the other resultsets.

Parameters:	base_resultset (int) – the 0-based index of the base resultset (e.g. the classifier to compare against)
	comparison_column (int) – the 0-based index of the column to compare against
Returns:	the comparison
Return type:	str

multi_resultset_ranking(comparison_column)¶

Creates a ranking.

Parameters:	comparison_column (int) – the 0-based index of the column to compare against
Returns:	the ranking
Return type:	str

multi_resultset_summary(comparison_column)¶

Carries out a comparison between all resultsets, counting the number of datasets where one resultset outperforms the other.

Parameters:	comparison_column (int) – the 0-based index of the column to compare against
Returns:	the summary
Return type:	str

result_columns¶

Returns the list of column names that uniquely identify a result (e.g. classifier + options + ID).

Returns:	the list of attribute names
Return type:	list

resultmatrix¶

Returns the ResultMatrix instance in use.

Returns:	the matrix in use
Return type:	ResultMatrix

run_column¶

Returns the column name that holds the Run number.

Returns:	the attribute name
Return type:	str
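
A Tester is usually combined with a ResultMatrix and the Instances produced by an experiment run. The sketch below strings the documented methods together; the function name compare_against_base is hypothetical, and it assumes that the resultmatrix and instances properties can be assigned (only the getters are documented here).

```python
def compare_against_base(tester, matrix, results, comparison_column,
                         base_resultset=0):
    """Return the header plus the full comparison table as one string.

    Hypothetical helper. `results` is the Instances object holding the
    experiment output; both indices are 0-based.
    """
    # Assumption: both properties have setters; only getters are documented.
    tester.resultmatrix = matrix
    tester.instances = results
    return "\n".join([
        tester.header(comparison_column),
        tester.multi_resultset_full(base_resultset, comparison_column),
    ])
```

The comparison column is an attribute index in the results data, so it has to be looked up from the loaded results before calling the helper.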

weka.filters module¶

class weka.filters.Filter(classname='weka.filters.AllFilter', jobject=None, options=None)¶

Bases: weka.core.classes.OptionHandler

Wrapper class for filters.

batch_finished()¶

Signals the filter that the batch of data has finished.

Returns:	True if instances can be collected from the output
Return type:	bool

capabilities()¶

Returns the capabilities of the filter.

Returns:	the capabilities
Return type:	Capabilities

classmethod deserialize(ser_file)¶

Deserializes a filter from a file.

Parameters:	ser_file (str) – the file to deserialize from
Returns:	model
Return type:	Filter

filter(data)¶

Filters the dataset(s). When providing a list, this can be used to create compatible train/test sets, since the filter only gets initialized with the first dataset and all subsequent datasets get transformed using the same setup.

NB: inputformat(Instances) must have been called beforehand.

Parameters:	data (Instances or list of Instances) – the Instances to filter
Returns:	the filtered Instances object(s)
Return type:	Instances or list of Instances
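
The list form of filter() is what makes train and test sets compatible. A minimal sketch of that usage follows; the function name filter_train_test is hypothetical, and it relies only on inputformat() and filter() as documented above.

```python
def filter_train_test(flter, train, test):
    """Filter `train` and `test` with the same filter setup.

    Hypothetical helper. The filter is initialized from `train` only, and
    `test` is transformed with that same setup.
    """
    flter.inputformat(train)  # must be called before filter()
    filtered_train, filtered_test = flter.filter([train, test])
    return filtered_train, filtered_test
```

Filtering the two sets in separate, independently initialized calls could yield differing attribute layouts for data-dependent filters, which is why the list form is used here.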

input(inst)¶

Inputs the Instance.

Parameters:	inst (Instance) – the instance to filter
Returns:	True if the filtered instance can be collected from the output
Return type:	bool

inputformat(data)¶

Sets the input format.

Parameters:	data (Instances) – the data to use as input

classmethod make_copy(flter)¶

Creates a copy of the filter.

Parameters:	flter (Filter) – the filter to copy
Returns:	the copy of the filter
Return type:	Filter

output()¶

Outputs the filtered Instance.

Returns:	the filtered instance
Return type:	Instance

outputformat()¶

Returns the output format.

Returns:	the output format
Return type:	Instances

serialize(ser_file)¶

Serializes the filter to the specified file.

Parameters:	ser_file (str) – the file to save the filter to

to_source(classname, data)¶

Returns the model as Java source code if the filter implements weka.filters.Sourcable.

Parameters:	classname (str) – the classname for the generated Java code
	data (Instances) – the dataset used for initializing the filter
Returns:	the model as source code string
Return type:	str

class weka.filters.MultiFilter(jobject=None, options=None)¶

Bases: weka.filters.Filter

Wrapper class for weka.filters.MultiFilter.

filters¶

Returns the list of base filters.

Returns:	the filter list
Return type:	list

class weka.filters.StringToWordVector(jobject=None, options=None)¶

Bases: weka.filters.Filter

Wrapper class for weka.filters.unsupervised.attribute.StringToWordVector.

stemmer¶

Returns the stemmer.

Returns:	the stemmer
Return type:	Stemmer

stopwords¶

Returns the stopwords handler.

Returns:	the stopwords handler
Return type:	Stopwords

tokenizer¶

Returns the tokenizer.

Returns:	the tokenizer
Return type:	Tokenizer

weka.filters.main(args=None)¶

Runs a filter from the command-line. Calls JVM start/stop automatically. Use -h to see all options.

Parameters:	args (list) – the command-line arguments to use, uses sys.argv if None

weka.filters.sys_main()¶

Runs the main function using the system cli arguments, and returns a system error code.

Returns:	0 for success, 1 for failure.
Return type:	int
