weka package¶
Subpackages¶
- weka.core package
- Submodules
- weka.core.capabilities module
- weka.core.classes module
- weka.core.converters module
- weka.core.database module
- weka.core.dataset module
- weka.core.jvm module
- weka.core.packages module
- weka.core.serialization module
- weka.core.stemmers module
- weka.core.stopwords module
- weka.core.tokenizers module
- weka.core.types module
- weka.core.version module
- Module contents
- weka.flow package
- weka.plot package
Submodules¶
weka.associations module¶
-
class
weka.associations.AssociationRule(jobject)¶ Bases:
weka.core.classes.JavaObject
Wrapper for weka.associations.AssociationRule class.
-
consequence¶ Get the consequence.
Returns: the consequence, list of Item objects Return type: list
-
consequence_support¶ Get the support for the consequence.
Returns: the support Return type: int
-
metric_names¶ Returns the metric names for the rule.
Returns: the metric names Return type: list
-
metric_value(name)¶ Returns the named metric value for the rule.
Parameters: name (str) – the name of the metric Returns: the metric value Return type: float
-
metric_values¶ Returns the metric values for the rule.
Returns: the metric values Return type: ndarray
-
premise¶ Get the premise.
Returns: the premise, list of Item objects Return type: list
-
premise_support¶ Get the support for the premise.
Returns: the support Return type: int
-
primary_metric_name¶ Returns the primary metric name for the rule.
Returns: the metric name Return type: str
-
primary_metric_value¶ Returns the primary metric value for the rule.
Returns: the metric value Return type: float
-
total_support¶ Get the total support.
Returns: the support Return type: int
-
total_transactions¶ Get the total transactions.
Returns: the transactions Return type: int
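The support counts exposed by these properties are enough to derive the usual rule metrics by hand. The following is a minimal sketch in plain Python, not part of the wrapper: the function name `rule_metrics` and the sample counts are hypothetical, and the formulas are the textbook definitions of confidence and lift rather than Weka's own implementation.

```python
def rule_metrics(premise_support, consequence_support, total_support, total_transactions):
    """Derive textbook confidence and lift from raw support counts (illustrative only)."""
    # Confidence: share of premise-matching transactions that also match the consequence.
    confidence = total_support / premise_support
    # Fraction of all transactions that contain the consequence.
    consequence_rate = consequence_support / total_transactions
    # Lift: how much more often the consequence occurs when the premise holds.
    lift = confidence / consequence_rate
    return {"confidence": confidence, "lift": lift}


# Hypothetical counts, chosen only for illustration.
metrics = rule_metrics(
    premise_support=40,
    consequence_support=50,
    total_support=30,
    total_transactions=100,
)
```

For values computed by Weka itself, metric_names and metric_value(name) remain the authoritative source.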
-
-
class
weka.associations.AssociationRules(jobject)¶ Bases:
weka.core.classes.JavaObject
Wrapper for weka.associations.AssociationRules class.
-
producer¶ Returns a string describing the producer that generated these rules.
Returns: the producer Return type: str
-
-
class
weka.associations.AssociationRulesIterator(rules)¶ Bases:
object
Iterator for weka.associations.AssociationRules class.
-
next()¶ Returns the next rule.
Returns: the next rule object Return type: AssociationRule
-
-
class
weka.associations.Associator(classname=None, jobject=None, options=None)¶ Bases:
weka.core.classes.OptionHandler
Wrapper class for associators.
-
association_rules()¶ Returns the association rules that were generated. Only available if the associator implements AssociationRulesProducer.
Returns: the association rules that were generated Return type: AssociationRules
-
build_associations(data)¶ Builds the associator with the data.
Parameters: data (Instances) – the data to train the associator with
-
can_produce_rules()¶ Checks whether association rules can be generated.
Returns: whether the scheme implements the AssociationRulesProducer interface and association rules can be generated Return type: bool
-
capabilities¶ Returns the capabilities of the associator.
Returns: the capabilities Return type: Capabilities
-
classmethod
make_copy(associator)¶ Creates a copy of the associator.
Parameters: associator (Associator) – the associator to copy Returns: the copy of the associator Return type: Associator
-
rule_metric_names¶ Returns the rule metric names of the association rules. Only available if the associator implements AssociationRulesProducer.
Returns: the metric names Return type: list
-
-
class
weka.associations.Item(jobject)¶ Bases:
weka.core.classes.JavaObject
Wrapper for weka.associations.Item class.
-
comparison¶ Returns the comparison operator as string.
Returns: the comparison operator Return type: str
-
decrease_frequency(frequency=None)¶ Decreases the frequency.
Parameters: frequency (int) – the frequency to decrease by, 1 if None
-
frequency¶ Returns the frequency.
Returns: the frequency Return type: int
-
increase_frequency(frequency=None)¶ Increases the frequency.
Parameters: frequency (int) – the frequency to increase by, 1 if None
-
item_value¶ Returns the item value as string.
Returns: the item value Return type: str
-
-
weka.associations.main(args=None)¶ Runs an associator from the command-line. Calls JVM start/stop automatically. Use -h to see all options.
Parameters: args (list) – the command-line arguments to use, uses sys.argv if None
-
weka.associations.sys_main()¶ Runs the main function using the system cli arguments, and returns a system error code.
Returns: 0 for success, 1 for failure. Return type: int
weka.attribute_selection module¶
-
class
weka.attribute_selection.ASEvaluation(classname='weka.attributeSelection.CfsSubsetEval', jobject=None, options=None)¶ Bases:
weka.core.classes.OptionHandler
Wrapper class for attribute selection evaluation algorithm.
-
build_evaluator(data)¶ Builds the evaluator with the data.
Parameters: data (Instances) – the data to use
-
capabilities¶ Returns the capabilities of the evaluator.
Returns: the capabilities Return type: Capabilities
-
post_process(indices)¶ Post-processes the evaluator with the selected attribute indices.
Parameters: indices (ndarray) – the attribute indices list to use Returns: the processed indices Return type: ndarray
-
-
class
weka.attribute_selection.ASSearch(classname='weka.attributeSelection.BestFirst', jobject=None, options=None)¶ Bases:
weka.core.classes.OptionHandler
Wrapper class for attribute selection search algorithm.
-
search(evaluation, data)¶ Performs the search and returns the indices of the selected attributes.
Parameters: - evaluation (ASEvaluation) – the evaluation algorithm to use
- data (Instances) – the data to use
Returns: the selected attributes (0-based indices)
Return type: ndarray
-
-
class
weka.attribute_selection.AttributeSelection¶ Bases:
weka.core.classes.JavaObject
Performs attribute selection using search and evaluation algorithms.
-
classmethod
attribute_selection(evaluator, args)¶ Performs attribute selection using the given attribute evaluator and options.
Parameters: - evaluator (ASEvaluation) – the evaluator to use
- args (list) – the command-line args for the attribute selection
Returns: the results string
Return type: str
-
crossvalidation(crossvalidation)¶ Sets whether to perform cross-validation.
Parameters: crossvalidation (bool) – whether to perform cross-validation
-
cv_results¶ Generates a results string from the last cross-validation attribute selection.
Returns: the results string Return type: str
-
evaluator(evaluator)¶ Sets the evaluator to use.
Parameters: evaluator (ASEvaluation) – the evaluator to use.
-
folds(folds)¶ Sets the number of folds to use for cross-validation.
Parameters: folds (int) – the number of folds
-
number_attributes_selected¶ Returns the number of attributes that were selected.
Returns: the number of attributes Return type: int
-
ranked_attributes¶ Returns the matrix of ranked attributes from the last run.
Returns: the Numpy matrix Return type: ndarray
-
ranking(ranking)¶ Sets whether to perform a ranking, if possible.
Parameters: ranking (bool) – whether to perform a ranking
-
reduce_dimensionality(data)¶ Reduces the dimensionality of the provided Instance or Instances object.
Parameters: data (Instances) – the data to process Returns: the reduced dataset Return type: Instances
-
results_string¶ Generates a results string from the last attribute selection.
Returns: the results string Return type: str
-
search(search)¶ Sets the search algorithm to use.
Parameters: search (ASSearch) – the search algorithm
-
seed(seed)¶ Sets the seed for cross-validation.
Parameters: seed (int) – the seed value
-
select_attributes(instances)¶ Performs attribute selection on the given dataset.
Parameters: instances (Instances) – the data to process
-
select_attributes_cv_split(instances)¶ Performs attribute selection on the given cross-validation split.
Parameters: instances (Instances) – the data to process
-
selected_attributes¶ Returns the selected attributes from the last run.
Returns: the Numpy array of 0-based indices Return type: ndarray
-
-
weka.attribute_selection.main(args=None)¶ Runs attribute selection from the command-line. Calls JVM start/stop automatically. Use -h to see all options.
Parameters: args (list) – the command-line arguments to use, uses sys.argv if None
-
weka.attribute_selection.sys_main()¶ Runs the main function using the system cli arguments, and returns a system error code.
Returns: 0 for success, 1 for failure. Return type: int
weka.classifiers module¶
-
class
weka.classifiers.Classifier(classname='weka.classifiers.rules.ZeroR', jobject=None, options=None)¶ Bases:
weka.core.classes.OptionHandler
Wrapper class for classifiers.
-
batch_size¶ Returns the batch size, in case this classifier is a batch predictor.
Returns: the batch size, None if not a batch predictor Return type: str
-
build_classifier(data)¶ Builds the classifier with the data.
Parameters: data (Instances) – the data to train the classifier with
-
capabilities¶ Returns the capabilities of the classifier.
Returns: the capabilities Return type: Capabilities
-
classify_instance(inst)¶ Performs a prediction.
Parameters: inst (Instance) – the Instance to get a prediction for Returns: the classification (either regression value or 0-based label index) Return type: float
-
classmethod
deserialize(ser_file)¶ Deserializes a classifier from a file.
Parameters: ser_file (str) – the model file to deserialize Returns: model and, if available, the dataset header Return type: tuple
-
distribution_for_instance(inst)¶ Performs a prediction, returning the class distribution.
Parameters: inst (Instance) – the Instance to get the class distribution for Returns: the class distribution array Return type: ndarray
-
distributions_for_instances(data)¶ Performs predictions, returning the class distributions.
Parameters: data (Instances) – the Instances to get the class distributions for Returns: the class distribution matrix, None if not a batch predictor Return type: ndarray
-
graph¶ Returns the graph if classifier implements weka.core.Drawable, otherwise None.
Returns: the generated graph string Return type: str
-
graph_type¶ Returns the graph type if classifier implements weka.core.Drawable, otherwise -1.
Returns: the type Return type: int
-
has_efficient_batch_prediction()¶ Returns whether the classifier implements a more efficient batch prediction.
Returns: True if a more efficient batch prediction is implemented, always False if not batch predictor Return type: bool
-
classmethod
make_copy(classifier)¶ Creates a copy of the classifier.
Parameters: classifier (Classifier) – the classifier to copy Returns: the copy of the classifier Return type: Classifier
-
serialize(ser_file, header=None)¶ Serializes the classifier to the specified file.
Parameters: - ser_file (str) – the file to save the model to
- header (Instances) – the (optional) dataset header to store alongside; recommended
-
to_source(classname)¶ Returns the model as Java source code if the classifier implements weka.classifiers.Sourcable.
Parameters: classname (str) – the classname for the generated Java code Returns: the model as source code string Return type: str
-
-
class
weka.classifiers.CostMatrix(matrx=None, num_classes=None)¶ Bases:
weka.core.classes.JavaObject
Class for storing and manipulating a misclassification cost matrix. The element at position i,j in the matrix is the penalty for classifying an instance of class j as class i. Cost values can be fixed or computed on a per-instance basis (cost-sensitive evaluation only) from the value of an attribute or an expression involving attribute(s).
-
apply_cost_matrix(data, rnd)¶ Applies the cost matrix to the data.
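Parameters: - data (Instances) – the data to apply the cost matrix to
- rnd (Random) – the random number generator to use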
-
expected_costs(class_probs, inst=None)¶ Calculates the expected misclassification cost for each possible class value, given class probability estimates.
Parameters: class_probs (ndarray) – the class probabilities Returns: the calculated costs Return type: ndarray
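To make the idea of an expected misclassification cost concrete, here is a minimal sketch in plain Python. It is not the wrapper's implementation: the function name `expected_costs` is reused only for illustration, the sample matrix and probabilities are invented, and the indexing assumes the convention stated in the class description (element i,j is the penalty for classifying an instance of class j as class i). The orientation Weka uses internally should be checked before relying on this.

```python
def expected_costs(cost_matrix, class_probs):
    """Expected cost of predicting each class i: sum over j of cost[i][j] * p[j]."""
    return [
        sum(cost * prob for cost, prob in zip(row, class_probs))
        for row in cost_matrix
    ]


# Hypothetical 2-class cost matrix: predicting class 1 for a class-0 instance costs 5.
costs = [[0.0, 1.0],
         [5.0, 0.0]]
probs = [0.8, 0.2]  # assumed class probability estimates for one instance

result = expected_costs(costs, probs)
# The cost-minimizing prediction is the class with the smallest expected cost.
best_class = min(range(len(result)), key=result.__getitem__)
```

Under these assumed numbers, predicting class 0 is cheaper even though neither prediction is free, which is the point of cost-sensitive decisions.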
-
get_cell(row, col)¶ Returns the JB_Object at the specified location.
Parameters: - row (int) – the 0-based index of the row
- col (int) – the 0-based index of the column
Returns: the object in that cell
Return type: JB_Object
-
get_element(row, col, inst=None)¶ Returns the value at the specified location.
Parameters: - row (int) – the 0-based index of the row
- col (int) – the 0-based index of the column
- inst (Instance) – the Instance
Returns: the value in that cell
Return type: float
-
get_max_cost(class_value, inst=None)¶ Gets the maximum cost for a particular class value.
Parameters: - class_value (int) – the class value to get the maximum cost for
- inst (Instance) – the Instance
Returns: the cost
Return type: float
-
initialize()¶ Initializes the matrix.
-
normalize()¶ Normalizes the matrix.
-
num_columns¶ Returns the number of columns.
Returns: the number of columns Return type: int
-
num_rows¶ Returns the number of rows.
Returns: the number of rows Return type: int
-
classmethod
parse_matlab(matlab)¶ Parses the cost matrix definition in Matlab format and returns a matrix.
Parameters: matlab (str) – the Matlab matrix string, e.g., [1 2; 3 4]. Returns: the generated matrix Return type: CostMatrix
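The Matlab-style notation is easy to picture with a small stand-alone parser. This is a sketch in plain Python, assuming only the simple form shown in the example string (rows separated by semicolons, cells separated by whitespace); the helper names `parse_matlab_matrix` and `to_matlab_string` are hypothetical, and Weka's own parser may accept or emit variations that this sketch does not handle.

```python
def parse_matlab_matrix(text):
    """Parse a Matlab-style matrix string such as '[1 2; 3 4]' into nested lists."""
    body = text.strip()
    if not (body.startswith("[") and body.endswith("]")):
        raise ValueError("expected a string enclosed in square brackets")
    # Rows are separated by ';', cells within a row by whitespace.
    rows = [row.split() for row in body[1:-1].split(";")]
    matrix = [[float(cell) for cell in row] for row in rows if row]
    if len({len(row) for row in matrix}) > 1:
        raise ValueError("rows have differing lengths")
    return matrix


def to_matlab_string(matrix):
    """Format nested lists back into the same Matlab-style notation."""
    return "[" + "; ".join(" ".join(repr(v) for v in row) for row in matrix) + "]"


parsed = parse_matlab_matrix("[1 2; 3 4]")
round_trip = parse_matlab_matrix(to_matlab_string(parsed))
```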
-
set_cell(row, col, obj)¶ Sets the JB_Object at the specified location. Automatically unwraps JavaObject.
Parameters: - row (int) – the 0-based index of the row
- col (int) – the 0-based index of the column
- obj (object) – the object for that cell
-
set_element(row, col, value)¶ Sets the float value at the specified location.
Parameters: - row (int) – the 0-based index of the row
- col (int) – the 0-based index of the column
- value (float) – the float value for that cell
-
size¶ Returns the number of rows/columns.
Returns: the number of rows/columns Return type: int
-
to_matlab()¶ Returns the matrix in Matlab format.
Returns: the matrix as Matlab formatted string Return type: str
-
-
class
weka.classifiers.Evaluation(data, cost_matrix=None)¶ Bases:
weka.core.classes.JavaObject
Evaluation class for classifiers.
-
area_under_prc(class_index)¶ Returns the area under precision recall curve.
Parameters: class_index (int) – the 0-based index of the class label Returns: the area Return type: float
-
area_under_roc(class_index)¶ Returns the area under the receiver operating characteristic curve.
Parameters: class_index (int) – the 0-based index of the class label Returns: the area Return type: float
-
avg_cost¶ Returns the average cost.
Returns: the cost Return type: float
-
class_details(title=None)¶ Generates the class details.
Parameters: title (str) – optional title Returns: the details Return type: str
-
class_priors¶ Returns the class priors.
Returns: the priors Return type: ndarray
-
confusion_matrix¶ Returns the confusion matrix.
Returns: the matrix Return type: ndarray
-
correct¶ Returns the correct count (nominal classes).
Returns: the count Return type: float
-
correlation_coefficient¶ Returns the correlation coefficient (numeric classes).
Returns: the coefficient Return type: float
-
coverage_of_test_cases_by_predicted_regions¶ Returns the coverage of the test cases by the predicted regions at the confidence level specified when evaluation was performed.
Returns: the coverage Return type: float
-
crossvalidate_model(classifier, data, num_folds, rnd, output=None)¶ Crossvalidates the model using the specified data, number of folds and random number generator wrapper.
Parameters: - classifier (Classifier) – the classifier to cross-validate
- data (Instances) – the data to evaluate on
- num_folds (int) – the number of folds
- rnd (Random) – the random number generator to use
- output (PredictionOutput) – the output generator to use
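The role of num_folds and the random number generator can be illustrated without a JVM. The sketch below, in plain Python, shuffles instance indices with a seeded generator and deals them into folds, each of which serves once as the test set. It is an illustration of the general k-fold idea only: the helper `fold_indices` is hypothetical, and it does not reproduce Weka's actual procedure (for example, any stratification Weka applies is not modelled here).

```python
import random


def fold_indices(num_instances, num_folds, seed=1):
    """Shuffle instance indices and deal them into num_folds test folds."""
    indices = list(range(num_instances))
    random.Random(seed).shuffle(indices)
    # Fold i receives every num_folds-th index, starting at position i.
    return [indices[i::num_folds] for i in range(num_folds)]


folds = fold_indices(num_instances=10, num_folds=3)
splits = []
for test_fold in folds:
    held_out = set(test_fold)
    # Everything not held out for testing is used for training in this round.
    train_fold = [i for i in range(10) if i not in held_out]
    splits.append((train_fold, test_fold))
```

Using the same seed reproduces the same folds, which is why passing a seeded generator matters when comparing classifiers.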
-
cumulative_margin_distribution()¶ Output the cumulative margin distribution as a string suitable for input for gnuplot or similar package.
Returns: the cumulative margin distribution Return type: str
-
discard_predictions¶ Returns whether to discard predictions (saves memory).
Returns: True if to discard Return type: bool
-
error_rate¶ Returns the error rate (numeric classes).
Returns: the rate Return type: float
-
classmethod
evaluate_model(classifier, args)¶ Evaluates the classifier with the given options.
Parameters: - classifier (Classifier) – the classifier instance to use
- args (list) – the command-line arguments to use
Returns: the evaluation string
Return type: str
-
evaluate_train_test_split(classifier, data, percentage, rnd=None, output=None)¶ Splits the data into train and test, builds the classifier with the training data and evaluates it against the test set.
Parameters: - classifier (Classifier) – the classifier to build and evaluate
- data (Instances) – the data to evaluate on
- percentage (double) – the percentage split to use (amount to use for training)
- rnd (Random) – the random number generator to use, if None the order gets preserved
- output (PredictionOutput) – the output generator to use
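A percentage split can be sketched in a few lines of plain Python. The helper `train_test_split` below is hypothetical and works on any sequence; it mirrors the documented behaviour that the percentage is the share used for training and that the order is preserved when no random number generator is given. How the split size is rounded is an assumption of this sketch, not something the documentation above specifies.

```python
import random


def train_test_split(rows, percentage, seed=None):
    """Split rows into (train, test); percentage is the share used for training."""
    rows = list(rows)
    if seed is not None:
        # Only randomize when a seed is supplied; otherwise keep the original order.
        random.Random(seed).shuffle(rows)
    # Rounding to the nearest integer is an assumption made for this sketch.
    train_size = int(round(len(rows) * percentage / 100.0))
    return rows[:train_size], rows[train_size:]


train, test = train_test_split(range(10), 66.0)
shuffled_train, shuffled_test = train_test_split(range(10), 66.0, seed=42)
```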
-
f_measure(class_index)¶ Returns the f measure.
Parameters: class_index (int) – the 0-based index of the class label Returns: the measure Return type: float
-
false_negative_rate(class_index)¶ Returns the false negative rate.
Parameters: class_index (int) – the 0-based index of the class label Returns: the rate Return type: float
-
false_positive_rate(class_index)¶ Returns the false positive rate.
Parameters: class_index (int) – the 0-based index of the class label Returns: the rate Return type: float
-
incorrect¶ Returns the incorrect count (nominal classes).
Returns: the count Return type: float
-
kappa¶ Returns kappa.
Returns: kappa Return type: float
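For readers unfamiliar with the statistic, the following plain-Python sketch computes Cohen's kappa from a confusion matrix using the textbook formula (observed agreement corrected for chance agreement). The function `cohen_kappa` and the sample matrix are hypothetical, the matrix is assumed to have actual classes in rows and predicted classes in columns, and instance weights are ignored, so the result may differ from what Weka reports on weighted data.

```python
def cohen_kappa(confusion):
    """Cohen's kappa from a square confusion matrix (rows: actual, columns: predicted)."""
    total = sum(sum(row) for row in confusion)
    # Observed agreement: share of instances on the diagonal.
    observed = sum(confusion[i][i] for i in range(len(confusion))) / total
    row_sums = [sum(row) for row in confusion]
    col_sums = [sum(col) for col in zip(*confusion)]
    # Chance agreement: what the marginals alone would produce.
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / (total * total)
    return (observed - expected) / (1.0 - expected)


# Hypothetical 2-class confusion matrix.
matrix = [[40, 10],
          [20, 30]]
kappa = cohen_kappa(matrix)
```

With this sample matrix the observed agreement is 0.7 and the chance agreement is 0.5, giving a kappa of roughly 0.4.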
-
kb_information¶ Returns KB information.
Returns: the information Return type: float
-
kb_mean_information¶ Returns KB mean information.
Returns: the information Return type: float
-
kb_relative_information¶ Returns KB relative information.
Returns: the information Return type: float
-
matrix(title=None)¶ Generates the confusion matrix.
Parameters: title (str) – optional title Returns: the matrix Return type: str
-
matthews_correlation_coefficient(class_index)¶ Returns the Matthews correlation coefficient (nominal classes).
Parameters: class_index (int) – the 0-based index of the class label Returns: the coefficient Return type: float
-
mean_absolute_error¶ Returns the mean absolute error.
Returns: the error Return type: float
-
mean_prior_absolute_error¶ Returns the mean prior absolute error.
Returns: the error Return type: float
-
num_false_negatives(class_index)¶ Returns the number of false negatives.
Parameters: class_index (int) – the 0-based index of the class label Returns: the count Return type: float
-
num_false_positives(class_index)¶ Returns the number of false positives.
Parameters: class_index (int) – the 0-based index of the class label Returns: the count Return type: float
-
num_instances¶ Returns the number of instances that had a known class value.
Returns: the number of instances Return type: float
-
num_true_negatives(class_index)¶ Returns the number of true negatives.
Parameters: class_index (int) – the 0-based index of the class label Returns: the count Return type: float
-
num_true_positives(class_index)¶ Returns the number of true positives.
Parameters: class_index (int) – the 0-based index of the class label Returns: the count Return type: float
-
percent_correct¶ Returns the percent correct (nominal classes).
Returns: the percentage Return type: float
-
percent_incorrect¶ Returns the percent incorrect (nominal classes).
Returns: the percentage Return type: float
-
percent_unclassified¶ Returns the percent unclassified.
Returns: the percentage Return type: float
-
precision(class_index)¶ Returns the precision.
Parameters: class_index (int) – the 0-based index of the class label Returns: the precision Return type: float
-
predictions¶ Returns the predictions.
Returns: the predictions. None if not available Return type: list
-
recall(class_index)¶ Returns the recall.
Parameters: class_index (int) – the 0-based index of the class label Returns: the recall Return type: float
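The per-class measures (precision, recall, f_measure and the true/false positive counts) all derive from the confusion matrix. The sketch below shows the textbook computation in plain Python; the helper `class_metrics` and the sample matrix are hypothetical, rows are assumed to hold actual classes and columns predicted classes, and instance weights are ignored.

```python
def class_metrics(confusion, class_index):
    """Precision, recall and F-measure for one class (rows: actual, columns: predicted)."""
    tp = confusion[class_index][class_index]
    # False negatives: instances of this class predicted as something else.
    fn = sum(confusion[class_index]) - tp
    # False positives: instances of other classes predicted as this class.
    fp = sum(row[class_index] for row in confusion) - tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


# Hypothetical 2-class confusion matrix.
matrix = [[40, 10],
          [20, 30]]
precision, recall, f_measure = class_metrics(matrix, 0)
```

The class_index argument here is 0-based, matching the convention used by the methods documented above.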
-
relative_absolute_error¶ Returns the relative absolute error.
Returns: the error Return type: float
-
root_mean_prior_squared_error¶ Returns the root mean prior squared error.
Returns: the error Return type: float
-
root_mean_squared_error¶ Returns the root mean squared error.
Returns: the error Return type: float
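As a reminder of what the two most common error measures capture, here is a plain-Python sketch of mean absolute error and root mean squared error for a numeric target. The function names mirror the properties above only for readability; the code is not the wrapper's implementation, the sample values are invented, and it covers only the unweighted numeric case.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the average squared difference; penalizes large errors more."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Hypothetical actual and predicted values.
actual = [3.0, 5.0, 2.0, 7.0]
predicted = [2.0, 5.0, 4.0, 7.0]

mae = mean_absolute_error(actual, predicted)
rmse = root_mean_squared_error(actual, predicted)
```

Because squaring emphasizes the single error of 2.0, the RMSE here is larger than the MAE.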
-
root_relative_squared_error¶ Returns the root relative squared error.
Returns: the error Return type: float
-
sf_entropy_gain¶ Returns the total SF, which is the null model entropy minus the scheme entropy.
Returns: the gain Return type: float
-
sf_mean_entropy_gain¶ Returns the SF per instance, which is the null model entropy minus the scheme entropy, per instance.
Returns: the gain Return type: float
-
sf_mean_prior_entropy¶ Returns the entropy per instance for the null model.
Returns: the entropy Return type: float
-
sf_mean_scheme_entropy¶ Returns the entropy per instance for the scheme.
Returns: the entropy Return type: float
-
sf_prior_entropy¶ Returns the total entropy for the null model.
Returns: the entropy Return type: float
-
sf_scheme_entropy¶ Returns the total entropy for the scheme.
Returns: the entropy Return type: float
-
size_of_predicted_regions¶ Returns the average size of the predicted regions, relative to the range of the target in the training data, at the confidence level specified when evaluation was performed.
Returns: the size of the regions Return type: float
-
summary(title=None, complexity=False)¶ Generates a summary.
Parameters: - title (str) – optional title
- complexity (bool) – whether to print the complexity information as well
Returns: the summary
Return type: str
-
test_model(classifier, data, output=None)¶ Evaluates the built model using the specified test data and returns the classifications.
Parameters: - classifier (Classifier) – the trained classifier to evaluate
- data (Instances) – the data to evaluate on
- output (PredictionOutput) – the output generator to use
Returns: the classifications
Return type: ndarray
-
test_model_once(classifier, inst)¶ Evaluates the built model using the specified test instance and returns the classification.
Parameters: - classifier (Classifier) – the trained classifier to evaluate
- inst (Instance) – the Instance to evaluate on
Returns: the classification
Return type: float
-
total_cost¶ Returns the total cost.
Returns: the cost Return type: float
-
true_negative_rate(class_index)¶ Returns the true negative rate.
Parameters: class_index (int) – the 0-based index of the class label Returns: the rate Return type: float
-
true_positive_rate(class_index)¶ Returns the true positive rate.
Parameters: class_index (int) – the 0-based index of the class label Returns: the rate Return type: float
-
unclassified¶ Returns the unclassified count.
Returns: the count Return type: float
-
unweighted_macro_f_measure¶ Returns the unweighted macro-averaged F-measure.
Returns: the measure Return type: float
-
unweighted_micro_f_measure¶ Returns the unweighted micro-averaged F-measure.
Returns: the measure Return type: float
-
weighted_area_under_prc¶ Returns the weighted area under precision recall curve.
Returns: the weighted area Return type: float
-
weighted_area_under_roc¶ Returns the weighted area under the receiver operating characteristic curve.
Returns: the weighted area Return type: float
-
weighted_f_measure¶ Returns the weighted f measure.
Returns: the measure Return type: float
-
weighted_false_negative_rate¶ Returns the weighted false negative rate.
Returns: the rate Return type: float
-
weighted_false_positive_rate¶ Returns the weighted false positive rate.
Returns: the rate Return type: float
-
weighted_matthews_correlation¶ Returns the weighted Matthews correlation (nominal classes).
Returns: the correlation Return type: float
-
weighted_precision¶ Returns the weighted precision.
Returns: the precision Return type: float
-
weighted_recall¶ Returns the weighted recall.
Returns: the recall Return type: float
-
weighted_true_negative_rate¶ Returns the weighted true negative rate.
Returns: the rate Return type: float
-
weighted_true_positive_rate¶ Returns the weighted true positive rate.
Returns: the rate Return type: float
-
-
class
weka.classifiers.FilteredClassifier(jobject=None, options=None)¶ Bases:
weka.classifiers.SingleClassifierEnhancer
Wrapper class for the filtered classifier.
-
check_for_modified_class_attribute(check)¶ Sets whether to check for class attribute modifications.
Parameters: check (bool) – True if checking for modifications
-
-
class
weka.classifiers.GridSearch(jobject=None, options=None)¶ Bases:
weka.classifiers.SingleClassifierEnhancer
Wrapper class for the GridSearch meta-classifier.
-
best¶ Returns the best classifier setup found during the search.
Returns: the best classifier setup Return type: Classifier
-
evaluation¶ Returns the currently set statistic used for evaluation.
Returns: the statistic Return type: SelectedTag
-
x¶ Returns a dictionary with all the current values for the X of the grid. Keys for the dictionary: property, min, max, step, base, expression Types: property=str, min=float, max=float, step=float, base=float, expression=str
Returns: the dictionary with the parameters Return type: dict
-
y¶ Returns a dictionary with all the current values for the Y of the grid. Keys for the dictionary: property, min, max, step, base, expression Types: property=str, min=float, max=float, step=float, base=float, expression=str
Returns: the dictionary with the parameters Return type: dict
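The shape of the x and y dictionaries can be pictured with a small example. The sketch below, in plain Python, builds a dictionary with the documented keys and enumerates the values one grid axis would cover. Several things are assumptions made for illustration: the property name `kernel.gamma`, the numeric settings, and the interpretation that a counter runs from min to max in increments of step and is combined with base by the expression (modelled here as a Python callable rather than an expression string).

```python
def grid_axis_values(axis, evaluate):
    """Enumerate the values of one grid axis described by a dictionary."""
    values = []
    i = axis["min"]
    # A small tolerance guards against floating-point drift in the loop counter.
    while i <= axis["max"] + 1e-9:
        values.append(evaluate(axis["base"], i))
        i += axis["step"]
    return values


# Hypothetical axis definition using the documented keys.
x_axis = {
    "property": "kernel.gamma",   # assumed property path, for illustration only
    "min": -3.0,
    "max": 1.0,
    "step": 1.0,
    "base": 10.0,
    "expression": "pow(BASE,I)",  # kept as a string; not evaluated by this sketch
}

# The callable stands in for the expression: base raised to the counter.
values = grid_axis_values(x_axis, lambda base, i: base ** i)
```

Under these assumptions the axis spans five values on a logarithmic scale, from 10 to the power -3 up to 10 to the power 1.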
-
-
class
weka.classifiers.Kernel(classname=None, jobject=None, options=None)¶ Bases:
weka.core.classes.OptionHandler
Wrapper class for kernels.
-
build_kernel(data)¶ Builds the kernel with the data.
Parameters: data (Instances) – the data to build the kernel with
-
capabilities()¶ Returns the capabilities of the kernel.
Returns: the capabilities Return type: Capabilities
-
checks_turned_off¶ Returns whether checks are turned off.
Returns: True if checks turned off Return type: bool
-
clean()¶ Frees the memory used by the kernel.
-
eval(id1, id2, inst1)¶ Computes the result of the kernel function for two instances. If id1 == -1, eval uses inst1 instead of an instance in the dataset.
Parameters: - id1 (int) – the index of the first instance in the dataset
- id2 (int) – the index of the second instance in the dataset
- inst1 (Instance) – the instance corresponding to id1 (used if id1 == -1)
-
-
class
weka.classifiers.KernelClassifier(classname=None, jobject=None, options=None)¶ Bases:
weka.classifiers.Classifier
Wrapper class for classifiers that have a kernel property, like SMO.
-
class
weka.classifiers.MultiSearch(jobject=None, options=None)¶ Bases:
weka.classifiers.SingleClassifierEnhancer
Wrapper class for the MultiSearch meta-classifier. NB: ‘multi-search-weka-package’ must be installed (https://github.com/fracpete/multisearch-weka-package), version 2016.1.15 or later.
-
best¶ Returns the best classifier setup found during the search.
Returns: the best classifier setup Return type: Classifier
-
evaluation¶ Returns the currently set statistic used for evaluation.
Returns: the statistic Return type: SelectedTag
-
parameters¶ Returns the list of currently set search parameters.
Returns: the list of AbstractSearchParameter objects Return type: list
-
-
class
weka.classifiers.MultipleClassifiersCombiner(classname=None, jobject=None, options=None)¶ Bases:
weka.classifiers.Classifier
Wrapper class for classifiers that use multiple base classifiers.
-
classifiers¶ Returns the list of base classifiers.
Returns: the classifier list Return type: list
-
-
class
weka.classifiers.NominalPrediction(jobject)¶ Bases:
weka.classifiers.Prediction
Wrapper class for a nominal prediction.
-
distribution¶ Returns the class distribution.
Returns: the class distribution list Return type: ndarray
-
margin¶ Returns the margin.
Returns: the margin Return type: float
-
-
class
weka.classifiers.NumericPrediction(jobject)¶ Bases:
weka.classifiers.Prediction
Wrapper class for a numeric prediction.
-
error¶ Returns the error.
Returns: the error Return type: float
-
prediction_intervals¶ Returns the prediction intervals.
Returns: the intervals Return type: ndarray
-
-
class
weka.classifiers.Prediction(jobject)¶ Bases:
weka.core.classes.JavaObject
Wrapper class for a prediction.
-
actual¶ Returns the actual value.
Returns: the actual value (internal representation) Return type: float
-
predicted¶ Returns the predicted value.
Returns: the predicted value (internal representation) Return type: float
-
weight¶ Returns the weight.
Returns: the weight of the Instance that was used Return type: float
-
-
class
weka.classifiers.PredictionOutput(classname='weka.classifiers.evaluation.output.prediction.PlainText', jobject=None, options=None)¶ Bases:
weka.core.classes.OptionHandler
Collects predictions and generates output from them. The underlying class must be derived from weka.classifiers.evaluation.output.prediction.AbstractOutput.
-
buffer_content()¶ Returns the content of the buffer as string.
Returns: The buffer content Return type: str
-
print_all(cls, data)¶ Prints the header, classifications and footer to the buffer.
Parameters: - cls (Classifier) – the classifier
- data (Instances) – the test data
-
print_classification(cls, inst, index)¶ Prints the classification to the buffer.
Parameters: - cls (Classifier) – the classifier
- inst (Instance) – the test instance
- index (int) – the 0-based index of the test instance
-
print_classifications(cls, data)¶ Prints the classifications to the buffer.
Parameters: - cls (Classifier) – the classifier
- data (Instances) – the test data
-
print_footer()¶ Prints the footer to the buffer.
-
print_header()¶ Prints the header to the buffer.
-
-
class
weka.classifiers.SingleClassifierEnhancer(classname=None, jobject=None, options=None)¶ Bases:
weka.classifiers.Classifier
Wrapper class for classifiers that use a single base classifier.
-
classifier¶ Returns the base classifier.
Returns: the base classifier Return type: Classifier
-
-
weka.classifiers.main(args=None)¶ Runs a classifier from the command-line. Calls JVM start/stop automatically. Use -h to see all options.
Parameters: args (list) – the command-line arguments to use, uses sys.argv if None
-
weka.classifiers.predictions_to_instances(data, preds)¶ Turns the predictions into an Instances object.
Parameters: - data (Instances) – the original dataset format
- preds (list) – the predictions to convert
Returns: the predictions, None if no predictions present
Return type: Instances
-
weka.classifiers.sys_main()¶ Runs the main function using the system cli arguments, and returns a system error code.
Returns: 0 for success, 1 for failure. Return type: int
weka.clusterers module¶
-
class
weka.clusterers.ClusterEvaluation¶ Bases:
weka.core.classes.JavaObject
Evaluation class for clusterers.
-
classes_to_clusters¶ Return the array (ordered by cluster number) of minimum error class to cluster mappings.
Returns: the mappings Return type: ndarray
-
cluster_assignments¶ Return an array of cluster assignments corresponding to the most recent set of instances clustered.
Returns: the cluster assignments Return type: ndarray
-
cluster_results¶ The cluster results as string.
Returns: the results string Return type: str
-
classmethod
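crossvalidate_model(clusterer, data, num_folds, rnd)¶ Cross-validates the clusterer and returns the log-likelihood.
Parameters: - clusterer (Clusterer) – the clusterer to cross-validate
- data (Instances) – the data to evaluate on
- num_folds (int) – the number of folds
- rnd (Random) – the random number generator to use
Returns: the cross-validated log-likelihood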
Return type: float
-
classmethod
evaluate_clusterer(clusterer, args)¶ Evaluates the clusterer with the given options.
Parameters: - clusterer (Clusterer) – the clusterer instance to evaluate
- args (list) – the command-line arguments
Returns: the evaluation result
Return type: str
-
log_likelihood¶ Returns the log likelihood.
Returns: the log likelihood Return type: float
-
num_clusters¶ Returns the number of clusters.
Returns: the number of clusters Return type: int
-
-
class
weka.clusterers.Clusterer(classname='weka.clusterers.SimpleKMeans', jobject=None, options=None)¶ Bases:
weka.core.classes.OptionHandler
Wrapper class for clusterers.
-
build_clusterer(data)¶ Builds the clusterer with the data.
Parameters: data (Instances) – the data to use for training the clusterer
-
capabilities¶ Returns the capabilities of the clusterer.
Returns: the capabilities Return type: Capabilities
-
cluster_instance(inst)¶ Performs a prediction.
Parameters: inst (Instance) – the instance to determine the cluster for Returns: the clustering result Return type: float
-
classmethod
deserialize(ser_file)¶ Deserializes a clusterer from a file.
Parameters: ser_file (str) – the model file to deserialize Returns: model and, if available, the dataset header Return type: tuple
-
distribution_for_instance(inst)¶ Performs a prediction, returning the cluster distribution.
Parameters: inst (Instance) – the Instance to get the cluster distribution for Returns: the cluster distribution Return type: float[]
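The following sketch combines build_clusterer, cluster_instance and distribution_for_instance. The helper name assign_clusters is hypothetical, and the loop assumes that an Instances object can be iterated to yield Instance objects.

```python
def assign_clusters(clusterer, data):
    """Build `clusterer` on `data` and return one (cluster, distribution)
    pair per instance."""
    clusterer.build_clusterer(data)
    results = []
    # Assumption: iterating the dataset yields its Instance objects.
    for inst in data:
        cluster = clusterer.cluster_instance(inst)
        distribution = clusterer.distribution_for_instance(inst)
        results.append((cluster, distribution))
    return results
```

In real use, clusterer would be something like Clusterer(classname="weka.clusterers.SimpleKMeans", options=["-N", "3"]), with data loaded beforehand and the JVM already started.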
-
graph¶ Returns the graph if the clusterer implements weka.core.Drawable, otherwise None.
Returns: the graph or None if not available Return type: str
-
graph_type¶ Returns the graph type if the clusterer implements weka.core.Drawable, otherwise -1.
Returns: the type Return type: int
-
classmethod
make_copy(clusterer)¶ Creates a copy of the clusterer.
Parameters: clusterer (Clusterer) – the clusterer to copy Returns: the copy of the clusterer Return type: Clusterer
-
number_of_clusters¶ Returns the number of clusters found.
Returns: the number of clusters Return type: int
-
serialize(ser_file, header=None)¶ Serializes the clusterer to the specified file.
Parameters: - ser_file (str) – the file to save the model to
- header (Instances) – the (optional) dataset header to store alongside; recommended
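A round trip through serialize and deserialize might look like the sketch below. The helper name save_and_reload and the clusterer_cls parameter are hypothetical; the tuple unpacking relies on deserialize returning the model together with the stored header, as described above.

```python
def save_and_reload(clusterer, header, ser_file, clusterer_cls=None):
    """Store `clusterer` with its dataset header, then load both back."""
    if clusterer_cls is None:
        # Imported lazily so the helper can be defined without Weka installed.
        from weka.clusterers import Clusterer as clusterer_cls
    # Storing the header alongside the model is optional but recommended.
    clusterer.serialize(ser_file, header=header)
    # deserialize is a classmethod returning (model, header).
    model, stored_header = clusterer_cls.deserialize(ser_file)
    return model, stored_header
```

Keeping the header with the model makes it possible to check later that new data has the same structure as the training data.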
-
update_clusterer(inst)¶ Updates the clusterer with the instance.
Parameters: inst (Instance) – the Instance to update the clusterer with
-
update_finished()¶ Signals the clusterer that updating with new data has finished.
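Incremental training with update_clusterer and update_finished can be sketched as follows. The helper name train_incrementally is hypothetical. It assumes a clusterer that supports incremental updates (in Weka, for example, weka.clusterers.Cobweb) and that build_clusterer may be called first with the dataset structure.

```python
def train_incrementally(clusterer, header, instances):
    """Initialize `clusterer` with `header`, feed it `instances` one by one,
    and return the number of clusters found."""
    # Assumption: building on the (possibly empty) header sets up the model.
    clusterer.build_clusterer(header)
    for inst in instances:
        clusterer.update_clusterer(inst)
    # Tell the clusterer that no further instances will arrive.
    clusterer.update_finished()
    # number_of_clusters is listed as a property, so no call parentheses.
    return clusterer.number_of_clusters
```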
-
-
class
weka.clusterers.FilteredClusterer(jobject=None, options=None)¶ Bases:
weka.clusterers.SingleClustererEnhancer
Wrapper class for the filtered clusterer.
-
class
weka.clusterers.SingleClustererEnhancer(classname=None, jobject=None, options=None)¶ Bases:
weka.clusterers.Clusterer
Wrapper class for clusterers that use a single base clusterer.
-
weka.clusterers.main(args=None)¶ Runs a clusterer from the command-line. Calls JVM start/stop automatically. Use -h to see all options.
Parameters: args (list) – the command-line arguments to use, uses sys.argv if None
-
weka.clusterers.sys_main()¶ Runs the main function using the system cli arguments, and returns a system error code.
Returns: 0 for success, 1 for failure. Return type: int
weka.datagenerators module¶
-
class
weka.datagenerators.DataGenerator(classname='weka.datagenerators.classifiers.classification.Agrawal', jobject=None, options=None)¶ Bases:
weka.core.classes.OptionHandler
Wrapper class for datagenerators.
-
generate_finish()¶ Returns a “finish” string.
Returns: a finish comment Return type: str
-
generate_start()¶ Returns a “start” string.
Returns: the start comment Return type: str
-
classmethod
make_copy(generator)¶ Creates a copy of the generator.
Parameters: generator (DataGenerator) – the generator to copy Returns: the copy of the generator Return type: DataGenerator
-
classmethod
make_data(generator, args)¶ Generates data using the generator and commandline arguments.
Parameters: - generator (DataGenerator) – the generator instance to use
- args (list) – the command-line arguments
-
num_examples_act¶ Returns the actual number of examples to generate.
Returns: the number of examples Return type: int
-
single_mode_flag¶ Returns whether data is generated row by row (True) or in one go (False).
Returns: whether incremental Return type: bool
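A minimal sketch of make_data, writing generated data to a file. The helper name generate_to_file and the generator_cls parameter are hypothetical. The "-o" argument is assumed to be Weka's general data generator option for the output file.

```python
def generate_to_file(out_file,
                     classname="weka.datagenerators.classifiers.classification.Agrawal",
                     options=None, generator_cls=None):
    """Create a data generator and let it write its output to `out_file`."""
    if generator_cls is None:
        # Imported lazily so the helper can be defined without Weka installed.
        from weka.datagenerators import DataGenerator as generator_cls
    generator = generator_cls(classname=classname, options=options)
    # make_data is a classmethod taking the generator and command-line arguments.
    generator_cls.make_data(generator, ["-o", out_file])
    return generator
```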
-
-
weka.datagenerators.main(args=None)¶ Runs a datagenerator from the command-line. Calls JVM start/stop automatically. Use -h to see all options.
Parameters: args (list) – the command-line arguments to use, uses sys.argv if None
-
weka.datagenerators.sys_main()¶ Runs the main function using the system cli arguments, and returns a system error code.
Returns: 0 for success, 1 for failure. Return type: int
weka.experiments module¶
-
class
weka.experiments.Experiment(classname='weka.experiment.Experiment', jobject=None, options=None)¶ Bases:
weka.core.classes.OptionHandler
Wrapper class for an experiment.
-
class
weka.experiments.ResultMatrix(classname='weka.experiment.ResultMatrixPlainText', jobject=None, options=None)¶ Bases:
weka.core.classes.OptionHandler
For generating results from an Experiment run.
-
average(col)¶ Returns the average mean at this location (if valid location).
Parameters: col (int) – the 0-based column index Returns: the mean Return type: float
-
columns¶ Returns the column count.
Returns: the count Return type: int
-
get_col_name(index)¶ Returns the column name.
Parameters: index (int) – the 0-based column index Returns: the column name, None if invalid index Return type: str
-
get_mean(col, row)¶ Returns the mean at this location (if valid location).
Parameters: - col (int) – the 0-based column index
- row (int) – the 0-based row index
Returns: the mean
Return type: float
-
get_row_name(index)¶ Returns the row name.
Parameters: index (int) – the 0-based row index Returns: the row name, None if invalid index Return type: str
-
get_stdev(col, row)¶ Returns the standard deviation at this location (if valid location).
Parameters: - col (int) – the 0-based column index
- row (int) – the 0-based row index
Returns: the standard deviation
Return type: float
-
hide_col(index)¶ Hides the column.
Parameters: index (int) – the 0-based column index
-
hide_row(index)¶ Hides the row.
Parameters: index (int) – the 0-based row index
Returns whether the column is hidden.
Parameters: index (int) – the 0-based column index Returns: true if hidden Return type: bool
Returns whether the row is hidden.
Parameters: index (int) – the 0-based row index Returns: true if hidden Return type: bool
-
rows¶ Returns the row count.
Returns: the count Return type: int
-
set_col_name(index, name)¶ Sets the column name.
Parameters: - index (int) – the 0-based column index
- name (str) – the name of the column
-
set_mean(col, row, mean)¶ Sets the mean at this location (if valid location).
Parameters: - col (int) – the 0-based column index
- row (int) – the 0-based row index
- mean (float) – the mean to set
-
set_row_name(index, name)¶ Sets the row name.
Parameters: - index (int) – the 0-based row index
- name (str) – the name of the row
-
set_stdev(col, row, stdev)¶ Sets the standard deviation at this location (if valid location).
Parameters: - col (int) – the 0-based column index
- row (int) – the 0-based row index
- stdev (float) – the standard deviation to set
-
show_col(index)¶ Shows the column.
Parameters: index (int) – the 0-based column index
-
show_row(index)¶ Shows the row.
Parameters: index (int) – the 0-based row index
-
to_string_header()¶ Returns the header of the matrix as a string.
Returns: the header Return type: str
-
to_string_key()¶ Returns a key for all the col names, for better readability if the names got cut off.
Returns: the key Return type: str
-
to_string_matrix()¶ Returns the matrix as a string.
Returns: the generated output Return type: str
-
to_string_ranking()¶ Returns the ranking in a string representation.
Returns: the ranking Return type: str
-
to_string_summary()¶ Returns the summary as a string.
Returns: the summary Return type: str
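The accessor methods above can be combined to walk over every cell of the matrix. The sketch below uses a hypothetical helper name, matrix_cells, and relies only on the rows and columns properties and the get_* methods listed for ResultMatrix. Note the (col, row) argument order of get_mean and get_stdev.

```python
def matrix_cells(matrix):
    """Return one dict per (row, column) cell of a ResultMatrix-like object."""
    cells = []
    for row in range(matrix.rows):
        for col in range(matrix.columns):
            cells.append({
                "row": matrix.get_row_name(row),
                "col": matrix.get_col_name(col),
                # Both accessors take the column index first, then the row index.
                "mean": matrix.get_mean(col, row),
                "stdev": matrix.get_stdev(col, row),
            })
    return cells
```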
-
-
class
weka.experiments.SimpleCrossValidationExperiment(datasets, classifiers, classification=True, runs=10, folds=10, result=None)¶ Bases:
weka.experiments.SimpleExperiment
Performs a simple cross-validation experiment. Can output the results either in ARFF or CSV.
-
configure_resultproducer()¶ Configures and returns the ResultProducer and PropertyPath as tuple.
Returns: producer and property path Return type: tuple
-
-
class
weka.experiments.SimpleExperiment(datasets, classifiers, jobject=None, classification=True, runs=10, result=None)¶ Bases:
weka.core.classes.OptionHandler
Ancestor for simple experiments.
See the following URL for how to use the Experiment API: http://weka.wikispaces.com/Using+the+Experiment+API
-
configure_resultproducer()¶ Configures and returns the ResultProducer and PropertyPath as tuple.
Returns: producer and property path Return type: tuple
-
configure_splitevaluator()¶ Configures and returns the SplitEvaluator and Classifier instance as tuple.
Returns: evaluator and classifier Return type: tuple
-
experiment()¶ Returns the internal experiment, if set up, otherwise None.
Returns: the internal experiment Return type: Experiment
-
classmethod
load(filename)¶ Loads the experiment from disk.
Parameters: filename (str) – the filename of the experiment to load Returns: the experiment Return type: Experiment
-
run()¶ Executes the experiment.
-
classmethod
save(filename, experiment)¶ Saves the experiment to disk.
Parameters: - filename (str) – the filename to save the experiment to
- experiment (Experiment) – the Experiment to save
-
setup()¶ Initializes the experiment.
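A simple experiment is typically configured through the constructor, then initialized with setup() and executed with run(). The sketch below does this for SimpleCrossValidationExperiment. The helper name run_cv_experiment and the experiment_cls parameter are hypothetical. It is assumed that datasets is a list of dataset file names, classifiers a list of Classifier objects, and result an output file name ending in .arff or .csv.

```python
def run_cv_experiment(datasets, classifiers, result_file, runs=10, folds=10,
                      experiment_cls=None):
    """Set up and run a cross-validation experiment, writing to `result_file`."""
    if experiment_cls is None:
        # Imported lazily so the helper can be defined without Weka installed.
        from weka.experiments import SimpleCrossValidationExperiment as experiment_cls
    exp = experiment_cls(
        datasets=datasets,
        classifiers=classifiers,
        classification=True,   # False would indicate a regression experiment
        runs=runs,
        folds=folds,
        result=result_file)
    exp.setup()   # initializes the experiment
    exp.run()     # executes it
    return exp
```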
-
-
class
weka.experiments.SimpleRandomSplitExperiment(datasets, classifiers, classification=True, runs=10, percentage=66.6, preserve_order=False, result=None)¶ Bases:
weka.experiments.SimpleExperiment
Performs a simple random split experiment. Can output the results either in ARFF or CSV.
-
configure_resultproducer()¶ Configures and returns the ResultProducer and PropertyPath as tuple.
Returns: producer and property path Return type: tuple
-
-
class
weka.experiments.Tester(classname='weka.experiment.PairedCorrectedTTester', jobject=None, options=None)¶ Bases:
weka.core.classes.OptionHandler
For generating statistical results from an experiment.
-
dataset_columns¶ Returns the list of column names that uniquely identify a dataset.
Returns: the list of attribute names Return type: list
-
fold_column¶ Returns the column name that holds the Fold number.
Returns: the attribute name Return type: str
-
header(comparison_column)¶ Creates a “header” string describing the current resultsets.
Parameters: comparison_column (int) – the index of the column to compare against Returns: the header Return type: str
-
init_columns()¶ Sets the column indices based on the supplied names if necessary.
-
multi_resultset_full(base_resultset, comparison_column)¶ Creates a comparison table where a base resultset is compared to the other resultsets.
Parameters: - base_resultset (int) – the 0-based index of the base resultset (e.g. the classifier to compare against)
- comparison_column (int) – the 0-based index of the column to compare against
Returns: the comparison
Return type: str
-
multi_resultset_ranking(comparison_column)¶ Creates a ranking.
Parameters: comparison_column (int) – the 0-based index of the column to compare against Returns: the ranking Return type: str
-
multi_resultset_summary(comparison_column)¶ Carries out a comparison between all resultsets, counting the number of datasets where one resultset outperforms the other.
Parameters: comparison_column (int) – the 0-based index of the column to compare against Returns: the summary Return type: str
-
result_columns¶ Returns the list of column names that uniquely identify a result (e.g. classifier + options + ID).
Returns: the list of attribute names Return type: list
-
resultmatrix¶ Returns the ResultMatrix instance in use.
Returns: the matrix in use Return type: ResultMatrix
-
run_column¶ Returns the column name that holds the Run number.
Returns: the attribute name Return type: str
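The textual outputs of a Tester can be collected in one place, as in the sketch below. The helper name summarize_comparison is hypothetical. The tester is assumed to be fully configured already, including the experiment results to analyze, which is not covered by the members listed above.

```python
def summarize_comparison(tester, comparison_column, base_resultset=0):
    """Collect the header, full comparison, ranking and summary strings
    of an already configured tester."""
    return {
        "header": tester.header(comparison_column),
        # The base resultset is the one all others are compared against.
        "full": tester.multi_resultset_full(base_resultset, comparison_column),
        "ranking": tester.multi_resultset_ranking(comparison_column),
        "summary": tester.multi_resultset_summary(comparison_column),
    }
```

Here comparison_column is the 0-based index of the column holding the statistic of interest in the experiment results.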
-
weka.filters module¶
-
class
weka.filters.Filter(classname='weka.filters.AllFilter', jobject=None, options=None)¶ Bases:
weka.core.classes.OptionHandler
Wrapper class for filters.
-
batch_finished()¶ Signals the filter that the batch of data has finished.
Returns: True if instances can be collected from the output Return type: bool
-
capabilities()¶ Returns the capabilities of the filter.
Returns: the capabilities Return type: Capabilities
-
classmethod
deserialize(ser_file)¶ Deserializes a filter from a file.
Parameters: ser_file (str) – the file to deserialize from Returns: model Return type: Filter
-
filter(data)¶ Filters the dataset(s). When providing a list, this can be used to create compatible train/test sets, since the filter only gets initialized with the first dataset and all subsequent datasets get transformed using the same setup.
NB: inputformat(Instances) must have been called beforehand.
Parameters: data (Instances or list of Instances) – the Instances to filter Returns: the filtered Instances object(s) Return type: Instances or list of Instances
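Passing a list to filter() is a way to keep train and test sets compatible, as described above. The sketch below uses a hypothetical helper name, filter_train_test, and calls inputformat with the training data first, following the note above.

```python
def filter_train_test(flter, train, test):
    """Initialize `flter` on `train` and apply the same setup to both sets."""
    # The input format must be set before filtering.
    flter.inputformat(train)
    # The filter is initialized with the first dataset only; the second one
    # is transformed using that same setup.
    filtered_train, filtered_test = flter.filter([train, test])
    return filtered_train, filtered_test
```

In real use, flter could be something like Filter(classname="weka.filters.unsupervised.attribute.Remove", options=["-R", "last"]).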
-
input(inst)¶ Inputs the Instance.
Parameters: inst (Instance) – the instance to filter Returns: True if the filtered instance can be collected from the output Return type: bool
-
classmethod
make_copy(flter)¶ Creates a copy of the filter.
Parameters: flter (Filter) – the filter to copy Returns: the copy of the filter Return type: Filter
-
output()¶ Outputs the filtered Instance.
Returns: the filtered instance Return type: an Instance object
-
serialize(ser_file)¶ Serializes the filter to the specified file.
Parameters: ser_file (str) – the file to save the filter to
-
to_source(classname, data)¶ Returns the model as Java source code if the filter implements weka.filters.Sourcable.
Parameters: - classname (str) – the classname for the generated Java code
- data (Instances) – the dataset used for initializing the filter
Returns: the model as source code string
Return type: str
-
-
class
weka.filters.MultiFilter(jobject=None, options=None)¶ Bases:
weka.filters.Filter
Wrapper class for weka.filters.MultiFilter.
-
filters¶ Returns the list of base filters.
Returns: the filter list Return type: list
-
-
class
weka.filters.StringToWordVector(jobject=None, options=None)¶ Bases:
weka.filters.Filter
Wrapper class for weka.filters.unsupervised.attribute.StringToWordVector.
-
weka.filters.main(args=None)¶ Runs a filter from the command-line. Calls JVM start/stop automatically. Use -h to see all options.
Parameters: args (list) – the command-line arguments to use, uses sys.argv if None
-
weka.filters.sys_main()¶ Runs the main function using the system cli arguments, and returns a system error code.
Returns: 0 for success, 1 for failure. Return type: int