Examples¶

The following examples are meant to be executed in sequence, as later examples rely on earlier steps, e.g., on data that has already been loaded.

For more examples, check out the example repository on GitHub:

github.com/fracpete/python-weka-wrapper3-examples

Start up JVM¶

import weka.core.jvm as jvm
jvm.start()

If you want to use the classpath environment variable and all currently installed Weka packages, use the following call:

jvm.start(system_cp=True, packages=True)

If your Weka home directory is not located in wekafiles in your user’s home directory, you have two options for specifying the alternative location: the WEKA_HOME environment variable or the packages parameter, supplying a directory. The latter is shown below:

jvm.start(packages="/my/packages/are/somewhere/else")

Most of the time, you will want to increase the maximum heap size available to the JVM. The following example sets the maximum to 512 MB:

jvm.start(max_heap_size="512m")
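The "512m" string above looks like the size format of the JVM's -Xmx option (a number with an optional k, m or g suffix); that reading is an assumption based on the example. The helper heap_size_to_bytes below is hypothetical and not part of python-weka-wrapper3; it is only a sketch of how such strings translate into bytes:

```python
def heap_size_to_bytes(size):
    """Convert a JVM-style heap size string (e.g., "512m") to bytes."""
    units = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
    size = size.strip().lower()
    if size and size[-1] in units:
        return int(size[:-1]) * units[size[-1]]
    return int(size)  # no suffix: the value is taken as bytes


print(heap_size_to_bytes("512m"))  # 536870912
print(heap_size_to_bytes("2g"))    # 2147483648
```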

If you want to print system information at start up time, then you can use the system_info parameter:

jvm.start(system_info=True)

This will output key-value pairs generated by Weka’s weka.core.SystemInfo class, similar to this:

DEBUG:weka.core.jvm:System info:
DEBUG:weka.core.jvm:java.runtime.name=OpenJDK Runtime Environment
DEBUG:weka.core.jvm:java.awt.headless=true
...
DEBUG:weka.core.jvm:java.vm.compressedOopsMode=Zero based
DEBUG:weka.core.jvm:java.vm.specification.version=11

For more information, check out the help of the jvm module:

help(jvm.start)
help(jvm.stop)

Location of the datasets¶

The following examples assume the datasets to be present in the data_dir directory. For instance, this could be the following directory:

data_dir = "/my/datasets/"

Load dataset and print it¶

from weka.core.converters import Loader
loader = Loader(classname="weka.core.converters.ArffLoader")
data = loader.load_file(data_dir + "iris.arff")
data.class_is_last()

print(data)

The weka.core.converters module has a convenience method for loading datasets called load_any_file. This method determines a loader based on the file extension and then loads the full dataset:

import weka.core.converters as converters
data = converters.load_any_file(data_dir + "iris.arff")
data.class_is_last()

print(data)

It is also possible to define the class attribute when loading:

data = loader.load_file(data_dir + "iris.arff", class_index="last")
data = converters.load_any_file(data_dir + "iris.arff", class_index="last")

The following strings are supported:

first
second
third
last-2 (third to last)
last-1 (second to last)
last
any other string gets interpreted as 1-based index
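To make the meaning of these strings concrete, here is a minimal pure-Python sketch that maps them to a 0-based attribute index. The function resolve_class_index is hypothetical and only mirrors the list above; it is not the library's implementation:

```python
def resolve_class_index(spec, num_attributes):
    """Map a class index string to a 0-based attribute index."""
    named = {
        "first": 0,
        "second": 1,
        "third": 2,
        "last-2": num_attributes - 3,
        "last-1": num_attributes - 2,
        "last": num_attributes - 1,
    }
    if spec in named:
        return named[spec]
    return int(spec) - 1  # any other string: 1-based index


# iris.arff has 5 attributes, the class being the last one
print(resolve_class_index("last", 5))    # 4
print(resolve_class_index("last-1", 5))  # 3
print(resolve_class_index("2", 5))       # 1
```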

Create dataset manually¶

The following code snippet defines the dataset structure by creating its attributes and then the dataset itself. Once the weka.core.dataset.Instances object is available, rows (i.e., weka.core.dataset.Instance objects) can be added.

from weka.core.dataset import Attribute, Instance, Instances

# create attributes
num_att = Attribute.create_numeric("num")
date_att = Attribute.create_date("dat", "yyyy-MM-dd")
nom_att = Attribute.create_nominal("nom", ["label1", "label2"])

# create dataset
dataset = Instances.create_instances("helloworld", [num_att, date_att, nom_att], 0)

# add rows
values = [3.1415926, date_att.parse_date("2014-04-10"), 1.0]
inst = Instance.create_instance(values)
dataset.add_instance(inst)

values = [2.71828, date_att.parse_date("2014-08-09"), Instance.missing_value()]
inst = Instance.create_instance(values)
dataset.add_instance(inst)

print(dataset)
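Note that the nominal attribute in the first row is supplied as 1.0 rather than as a string: Weka stores a nominal value as the 0-based index of its label, expressed as a floating-point number. The following plain-Python sketch (no Weka calls involved) shows the correspondence for the labels used above:

```python
labels = ["label1", "label2"]

# internal value for the label "label2": its 0-based index as a float
value = float(labels.index("label2"))
print(value)               # 1.0

# and back from the internal value to the label
print(labels[int(value)])  # label2
```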

Create dataset from lists¶

If your data is easily available as lists, you can also construct datasets using this approach (custom column names can be supplied via cols_x and col_y):

from weka.core.dataset import create_instances_from_lists
from random import randint

# pure numeric
x = [[randint(1, 10) for _ in range(5)] for _ in range(10)]
y = [randint(0, 1) for _ in range(10)]
dataset = create_instances_from_lists(x, y, name="generated from lists")
print(dataset)

dataset = create_instances_from_lists(x, name="generated from lists (no y)")
print(dataset)

# mixed data types
x = [["TEXT", 1, 1.1], ["XXX", 2, 2.2]]
y = ["A", "B"]
dataset = create_instances_from_lists(x, y, name="generated from mixed lists", cols_x=["text", "integer", "float"], col_y="class")
print(dataset)

Create dataset from matrices¶

Another way of constructing a dataset is to use numpy matrices/arrays, e.g., obtained from a pandas DataFrame (custom column names can be supplied via cols_x and col_y):

from weka.core.dataset import create_instances_from_matrices
import numpy as np

# pure numeric
x = np.random.randn(10, 5)
y = np.random.randn(10)
dataset = create_instances_from_matrices(x, y, name="generated from matrices")
print(dataset)

dataset = create_instances_from_matrices(x, name="generated from matrix (no y)")
print(dataset)

# mixed data types
x = np.array([("TEXT", 1, 1.1), ("XXX", 2, 2.2)], dtype='S20, i4, f8')
y = np.array(["A", "B"], dtype='S20')
dataset = create_instances_from_matrices(x, y, name="generated from mixed matrices", cols_x=["text", "integer", "float"], col_y="class")
print(dataset)

Dataset subsets¶

Transformations in Weka usually occur by applying filters (see section Filters below). However, quite often one only wants to quickly create a subset (of columns or rows) from a dataset. For this purpose, the subset method of the weka.core.dataset.Instances class can be used (it uses filters under the hood to generate the actual subset):

from weka.core.converters import load_any_file

data = load_any_file("/some/where/iris.arff")
print(data.attribute_names(), data.num_instances)

# select columns by name
subset = data.subset(col_names=['sepallength', 'sepalwidth', 'petallength', 'petalwidth'])
print(subset.attribute_names(), subset.num_instances)

# select columns by range (1-based indices)
subset = data.subset(col_range='1-3,5')
print(subset.attribute_names(), subset.num_instances)

# select rows by range (1-based indices)
subset = data.subset(row_range='51-150')
print(subset.attribute_names(), subset.num_instances)

# invert selection of cols/rows and keep original relation name
subset = data.subset(col_range='5', invert_cols=True, row_range='51-150', invert_rows=True, keep_relationame=True)
print(subset.attribute_names(), subset.num_instances)
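The range strings used above are 1-based and may combine single indices and spans. As a rough illustration of how such a string expands, here is a pure-Python sketch; parse_range is a hypothetical helper that only handles numeric entries and is not how Weka parses ranges internally:

```python
def parse_range(spec, total, invert=False):
    """Expand a 1-based range string like '1-3,5' into sorted 0-based indices."""
    selected = set()
    for part in spec.split(","):
        if "-" in part:
            start, end = part.split("-")
            selected.update(range(int(start) - 1, int(end)))
        else:
            selected.add(int(part) - 1)
    if invert:
        # keep everything that was NOT listed
        selected = set(range(total)) - selected
    return sorted(selected)


print(parse_range("1-3,5", 5))           # [0, 1, 2, 4]
print(parse_range("5", 5, invert=True))  # [0, 1, 2, 3]
print(len(parse_range("51-150", 150)))   # 100
```

With the iris dataset (5 attributes, 150 rows), the last example above therefore drops the class column and keeps only the first 50 rows.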

Data generators¶

Artificial data can be generated using one of Weka’s data generators, e.g., the Agrawal classification generator:

from weka.datagenerators import DataGenerator
generator = DataGenerator(classname="weka.datagenerators.classifiers.classification.Agrawal", options=["-B", "-P", "0.05"])
DataGenerator.make_data(generator, ["-o", "/some/where/outputfile.arff"])

Or using the low-level API (outputting data to stdout):

generator = DataGenerator(classname="weka.datagenerators.classifiers.classification.Agrawal", options=["-n", "10", "-r", "agrawal"])
generator.dataset_format = generator.define_data_format()
print(generator.dataset_format)
if generator.single_mode_flag:
    for i in range(generator.num_examples_act):
        print(generator.generate_example())
else:
    print(generator.generate_examples())

Filters¶

The Filter class from the weka.filters module allows you to filter datasets, e.g., removing the last attribute using the Remove filter:

data = loader.load_file(data_dir + "vote.arff")

from weka.filters import Filter
remove = Filter(classname="weka.filters.unsupervised.attribute.Remove", options=["-R", "last"])
remove.inputformat(data)
filtered = remove.filter(data)

print(filtered)

Output help from underlying OptionHandler¶

If the underlying Java class implements the weka.core.OptionHandler interface, then you can use the to_help() method to generate a string containing the globalInfo() and listOptions() information:

from weka.classifiers import Classifier
cls = Classifier(classname="weka.classifiers.trees.J48")
print(cls.to_help())

Option handling¶

Any class derived from OptionHandler (module weka.core.classes) allows getting and setting of the options via the property options. Depending on the sub-class, you may also provide the options already when instantiating the class. The following two examples instantiate a J48 classifier, one using the options property and the other using the shortcut through the constructor:

from weka.classifiers import Classifier
cls = Classifier(classname="weka.classifiers.trees.J48")
cls.options = ["-C", "0.3"]

from weka.classifiers import Classifier
cls = Classifier(classname="weka.classifiers.trees.J48", options=["-C", "0.3"])

You can use the options property also to retrieve the currently set options:

from weka.classifiers import Classifier
cls = Classifier(classname="weka.classifiers.trees.J48", options=["-C", "0.3"])
print(cls.options)

Using the to_commandline() method, you can return a single string that contains classname and options, just like Weka’s Explorer does when copying the setup to the clipboard:

from weka.classifiers import Classifier
cls = Classifier(classname="weka.classifiers.trees.J48", options=["-C", "0.3"])
print(cls.to_commandline())

The to_commandline(…) function of the weka.core.classes module generates the command-line string for any class that implements the weka.core.OptionHandler Java interface under the hood (a lot of classes do!):

from weka.classifiers import Classifier
from weka.core.classes import to_commandline
cls = Classifier(classname="weka.classifiers.trees.J48", options=["-C", "0.3"])
print(to_commandline(cls))

The reverse, generating an object from a command line, is done via the from_commandline(…) function of the same module:

from weka.core.classes import from_commandline
cmdline = 'weka.classifiers.functions.SMO -K "weka.classifiers.functions.supportVector.NormalizedPolyKernel -E 3.0"'
classifier = from_commandline(cmdline, classname="weka.classifiers.Classifier")
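To see how such a command line decomposes into a class name and an options list, the following sketch uses Python's shlex module as a stand-in. Weka has its own option-splitting logic, so this is only an approximation that happens to work for simple double-quoted strings like the one above:

```python
import shlex

cmdline = 'weka.classifiers.functions.SMO -K "weka.classifiers.functions.supportVector.NormalizedPolyKernel -E 3.0"'

# the first token is the class name, the remaining tokens are its options
classname, *options = shlex.split(cmdline)
print(classname)
print(options)

# the quoted kernel setup is itself a command line and can be split again
kernel_classname, *kernel_options = shlex.split(options[1])
print(kernel_classname)
print(kernel_options)  # ['-E', '3.0']
```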

Build classifier on dataset, output predictions¶

from weka.classifiers import Classifier
cls = Classifier(classname="weka.classifiers.trees.J48", options=["-C", "0.3"])
cls.build_classifier(data)

for index, inst in enumerate(data):
    pred = cls.classify_instance(inst)
    dist = cls.distribution_for_instance(inst)
    print(str(index+1) + ": label index=" + str(pred) + ", class distribution=" + str(dist))
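For a nominal class, the predicted label index typically corresponds to the largest entry of the class distribution. The sketch below illustrates this relationship in plain Python; the distribution values are made up, and the label names are those of the iris dataset:

```python
labels = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]
dist = [0.0, 0.92, 0.08]  # made-up class distribution for a single instance

# index of the largest probability
pred = max(range(len(dist)), key=lambda i: dist[i])
print(pred, labels[pred])  # 1 Iris-versicolor
```

Since Weka represents label indices as floating-point numbers, a conversion such as int(pred) may be needed before using a returned prediction to look up a label.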

Build classifier on dataset, print model and draw graph¶

from weka.classifiers import Classifier
cls = Classifier(classname="weka.classifiers.trees.J48", options=["-C", "0.3"])
cls.build_classifier(data)

print(cls)

import weka.plot.graph as graph  # NB: pygraphviz and PIL are required
graph.plot_dot_graph(cls.graph)

Build classifier incrementally with data and print model¶

loader = Loader(classname="weka.core.converters.ArffLoader")
iris_inc = loader.load_file(data_dir + "iris.arff", incremental=True)
iris_inc.class_is_last()

print(iris_inc)

cls = Classifier(classname="weka.classifiers.bayes.NaiveBayesUpdateable")
cls.build_classifier(iris_inc)
for inst in loader:
    cls.update_classifier(inst)

print(cls)

Cross-validate filtered classifier and print evaluation and display ROC¶

data = loader.load_file(data_dir + "diabetes.arff")
data.class_is_last()

from weka.filters import Filter
remove = Filter(classname="weka.filters.unsupervised.attribute.Remove", options=["-R", "1-3"])

cls = Classifier(classname="weka.classifiers.bayes.NaiveBayes")

from weka.classifiers import FilteredClassifier
fc = FilteredClassifier()
fc.filter = remove
fc.classifier = cls

from weka.classifiers import Evaluation
from weka.core.classes import Random
evl = Evaluation(data)
evl.crossvalidate_model(fc, data, 10, Random(1))

print(evl.percent_correct)
print(evl.summary())
print(evl.class_details())

import weka.plot.classifiers as plcls  # NB: matplotlib is required
plcls.plot_roc(evl, class_index=[0, 1], wait=True)

Cross-validate regressor, display classifier errors and predictions¶

from weka.classifiers import PredictionOutput, KernelClassifier, Kernel
data = loader.load_file(data_dir + "bolts.arff")
data.class_is_last()

cls = KernelClassifier(classname="weka.classifiers.functions.SMOreg", options=["-N", "0"])
kernel = Kernel(classname="weka.classifiers.functions.supportVector.RBFKernel", options=["-G", "0.1"])
cls.kernel = kernel
pout = PredictionOutput(classname="weka.classifiers.evaluation.output.prediction.PlainText")
evl = Evaluation(data)
evl.crossvalidate_model(cls, data, 10, Random(1), pout)

print(evl.summary())
print(pout.buffer_content())

import weka.plot.classifiers as plcls  # NB: matplotlib is required
plcls.plot_classifier_errors(evl.predictions, wait=True)

Parameter optimization - property names¶

Both GridSearch and MultiSearch use Java Bean property names (and paths consisting of these), not command-line options, in order to get/set the parameters under optimization. Using the list_property_names function of the weka.core.classes module, you can list the properties of a Java object:

from weka.core.classes import list_property_names
cls = Classifier(classname="weka.classifiers.trees.J48")
for p in list_property_names(cls):
    print(p)

Parameter optimization - GridSearch¶

The following code optimizes the C property of SMOreg and the gamma property of its RBFKernel:

from weka.classifiers import GridSearch
grid = GridSearch(options=["-sample-size", "100.0", "-traversal", "ROW-WISE", "-num-slots", "1", "-S", "1"])
grid.evaluation = "CC"
grid.y = {"property": "kernel.gamma", "min": -3.0, "max": 3.0, "step": 1.0, "base": 10.0, "expression": "pow(BASE,I)"}
grid.x = {"property": "C", "min": -3.0, "max": 3.0, "step": 1.0, "base": 10.0, "expression": "pow(BASE,I)"}
cls = Classifier(
    classname="weka.classifiers.functions.SMOreg",
    options=["-K", "weka.classifiers.functions.supportVector.RBFKernel"])
grid.classifier = cls
grid.build_classifier(train)  # train: previously loaded training data (Instances) with the class attribute set
print("Model:\n" + str(grid))
print("\nBest setup:\n" + grid.best.to_commandline())

NB: The gridSearch package must be installed for this to work.
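The min, max, step, base and expression entries above describe the values that get tried for each axis. Assuming the expression pow(BASE,I) is evaluated for I running from min to max in increments of step, the candidate values can be reproduced with the following sketch (expand is a hypothetical helper, not part of the wrapper):

```python
def expand(minimum, maximum, step, base):
    """Evaluate pow(BASE, I) for I = minimum, minimum + step, ..., maximum."""
    count = int(round((maximum - minimum) / step)) + 1
    return [base ** (minimum + i * step) for i in range(count)]


values = expand(-3.0, 3.0, 1.0, 10.0)  # powers of ten from 10^-3 to 10^3
print(values)

# x and y use the same setup, so the full grid has this many points
print(len(values) ** 2)  # 49
```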

Parameter optimization - MultiSearch¶

The following code optimizes the C property of SMOreg and the gamma property of its RBFKernel:

from weka.classifiers import Classifier, MultiSearch
from weka.core.classes import ListParameter, MathParameter
multi = MultiSearch(options=["-S", "1"])
multi.evaluation = "CC"
mparam = MathParameter()
mparam.prop = "kernel.gamma"
mparam.minimum = -3.0
mparam.maximum = 3.0
mparam.step = 1.0
mparam.base = 10.0
mparam.expression = "pow(BASE,I)"
lparam = ListParameter()
lparam.prop = "C"
lparam.values = ["-2.0", "-1.0", "0.0", "1.0", "2.0"]
multi.parameters = [mparam, lparam]
cls = Classifier(
    classname="weka.classifiers.functions.SMOreg",
    options=["-K", "weka.classifiers.functions.supportVector.RBFKernel"])
multi.classifier = cls
multi.build_classifier(train)  # train: previously loaded training data (Instances) with the class attribute set
print("Model:\n" + str(multi))
print("\nBest setup:\n" + multi.best.to_commandline())

NB: The multisearch-weka-package package must be installed for this to work.
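Unlike the two fixed axes of a grid, the parameters here can be of different kinds and any number of them can be combined. Assuming every combination of parameter values is a candidate setup, the size of the search space for the example above can be sketched in plain Python as follows:

```python
from itertools import product

# MathParameter: pow(BASE,I) with BASE=10 and I = -3, -2, ..., 3
gamma_values = [10.0 ** i for i in range(-3, 4)]
# ListParameter: the explicitly listed values
c_values = ["-2.0", "-1.0", "0.0", "1.0", "2.0"]

combinations = list(product(gamma_values, c_values))
print(len(combinations))  # 35
```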

Clustering¶

The following example builds a SimpleKMeans clusterer (with 3 clusters) on a previously loaded dataset from which the class attribute has been removed:

data = loader.load_file(data_dir + "vote.arff")
data.delete_last_attribute()

from weka.clusterers import Clusterer
clusterer = Clusterer(classname="weka.clusterers.SimpleKMeans", options=["-N", "3"])
clusterer.build_clusterer(data)

print(clusterer)

Once a clusterer is built, it can be used to cluster Instance objects:

# cluster the data
for inst in data:
    cl = clusterer.cluster_instance(inst)  # 0-based cluster index
    dist = clusterer.distribution_for_instance(inst)   # cluster membership distribution
    print("cluster=" + str(cl) + ", distribution=" + str(dist))

Associations¶

Associators, like Apriori, can be built and output like this:

data = loader.load_file(data_dir + "vote.arff")
data.class_is_last()

from weka.associations import Associator
associator = Associator(classname="weka.associations.Apriori", options=["-N", "9", "-I"])
associator.build_associations(data)

print(associator)

Attribute selection¶

You can perform attribute selection using, e.g., BestFirst as search algorithm and CfsSubsetEval as evaluator as follows:

data = loader.load_file(data_dir + "vote.arff")
data.class_is_last()

from weka.attribute_selection import ASSearch, ASEvaluation, AttributeSelection
search = ASSearch(classname="weka.attributeSelection.BestFirst", options=["-D", "1", "-N", "5"])
evaluator = ASEvaluation(classname="weka.attributeSelection.CfsSubsetEval", options=["-P", "1", "-E", "1"])
attsel = AttributeSelection()
attsel.search(search)
attsel.evaluator(evaluator)
attsel.select_attributes(data)

print("# attributes: " + str(attsel.number_attributes_selected))
print("attributes: " + str(attsel.selected_attributes))
print("result string:\n" + attsel.results_string)

Attribute selection is also available through meta-schemes:

classifier: weka.classifiers.AttributeSelectedClassifier
filter: weka.filters.AttributeSelection

Timeseries¶

With the timeseriesForecasting package installed and the JVM started with package support, you can perform timeseries forecasting:

airline_data = loader.load_file(data_dir + "airline.arff")
airline_train, airline_test = airline_data.train_test_split(90.0)

# configure and build
from weka.timeseries import WekaForecaster
from weka.classifiers import Classifier
forecaster = WekaForecaster()
forecaster.fields_to_forecast = ["passenger_numbers"]
forecaster.base_forecaster = Classifier(classname="weka.classifiers.functions.LinearRegression")
forecaster.fields_to_forecast = "passenger_numbers"
forecaster.build_forecaster(airline_train)

# prime
from weka.core.dataset import Instances
num_prime_instances = 12
airline_prime = Instances.copy_instances(airline_train, airline_train.num_instances - num_prime_instances, num_prime_instances)
forecaster.prime_forecaster(airline_prime)

# forecast
num_future_forecasts = airline_test.num_instances
preds = forecaster.forecast(num_future_forecasts)
print("Actual,Predicted,Error")
for i in range(num_future_forecasts):
    actual = airline_test.get_instance(i).get_value(0)
    predicted = preds[i][0].predicted
    error = actual - predicted
    print("%f,%f,%f" % (actual, predicted, error))
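The split and the priming window above can be pictured with plain lists. This sketch assumes that train_test_split keeps the row order and rounds the training size to the nearest integer, and that the dataset has 144 rows; both are assumptions made for illustration, and ordered_split is a hypothetical helper:

```python
def ordered_split(rows, percentage):
    """Split rows in order: the first `percentage` percent form the training set."""
    train_size = int(round(len(rows) * percentage / 100.0))
    return rows[:train_size], rows[train_size:]


rows = list(range(144))  # stand-in for 144 monthly observations (assumed size)
train, test = ordered_split(rows, 90.0)

# the last 12 training rows, analogous to the priming data above
prime = train[-12:]

print(len(train), len(test))  # 130 14
print(prime[0], prime[-1])    # 118 129
```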

Serialization¶

Objects such as trained classifiers can be serialized to disk and deserialized again.

Here we just save a trained classifier to a file, load it again from disk and output the model:

from weka.classifiers import Classifier
classifier = ...  # previously built classifier
classifier.serialize("/some/where/out.model")
...
classifier2, _ = Classifier.deserialize("/some/where/out.model")
print(classifier2)

Weka usually saves the header of the dataset that was used for training as well (e.g., in order to determine whether test data is compatible). This is done as follows:

from weka.classifiers import Classifier
classifier = ...  # previously built Classifier
data = ... # previously loaded/generated Instances
classifier.serialize("/some/where/out.model", header=data)
...
classifier2, data2 = Classifier.deserialize("/some/where/out.model")
print(classifier2)
print(data2)

Clusterers and filters offer the serialize and deserialize methods as well. For all other serialization/deserialization tasks, use the functions offered by the weka.core.classes module:

serialization_write(file, object)
serialization_write_all(file, [obj1, obj2, …])
serialization_read(file)
serialization_read_all(file)

Experiments¶

Experiments, like they are run in Weka’s Experimenter, can be configured and executed as well.

Here is an example for performing a cross-validated classification experiment:

datasets = [
    data_dir + "iris.arff",
    data_dir + "vote.arff",
    data_dir + "anneal.arff"
]
classifiers = [
    Classifier(classname="weka.classifiers.rules.ZeroR"),
    Classifier(classname="weka.classifiers.trees.J48"),
    Classifier(classname="weka.classifiers.trees.REPTree"),
]
result = "exp.arff"
from weka.experiments import SimpleCrossValidationExperiment
exp = SimpleCrossValidationExperiment(
    classification=True,
    runs=10,
    folds=10,
    datasets=datasets,
    classifiers=classifiers,
    result=result)
exp.setup()
exp.run()

import weka.core.converters
loader = weka.core.converters.loader_for_file(result)
data = loader.load_file(result)
from weka.experiments import Tester, ResultMatrix
matrix = ResultMatrix(classname="weka.experiment.ResultMatrixPlainText")
tester = Tester(classname="weka.experiment.PairedCorrectedTTester")
tester.resultmatrix = matrix
comparison_col = data.attribute_by_name("Percent_correct").index
tester.instances = data

print(tester.header(comparison_col))
print(tester.multi_resultset_full(0, comparison_col))
print(tester.multi_resultset_full(1, comparison_col))

Other parameters that can be supplied to the constructor of the SimpleCrossValidationExperiment or SimpleRandomSplitExperiment classes:

class_for_ir_statistics - defines the class label to use for computing IR statistics like AUC
attribute_id - the 0-based index of the attribute that identifies rows
pred_target_column - for outputting the predictions and ground truth in separate columns in case of classification, e.g., for calculating confusion matrices manually afterwards

And a setup for performing regression experiments on random splits on the datasets:

from weka.experiments import SimpleCrossValidationExperiment, SimpleRandomSplitExperiment, Tester, ResultMatrix
from weka.classifiers import Classifier
import weka.core.converters as converters
# configure experiment
datasets = [data_dir + "bolts.arff", data_dir + "bodyfat.arff"]
classifiers = [Classifier(classname="weka.classifiers.rules.ZeroR"), Classifier(classname="weka.classifiers.functions.LinearRegression")]
outfile = "results-rs.arff"   # store results for later analysis
exp = SimpleRandomSplitExperiment(
    classification=False,
    runs=10,
    percentage=66.6,
    preserve_order=False,
    datasets=datasets,
    classifiers=classifiers,
    result=outfile)
exp.setup()
exp.run()
# evaluate previous run
loader = converters.loader_for_file(outfile)
data   = loader.load_file(outfile)
matrix = ResultMatrix(classname="weka.experiment.ResultMatrixPlainText")
tester = Tester(classname="weka.experiment.PairedCorrectedTTester")
tester.resultmatrix = matrix
comparison_col = data.attribute_by_name("Correlation_coefficient").index
tester.instances = data
print(tester.header(comparison_col))
print(tester.multi_resultset_full(0, comparison_col))

The Tester class allows you to swap columns and rows, thereby comparing datasets rather than classifiers:

tester = Tester(classname="weka.experiment.PairedCorrectedTTester")
tester.swap_rows_and_cols = True
tester.resultmatrix = matrix

Partial classnames¶

All classes derived from weka.core.classes.JavaObject like Classifier, Filter, etc., allow the use of partial classnames. So instead of instantiating a classifier like this:

cls = Classifier(classname="weka.classifiers.trees.J48", options=["-C", "0.3"])

You can instantiate it with a shortened classname (must start with a .):

cls = Classifier(classname=".J48", options=["-C", "0.3"])

NB: This will fail with an exception if there is no match or if there are multiple matches. For instance, the following will result in an error, as there are two Discretize filters, supervised and unsupervised:

cls = Filter(classname=".Discretize")

Exception: Found multiple matches for '.Discretize':
weka.filters.supervised.attribute.Discretize
weka.filters.unsupervised.attribute.Discretize
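The matching rule can be illustrated with a small stand-alone sketch. The function complete_classname and the short list of known class names are hypothetical; the wrapper determines the available classes itself, and its actual lookup may differ:

```python
def complete_classname(partial, known_classnames):
    """Resolve a partial classname (starting with '.') to exactly one full name."""
    matches = [name for name in known_classnames if name.endswith(partial)]
    if len(matches) == 0:
        raise Exception("No match found for '%s'" % partial)
    if len(matches) > 1:
        raise Exception("Found multiple matches for '%s':\n%s" % (partial, "\n".join(matches)))
    return matches[0]


known = [
    "weka.classifiers.trees.J48",
    "weka.filters.supervised.attribute.Discretize",
    "weka.filters.unsupervised.attribute.Discretize",
]
print(complete_classname(".J48", known))  # weka.classifiers.trees.J48
```

Calling complete_classname(".Discretize", known) raises an exception in this sketch, because two entries end with that suffix.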

Packages¶

The following examples show how to list, install and uninstall an official package:

import weka.core.packages as packages
items = packages.all_packages()
for item in items:
    if item.name == "CLOPE":
        print(item.name + " " + item.url)

packages.install_package("CLOPE")
items = packages.installed_packages()
for item in items:
    print(item.name + " " + item.url)

packages.uninstall_package("CLOPE")
items = packages.installed_packages()
for item in items:
    print(item.name + " " + item.url)

You can also install unofficial packages. The following example installs a previously downloaded zip file:

import weka.core.packages as packages
success = packages.install_package("/some/where/funky-package-1.0.0.zip")
print(success)

And here installing it directly from a URL:

import weka.core.packages as packages
info = packages.install_package("http://some.server.com/funky-package-1.0.0.zip", details=True)
print(info)

Using the details=True flag, you can receive a dictionary instead of a simple boolean. This dictionary consists of:

from_repo: whether the package was installed from the repo or not (i.e., unofficial URL or local archive)
version: the version (only for packages from the repo)
error: any error that may have occurred during installation
install_message: optional message from the package maintainer on the installation
success: whether the package was installed successfully

Of course, you can also install multiple packages in one go using the install_packages method:

import weka.core.packages as packages
info = packages.install_packages([
    "http://some.server.com/funky-package-1.0.0.zip",
    "http://some.server.com/cool-package-2.0.0.zip",
    "http://some.server.com/fancy-package-1.1.0.zip",
], fail_fast=False, details=True)

This method offers the details flag as well and returns a dictionary with the package name/URL/file name as the key and the information dictionary as the value.

With the fail_fast flag you can control whether to stop the installation process as soon as the first package fails to install (fail_fast=True) or keep trying to install them (fail_fast=False).
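The effect of fail_fast can be sketched without any Weka calls. In the following illustration, install_all and fake_install are hypothetical stand-ins for the real installation logic, with one archive made to fail on purpose:

```python
def install_all(packages, install, fail_fast=True):
    """Install packages one by one; `install` returns True on success."""
    results = {}
    for pkg in packages:
        results[pkg] = install(pkg)
        if not results[pkg] and fail_fast:
            break  # stop at the first failure
    return results


def fake_install(pkg):
    """Fake installer for illustration: only 'b.zip' fails."""
    return pkg != "b.zip"


archives = ["a.zip", "b.zip", "c.zip"]
print(install_all(archives, fake_install, fail_fast=True))
# {'a.zip': True, 'b.zip': False}
print(install_all(archives, fake_install, fail_fast=False))
# {'a.zip': True, 'b.zip': False, 'c.zip': True}
```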

You can include automatic installation of packages in your scripts:

import sys
import weka.core.jvm as jvm
from weka.core.packages import install_missing_package, install_missing_packages, LATEST

# installs a single package (if missing) and exits if installation occurred (outputs messages in console)
install_missing_package("CLOPE", stop_jvm_and_exit=True)

# installs any missing package, outputs messages in console, but restarting JVM is left to script
success, exit_required = install_missing_packages([("CLOPE", LATEST), ("gridSearch", LATEST), ("multisearch", LATEST)])
if exit_required:
    jvm.stop()
    sys.exit(0)

You can also output suggested Weka packages for partial class/package names or exact class names (default is partial string matching):

# suggest package for classifier 'RBFClassifier'
search = "RBFClassifier"
suggestions = packages.suggest_package(search)
print("suggested packages for " + search + ":", suggestions)

# suggest package for package '.ft.'
search = ".ft."
suggestions = packages.suggest_package(search)
print("suggested packages for " + search + ":", suggestions)

# suggest package for classifier 'weka.classifiers.trees.J48graft'
search = "weka.classifiers.trees.J48graft"
suggestions = packages.suggest_package(search, exact=True)
print("suggested packages for " + search + ":", suggestions)

Stop JVM¶

jvm.stop()

Database access¶

Thanks to JDBC (Java Database Connectivity) it is very easy to connect to SQL databases and load data as an Instances object. However, since we rely on 3rd-party libraries to achieve this, we need to specify the database JDBC driver jar when we are starting up the JVM. For instance, adding a MySQL driver called mysql-connector-java-X.Y.Z-bin.jar:

jvm.start(class_path=["/some/where/mysql-connector-java-X.Y.Z-bin.jar"])

Assuming the following parameters:

database host is dbserver

database is called mydb

database user is me

database password is verysecret

We can use the following code to select all the data from table lotsadata.

from weka.core.database import InstanceQuery
iquery = InstanceQuery()
iquery.db_url = "jdbc:mysql://dbserver:3306/mydb"
iquery.user = "me"
iquery.password = "verysecret"
iquery.query = "select * from lotsadata"
data = iquery.retrieve_instances()

Recreating environments¶

There are two approaches for recreating a python-weka-wrapper3 environment in another virtual environment or on another machine:

pww-packages freeze/install

Using the pww-packages command-line tool, you can export the currently installed Weka packages to a text file:

pww-packages freeze -r requirements.txt

If you have unofficial packages installed, then it is recommended to include the URLs from which they can be obtained (according to the information stored in the packages):

pww-packages freeze -u -r requirements.txt

In the other environment, with python-weka-wrapper3 already installed, you can then install the packages as follows:

pww-packages install -r requirements.txt

Any issues with installing packages will be output in the terminal.

pww-packages bootstrap

Using the bootstrap approach, you can generate a Python script that will install python-weka-wrapper3 and all currently installed Weka packages. You need to install any other Python libraries yourself, which you can do by adapting the generated script.

You can generate this install script from the current environment as follows:

pww-packages bootstrap -o pww3.py

In your other environment, simply run the generated script:

python pww3.py