Docker
======

In order to make installation reproducible, `Docker <https://www.docker.com/>`__
images for various python-weka-wrapper3 versions are available from Docker Hub:
`fracpete/pww3/tags <https://hub.docker.com/r/fracpete/pww3/tags>`__

If you are unfamiliar with Docker, then have a look at the following
introduction for data scientists:
`data-mining.co.nz/docker-for-data-scientists/ <https://www.data-mining.co.nz/docker-for-data-scientists/>`__

Images
------

CPU
+++

For using the CPU image interactively, you can run the following command:

.. code-block:: bash

   docker run -u $(id -u):$(id -g) \
       -it fracpete/pww3:0.2.14_cpu

Instead of having to reinstall your packages each time you start up the container,
you can map your local Weka packages into the container as follows:

.. code-block:: bash

   docker run -u $(id -u):$(id -g) \
       -v $HOME/wekafiles/:/workspace/wekafiles \
       -it fracpete/pww3:0.2.14_cpu

GPU
+++

For using the GPU image interactively, you can run the following command:

.. code-block:: bash

   docker run --gpus=all -u $(id -u):$(id -g) \
       -it fracpete/pww3:0.2.14_cuda10.2

Instead of having to reinstall your packages each time you start up the container,
you can map your local Weka packages into the container as follows:

.. code-block:: bash

   docker run --gpus=all -u $(id -u):$(id -g) \
       -v $HOME/wekafiles/:/workspace/wekafiles \
       -it fracpete/pww3:0.2.14_cuda10.2

Usage
-----

Executables
+++++++++++

Since python-weka-wrapper3 installs executables for the main class hierarchies
of Weka, you can run these directly with a single command.
E.g., if we want to train a J48 classifier on the dataset `anneal.arff`, which
is located in our current directory (`pwd`), then we can do something like this:

.. code-block:: bash

   docker run --rm -u $(id -u):$(id -g) \
       -v $HOME/wekafiles:/workspace/wekafiles \
       -v `pwd`:/workspace/data \
       -t fracpete/pww3:0.2.14_cpu \
       pww-classifier \
       -t /workspace/data/anneal.arff \
       weka.classifiers.trees.J48
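
Note how the ``-v`` option determines where files show up inside the container: a
file in the host's current directory becomes visible under ``/workspace/data``.
A small helper illustrating this translation (``container_path`` is purely
illustrative, not part of python-weka-wrapper3):

.. code-block:: python

   import os

   def container_path(host_file, host_dir, container_dir="/workspace/data"):
       """Translates a file below the mounted host directory into its
       corresponding path inside the container."""
       host_file = os.path.abspath(host_file)
       host_dir = os.path.abspath(host_dir)
       if not host_file.startswith(host_dir + os.sep):
           raise ValueError("%s is not located under %s" % (host_file, host_dir))
       return container_dir + "/" + os.path.relpath(host_file, host_dir)

   print(container_path("./anneal.arff", "."))
   # /workspace/data/anneal.arff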

Scripts
+++++++

For more flexibility and/or more complex operations, you would normally want to
fall back on using Python scripts. The above example can be translated into
the following Python script (saved in the current directory as `j48.py`):

.. code-block:: python

   import weka.core.jvm as jvm
   from weka.core.classes import Random
   from weka.core.converters import load_any_file
   from weka.classifiers import Classifier, Evaluation

   jvm.start()
   data = load_any_file("/workspace/data/anneal.arff", class_index="last")
   cls = Classifier(classname="weka.classifiers.trees.J48")
   evl = Evaluation(data)
   evl.crossvalidate_model(cls, data, 10, Random(1))
   print(evl.summary())
   jvm.stop()

This script is then executed as follows:

.. code-block:: bash

   docker run --rm -u $(id -u):$(id -g) \
       -v $HOME/wekafiles:/workspace/wekafiles \
       -v `pwd`:/workspace/data \
       -t fracpete/pww3:0.2.14_cpu \
       python3 /workspace/data/j48.py
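
If you launch such containers from your own Python code (e.g., from a driver
script), the invocation above can also be assembled programmatically. The
following sketch builds the same argument list; the ``docker_run_cmd`` helper is
purely illustrative and not part of python-weka-wrapper3:

.. code-block:: python

   import os

   def docker_run_cmd(image, script, data_dir, wekafiles=None):
       """Assembles the 'docker run' argument list used in the examples above."""
       cmd = ["docker", "run", "--rm",
              "-u", "%d:%d" % (os.getuid(), os.getgid())]
       if wekafiles is not None:
           cmd.extend(["-v", wekafiles + ":/workspace/wekafiles"])
       cmd.extend(["-v", data_dir + ":/workspace/data",
                   "-t", image,
                   "python3", "/workspace/data/" + script])
       return cmd

   print(" ".join(docker_run_cmd("fracpete/pww3:0.2.14_cpu", "j48.py", os.getcwd())))

The resulting list can be handed directly to ``subprocess.run``, which avoids
shell-quoting issues with paths that contain spaces.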

Installing Weka packages
------------------------

When building Docker images for your environments, your code will most likely rely
on additional Weka packages. You can install the packages by creating a little
Python script that uses python-weka-wrapper3 to install them (just like you would
normally do in a script). Here is the content of the ``install_packages.py``
script:

.. code-block:: python

   import weka.core.jvm as jvm
   import weka.core.packages as packages

   jvm.start(packages=True)
   # for reproducibility, we also specify the version
   packages.install_package("SelfOrganizingMap", version="1.0.3")
   jvm.stop()

A minimal ``Dockerfile`` (in the same directory as ``install_packages.py``) then looks
like this (using pww3 0.2.14 for CPU):

::

   FROM fracpete/pww3:0.2.14_cpu
   COPY install_packages.py /workspace/install_packages.py
   RUN python3 /workspace/install_packages.py
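
If the image should also ship your own script, rather than relying on a volume
mount at runtime, the ``Dockerfile`` can be extended along these lines (this is
just a sketch: ``j48.py`` refers to the script from the previous section, and the
``CMD`` is merely one possible default):

::

   FROM fracpete/pww3:0.2.14_cpu
   COPY install_packages.py /workspace/install_packages.py
   RUN python3 /workspace/install_packages.py
   COPY j48.py /workspace/j48.py
   CMD ["python3", "/workspace/j48.py"]

Note that ``j48.py`` still expects its dataset under ``/workspace/data``, so the
data directory has to be mounted when the container is started.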

You can then build this image just like any other Docker image:

.. code-block:: bash

   docker build -t pww3-pkg .

For testing, you can create a local script called ``test_packages.py`` with
content similar to this:

.. code-block:: python

   import weka.core.jvm as jvm
   import weka.core.packages as packages
   from weka.clusterers import Clusterer

   jvm.start(packages=True)

   # list installed packages
   items = packages.installed_packages()
   for item in items:
       print(item.name + "/" + item.version + "\n " + item.url)

   # instantiate a class from the package
   cls = Clusterer(classname="weka.clusterers.SelfOrganizingMap")
   print(cls.to_commandline())

   jvm.stop()

The following command simply runs our ``test_packages.py`` script. To achieve this,
the command maps the current directory (``pwd``) into the container's ``/workspace/scripts``
directory:

.. code-block:: bash

   docker run \
       -v `pwd`:/workspace/scripts \
       -t pww3-pkg:latest \
       python3 /workspace/scripts/test_packages.py

The output will be something like this:

::

   DEBUG:weka.core.jvm:Adding bundled jars
   DEBUG:weka.core.jvm:Classpath=['/usr/local/lib/python3.8/dist-packages/javabridge/jars/rhino-1.7R4.jar', '/usr/local/lib/python3.8/dist-packages/javabridge/jars/runnablequeue.jar', '/usr/local/lib/python3.8/dist-packages/javabridge/jars/cpython.jar', '/usr/local/lib/python3.8/dist-packages/weka/lib/weka.jar', '/usr/local/lib/python3.8/dist-packages/weka/lib/python-weka-wrapper.jar']
   DEBUG:weka.core.jvm:MaxHeapSize=default
   DEBUG:weka.core.jvm:Package support enabled
   SelfOrganizingMap/1.0.3
    http://prdownloads.sourceforge.net/wekann/SelfOrganizingMap1.0.3.zip?download
   weka.clusterers.SelfOrganizingMap -L 1.0 -O 2000 -C 1000 -H 2 -W 2