Command-line

From the command line, python-weka-wrapper behaves much like Weka's own command-line interface. Most of the general options are available, as well as the following:

  • -j for adding additional jars, in the same format as the classpath for the platform. E.g., for Linux, -j /some/where/a.jar:/other/place/b.jar

  • -X for defining the maximum heap size. E.g., -X 512m for 512 MB of heap size.
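The classpath separator expected by -j differs between platforms (: on Linux, ; on Windows), and Python exposes it as os.pathsep. The following is a minimal sketch of assembling both options in a portable way; the helper name jvm_options is hypothetical and the jar paths are the placeholders from the list above.

```python
import os


def jvm_options(jars=(), heap=None):
    """Return the -j/-X arguments shared by the pww-* commands."""
    args = []
    if jars:
        # os.pathsep is ':' on Linux/macOS and ';' on Windows
        args += ["-j", os.pathsep.join(jars)]
    if heap:
        args += ["-X", heap]
    return args


options = jvm_options(["/some/where/a.jar", "/other/place/b.jar"], heap="512m")
print(options)
```

The resulting list can be placed directly after the command name, before the other options.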

The following examples are all for a Linux bash environment. Windows users have to replace forward slashes / with backslashes \ in paths, remove the backslashes \ at the end of the lines, and place the whole command on a single line.
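Joining the continuation lines can be done by hand, or with a small helper such as the sketch below. The function name to_single_line is hypothetical, and the sketch deliberately leaves paths untouched, since it cannot know which slashes belong to file names; those still have to be adapted manually.

```python
import re


def to_single_line(command):
    """Join a bash command that uses trailing backslashes into one line."""
    # replace each backslash-newline continuation, and the indentation
    # around it, with a single space
    joined = re.sub(r"\s*\\\n\s*", " ", command)
    return joined.strip()


bash_command = (
    "pww-datagenerator \\\n"
    "    -o /tmp/out.arff \\\n"
    "    weka.datagenerators.classifiers.classification.Agrawal\n"
)
print(to_single_line(bash_command))
```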

Data generators

Artificial data can be generated using one of Weka’s data generators, e.g., the Agrawal classification generator:

pww-datagenerator \
    -o /tmp/out.arff \
    weka.datagenerators.classifiers.classification.Agrawal

Command-line help screen:

usage: pww-datagenerator [-h] [-j classpath] [-X heap] datagenerator ...

Executes a data generator from the command-line. Calls JVM start/stop
automatically.

positional arguments:
  datagenerator  data generator classname, e.g.,
                 weka.datagenerators.classifiers.classification.LED24
  option         additional data generator options

optional arguments:
  -h, --help     show this help message and exit
  -j classpath   additional classpath, jars/directories
  -X heap        max heap size for jvm, e.g., 512m

Filters

Filtering a single ARFF dataset, removing the last attribute using the Remove filter:

pww-filter \
    -i /my/datasets/iris.arff \
    -o /tmp/out.arff \
    -c last \
    weka.filters.unsupervised.attribute.Remove \
    -R last

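For batch filtering, use the -r and -s options to specify the input and output of the second file. As in Weka, the filter is initialized on the first file and then applied to the second, which keeps the two outputs compatible.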

Command-line help screen:

usage: pww-filter [-h] [-j classpath] [-X heap] -i input1 -o output1
                  [-r input2] [-s output2] [-c classindex]
                  filter ...

Executes a filter from the command-line. Calls JVM start/stop automatically.

positional arguments:
  filter         filter classname, e.g., weka.filters.AllFilter
  option         additional filter options

optional arguments:
  -h, --help     show this help message and exit
  -j classpath   additional classpath, jars/directories
  -X heap        max heap size for jvm, e.g., 512m
  -i input1      input file 1
  -o output1     output file 1
  -r input2      input file 2
  -s output2     output file 2
  -c classindex  1-based class attribute index
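To illustrate how the two file pairs line up in a batch-filtering call, here is a minimal sketch that assembles the argument list from Python. The helper name batch_filter_command and the train/test file names are hypothetical; only the flags come from the help screen above. The command is executed only if pww-filter is installed and both input files exist.

```python
import os
import shutil
import subprocess


def batch_filter_command(input1, output1, input2, output2,
                         filter_class, filter_options=()):
    """Build a pww-filter call that processes two files in one run."""
    return [
        "pww-filter",
        "-i", input1, "-o", output1,    # first file pair
        "-r", input2, "-s", output2,    # second file pair
        filter_class, *filter_options,  # classname, then the filter's options
    ]


# hypothetical file names, chosen for illustration only
cmd = batch_filter_command(
    "/my/datasets/train.arff", "/tmp/train-out.arff",
    "/my/datasets/test.arff", "/tmp/test-out.arff",
    "weka.filters.unsupervised.attribute.Remove", ["-R", "last"],
)
print(cmd)

# only run when the tool is installed and both inputs are present
if shutil.which(cmd[0]) and all(os.path.exists(p) for p in (cmd[2], cmd[6])):
    subprocess.run(cmd, check=True)
```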

Classifiers

Example of how to cross-validate a J48 classifier (with confidence factor 0.3) on the iris UCI dataset:

pww-classifier \
    -t /my/datasets/iris.arff \
    -c last \
    weka.classifiers.trees.J48 \
    -C 0.3

Command-line help screen:

usage: pww-classifier [-h] [-j classpath] [-X heap] -t train [-T test]
                      [-c class index] [-d outmodel] [-l inmodel]
                      [-x num folds] [-s seed] [-v] [-o] [-i] [-k]
                      [-m costmatrix] [-g graph]
                      classifier ...

Performs classification/regression from the command-line. Calls JVM start/stop
automatically.

positional arguments:
  classifier      classifier classname, e.g., weka.classifiers.trees.J48
  option          additional classifier options

optional arguments:
  -h, --help      show this help message and exit
  -j classpath    additional classpath, jars/directories
  -X heap         max heap size for jvm, e.g., 512m
  -t train        Training set file
  -T test         Test set file
  -c class index  1-based class attribute index
  -d outmodel     model output file name
  -l inmodel      model input file name
  -x num folds    number of folds for cross-validation
  -s seed         seed value for randomization
  -v              no statistics for training
  -o              only statistics, don't output model
  -i              output information retrieval statistics
  -k              output information theoretic statistics
  -m costmatrix   cost matrix file
  -g graph        output file for graph (if supported)
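The options from the help screen combine in a fixed pattern: wrapper options first, then the classifier classname, then the classifier's own options. The sketch below builds such argument lists from Python; the helper name classifier_command, the fold and seed values, and the model path are assumptions made for illustration.

```python
def classifier_command(classifier, train, options=(), test=None, folds=None,
                       seed=None, class_index="last", outmodel=None):
    """Build a pww-classifier argument list from the documented flags."""
    cmd = ["pww-classifier", "-t", train]
    if test is not None:
        cmd += ["-T", test]
    if folds is not None:
        cmd += ["-x", str(folds)]
    if seed is not None:
        cmd += ["-s", str(seed)]
    cmd += ["-c", class_index]
    if outmodel is not None:
        cmd += ["-d", outmodel]
    # the classname and its options always come last
    return cmd + [classifier, *options]


# cross-validation with an explicit number of folds and seed (example values)
cv = classifier_command("weka.classifiers.trees.J48", "/my/datasets/iris.arff",
                        options=["-C", "0.3"], folds=10, seed=1)
# train on the full file and store the model (hypothetical output path)
save = classifier_command("weka.classifiers.trees.J48", "/my/datasets/iris.arff",
                          options=["-C", "0.3"], outmodel="/tmp/j48.model")
print(cv)
print(save)
```

Either list can be passed to subprocess.run once python-weka-wrapper is installed.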

Clusterers

Example of how to perform classes-to-clusters evaluation for SimpleKMeans (with 3 clusters) using the iris UCI dataset:

pww-clusterer \
    -t /my/datasets/iris.arff \
    -c last \
    weka.clusterers.SimpleKMeans \
    -N 3

Command-line help screen:

usage: pww-clusterer [-h] [-j classpath] [-X heap] -t train [-T test]
                     [-d outmodel] [-l inmodel] [-p attributes] [-x num folds]
                     [-s seed] [-c class index] [-g graph]
                     clusterer ...

Performs clustering from the command-line. Calls JVM start/stop automatically.

positional arguments:
  clusterer       clusterer classname, e.g., weka.clusterers.SimpleKMeans
  option          additional clusterer options

optional arguments:
  -h, --help      show this help message and exit
  -j classpath    additional classpath, jars/directories
  -X heap         max heap size for jvm, e.g., 512m
  -t train        training set file
  -T test         test set file
  -d outmodel     model output file name
  -l inmodel      model input file name
  -p attributes   attribute range
  -x num folds    number of folds
  -s seed         seed value for randomization
  -c class index  1-based class attribute index
  -g graph        graph output file (if supported)

Attribute selection

You can perform attribute selection using BestFirst as the search algorithm and CfsSubsetEval as the evaluator as follows:

pww-attsel \
    -i /my/datasets/iris.arff \
    -x 5 \
    -n 42 \
    -s "weka.attributeSelection.BestFirst -D 1 -N 5" \
    weka.attributeSelection.CfsSubsetEval \
    -P 1 \
    -E 1

Command-line help screen:

usage: pww-attsel [-h] [-j classpath] [-X heap] -i input [-c class index]
                  [-s search] [-x num folds] [-n seed]
                  evaluator ...

Performs attribute selection from the command-line. Calls JVM start/stop
automatically.

positional arguments:
  evaluator       evaluator classname, e.g.,
                  weka.attributeSelection.CfsSubsetEval
  option          additional evaluator options

optional arguments:
  -h, --help      show this help message and exit
  -j classpath    additional classpath, jars/directories
  -X heap         max heap size for jvm, e.g., 512m
  -i input        input file
  -c class index  1-based class attribute index
  -s search       search method, classname and options
  -x num folds    number of folds
  -n seed         the seed value for randomization
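Because -s takes the search classname and its options as one value, that value has to reach the program as a single argument, which is why it is quoted on the shell. The sketch below shows the same call as a Python argument list and uses shlex.join (Python 3.8+) to print the equivalent, correctly quoted shell command; nothing is executed.

```python
import shlex

cmd = [
    "pww-attsel",
    "-i", "/my/datasets/iris.arff",
    "-x", "5",
    "-n", "42",
    # search method: classname and options form ONE argument
    "-s", "weka.attributeSelection.BestFirst -D 1 -N 5",
    # evaluator classname, followed by the evaluator's own options
    "weka.attributeSelection.CfsSubsetEval",
    "-P", "1",
    "-E", "1",
]

shell_line = shlex.join(cmd)
print(shell_line)
```

Splitting the printed line with shlex.split gives back the original list, which is a quick way to check that the quoting is right.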

Associators

Associators, like Apriori, can be run like this:

pww-associator \
    -t /my/datasets/iris.arff \
    weka.associations.Apriori \
    -N 9 -I

Command-line help screen:

usage: pww-associator [-h] [-j classpath] [-X heap] -t train associator ...

Executes an associator from the command-line. Calls JVM start/stop
automatically.

positional arguments:
  associator    associator classname, e.g., weka.associations.Apriori
  option        additional associator options

optional arguments:
  -h, --help    show this help message and exit
  -j classpath  additional classpath, jars/directories
  -X heap       max heap size for jvm, e.g., 512m
  -t train      training set file

Package management

Versions newer than 0.2.9 also offer package management from the command-line via the pww-packages command. There are several sub-commands available:

usage: pww-packages [-h]
                   {list,info,install,uninstall,remove,suggest,is-installed}
                   ...

Manages Weka packages.

positional arguments:
  {list,info,install,uninstall,remove,suggest,is-installed}
    list                For listing all/installed/available packages
    info                Outputs information about packages
    install             For installing one or more packages
    uninstall (remove)  For uninstalling one or more packages
    suggest             For suggesting packages that contain the specified
                        class
    is-installed        Checks whether a package is installed, simply outputs
                        true/false

optional arguments:
  -h, --help            show this help message and exit

Listing packages

Listing all, available or installed packages can be done using the list sub-command:

usage: pww-packages list [-h] [-f {text,json}] [-o FILE] [-r]
                         [{all,installed,available}]

positional arguments:
  {all,installed,available}
                        defines what packages to list

optional arguments:
  -h, --help            show this help message and exit
  -f {text,json}, --format {text,json}
                        the output format to use
  -o FILE, --output FILE
                        the file to store the output in, uses stdout if not
                        supplied
  -r, --refresh-cache   whether to refresh the package cache
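The json format is convenient when the listing is consumed by a script. The following is a minimal sketch, assuming that the JSON document is the only content written to stdout; the structure of that document is not described here, so the sketch only parses it and makes no assumptions about its fields. The helper name list_packages_command is hypothetical.

```python
import json
import shutil
import subprocess


def list_packages_command(what="installed", refresh=False):
    """Build a 'pww-packages list' call that emits JSON."""
    cmd = ["pww-packages", "list", "-f", "json"]
    if refresh:
        cmd.append("-r")
    cmd.append(what)  # one of: all, installed, available
    return cmd


cmd = list_packages_command()
print(cmd)

# only run when python-weka-wrapper's command-line tools are installed
if shutil.which(cmd[0]):
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode == 0:
        try:
            packages = json.loads(result.stdout)
            print(packages)
        except ValueError:
            # stdout contained something other than plain JSON
            print(result.stdout)
```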

Info on packages

Outputting information on one or more packages is achieved with the info sub-command:

usage: pww-packages info [-h] [-t {brief,full}] [-f {text,json}] [-o FILE]
                         [-r]
                         name [name ...]

positional arguments:
  name                  the package(s) to output the information for

optional arguments:
  -h, --help            show this help message and exit
  -t {brief,full}, --type {brief,full}
                        the type of information to output
  -f {text,json}, --format {text,json}
                        the output format to use
  -o FILE, --output FILE
                        the file to store the output in, uses stdout if not
                        supplied
  -r, --refresh-cache   whether to refresh the package cache

Installing/uninstalling/check installed status

The install sub-command installs one or more packages:

usage: pww-packages install [-h] packages [packages ...]

positional arguments:
  packages    the name of the package(s) to install, append '==VERSION' to pin
              to a specific version

optional arguments:
  -h, --help  show this help message and exit

The uninstall (or remove) sub-command removes one or more packages:

usage: pww-packages uninstall [-h] packages [packages ...]

positional arguments:
  packages    the name of the package(s) to uninstall

optional arguments:
  -h, --help  show this help message and exit

The is-installed sub-command outputs whether a package is installed or not:

usage: pww-packages is-installed [-h] [-f {text,json}] [-o FILE]
                                 name [name ...]

positional arguments:
  name                  the name of the package to check, append '==VERSION'
                        to pin to a specific version

optional arguments:
  -h, --help            show this help message and exit
  -f {text,json}, --format {text,json}
                        the output format to use
  -o FILE, --output FILE
                        the file to store the output in, uses stdout if not
                        supplied
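Since is-installed simply outputs true or false, its result is easy to use from a script. The sketch below checks a single package and assumes that the text output consists of just that word; the helper names, the package name and the version are hypothetical placeholders.

```python
import shutil
import subprocess


def package_spec(name, version=None):
    """Return NAME or NAME==VERSION, the format used for pinning a version."""
    return f"{name}=={version}" if version else name


def parse_installed(output):
    """Interpret the true/false text output (assumed to be a single word)."""
    return output.strip().lower() == "true"


def is_installed(name, version=None):
    """Run 'pww-packages is-installed' for one package and return a bool."""
    cmd = ["pww-packages", "is-installed", package_spec(name, version)]
    result = subprocess.run(cmd, capture_output=True, text=True)
    return parse_installed(result.stdout)


# hypothetical package name and version, for illustration only
spec = package_spec("some-package", "1.0.0")
print(spec)

# only query when the command-line tools are installed
if shutil.which("pww-packages"):
    print(is_installed("some-package", "1.0.0"))
```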

Suggest packages

If you are not sure which package a certain class is part of, then use the suggest sub-command to help with that (this works only for official packages):

usage: pww-packages suggest [-h] [-e] [-f {text,json}] [-o FILE] classname

positional arguments:
  classname             the classname to suggest packages for

optional arguments:
  -h, --help            show this help message and exit
  -e, --exact           whether to match the name exactly or perform substring
                        matching
  -f {text,json}, --format {text,json}
                        the output format to use
  -o FILE, --output FILE
                        the file to store the output in, uses stdout if not
                        supplied