sklweka package

sklweka.classifiers module

class sklweka.classifiers.WekaEstimator(*args: Any, **kwargs: Any)

Bases: sklearn.base.BaseEstimator, weka.core.classes.OptionHandler, sklearn.base.RegressorMixin, sklearn.base.ClassifierMixin

Wraps a Weka classifier (classifier/regressor) within the scikit-learn framework.

property classifier

Returns the underlying classifier object, if any.

Returns

the classifier object

Return type

Classifier

fit(X, y, sample_weight=None)

Trains the estimator.

Parameters
  • X (ndarray) – the input variables as matrix, array-like of shape (n_samples, n_features)

  • y (ndarray) – the class attribute column, array-like of shape (n_samples,)

  • sample_weight (array-like of shape (n_samples,), default=None) – Sample weights. If None, all samples are weighted equally. Currently ignored (not yet implemented).

Returns

itself

Return type

WekaEstimator

get_params(deep=True)

Returns the parameters for this classifier: the classname and the options list.

Parameters

deep (bool) – ignored

Returns

the dictionary with options

Return type

dict

property header

Returns the underlying dataset header, if any.

Returns

the dataset structure

Return type

Instances

predict(X)

Performs predictions with the trained classifier.

Parameters

X (ndarray) – the data matrix to generate predictions for, array-like of shape (n_samples, n_features)

Returns

the prediction (or predictions): class labels for classification, numeric values for regression

Return type

ndarray

predict_proba(X)

Performs predictions and returns class probabilities.

Parameters

X (ndarray) – the data matrix to generate predictions for, array-like of shape (n_samples, n_features)

Returns

the probabilities

score(X, y, sample_weight=None)

For classification, returns the mean accuracy on the given test data and labels. For regression, returns the coefficient of determination (R^2) of the prediction.

Parameters
  • X (ndarray) – the input variables as matrix, array-like of shape (n_samples, n_features)

  • y (ndarray) – the class attribute column, array-like of shape (n_samples,)

  • sample_weight (array-like of shape (n_samples,), default=None) – Sample weights. If None, then samples are equally weighted.

Returns

the score

Return type

float
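The two quantities named in the score() description can be sketched in plain Python as follows. The helper names mean_accuracy and r_squared are hypothetical and not part of sklweka; the sketch leaves out sample_weight and assumes the true regression targets are not all identical (otherwise the denominator is zero).

```python
def mean_accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels (unweighted)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (unweighted)."""
    mean = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

A perfect regressor therefore scores 1.0, and one that always predicts the mean of the targets scores 0.0.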

set_params(**params)

Sets the options for the classifier, expects ‘classname’ and ‘options’.

Parameters

params (dict) – the parameter dictionary
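The get_params()/set_params() pair described for this class can be pictured as a round trip over a dictionary with the keys 'classname' and 'options'. The sketch below uses a hypothetical stand-in class, OptionsHolder, rather than sklweka itself. Returning self from set_params follows the usual scikit-learn convention and is an assumption here.

```python
class OptionsHolder:
    """Hypothetical stand-in for illustration; not part of sklweka."""

    def __init__(self, classname=None, options=None):
        self.classname = classname
        self.options = list(options) if options is not None else []

    def get_params(self, deep=True):
        # 'deep' is accepted for API compatibility but ignored.
        return {"classname": self.classname, "options": list(self.options)}

    def set_params(self, **params):
        # Only the two documented keys are handled.
        if "classname" in params:
            self.classname = params["classname"]
        if "options" in params:
            self.options = list(params["options"])
        return self  # assumption: scikit-learn convention


original = OptionsHolder("weka.classifiers.trees.J48", ["-M", "3"])
# Rebuild an equivalent object purely from the parameter dictionary.
duplicate = OptionsHolder().set_params(**original.get_params())
```

An estimator that can be rebuilt from its own parameter dictionary in this way is what scikit-learn utilities such as cloning and parameter search rely on.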

sklweka.clusters module

class sklweka.clusters.WekaCluster(jobject=None, cluster=None, classname=None, options=None, nominal_input_vars=None, num_nominal_input_labels=None)

Bases: sklearn.base.BaseEstimator, weka.core.classes.OptionHandler, sklearn.base.ClusterMixin

Wraps a Weka clusterer within the scikit-learn framework.

property cluster

Returns the underlying clusterer object, if any.

Returns

the clusterer object

Return type

Clusterer

fit(X, y=None, sample_weight=None)

Trains the clusterer.

Parameters
  • X (ndarray) – the input variables as matrix, array-like of shape (n_samples, n_features)

  • y (ndarray) – ignored

  • sample_weight (array-like of shape (n_samples,), default=None) – Sample weights. If None, all samples are weighted equally. Currently ignored (not yet implemented).

Returns

itself (the trained clusterer)

Return type

WekaCluster

fit_predict(X, y=None, sample_weight=None)

Trains the clusterer and returns the cluster labels.

Parameters
  • X (ndarray) – the input variables as matrix, array-like of shape (n_samples, n_features)

  • y (ndarray) – ignored

  • sample_weight (array-like of shape (n_samples,), default=None) – Sample weights. If None, all samples are weighted equally. Currently ignored (not yet implemented).

Returns

the cluster labels (of type int)

Return type

ndarray

get_params(deep=True)

Returns the parameters for this clusterer: the classname and the options list.

Parameters

deep (bool) – ignored

Returns

the dictionary with options

Return type

dict

property header

Returns the underlying dataset header, if any.

Returns

the dataset structure

Return type

Instances

predict(X, y=None)

Predicts cluster labels.

Parameters
  • X (ndarray) – the input variables as matrix, array-like of shape (n_samples, n_features)

  • y (ndarray) – ignored

Returns

the cluster labels (of type int)

Return type

ndarray

set_params(**params)

Sets the options for the clusterer, expects ‘classname’ and ‘options’.

Parameters

params (dict) – the parameter dictionary

sklweka.dataset module

sklweka.dataset.determine_attribute_type(y)

Determines the type of the column.

Parameters

y (ndarray) – the 1D vector to determine the type for

Returns

the type (C=categorical, N=numeric)

Return type

str

sklweka.dataset.determine_attribute_types(X)

Determines the type of the columns.

Parameters

X (ndarray) – the 2D data to determine the column types for

Returns

the list of types (C=categorical, N=numeric)

Return type

list
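The rule used to decide between C and N is not stated above. The sketch below assumes one plausible rule: a column is numeric only if every value is an int or float, and categorical otherwise. The helper names column_type_sketch and column_types_sketch are hypothetical, and the library's actual test may differ.

```python
from numbers import Number


def column_type_sketch(values):
    """Return 'N' if all values are numeric, else 'C' (assumed rule)."""
    for v in values:
        # bool is a Number subclass in Python, so exclude it explicitly.
        if isinstance(v, bool) or not isinstance(v, Number):
            return "C"
    return "N"


def column_types_sketch(rows):
    """Apply the per-column rule to each column of a row-major 2D list."""
    return [column_type_sketch(col) for col in zip(*rows)]
```

For example, rows whose second column holds strings would yield one 'C' entry at position 1 and 'N' for the purely numeric columns.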

sklweka.dataset.load_arff(fname, class_index=None)

Loads the specified ARFF file. If a class index is provided, either as a 0-based int or as a 1-based string (‘first’, ‘second’, ‘last’ and ‘last-1’ are accepted as well), the data is split into input variables and class attribute.

Parameters
  • fname (str) – the path of the ARFF file to load

  • class_index (int or str) – the class index, either int or str

Returns

a tuple (X, meta) or, if a valid class index was provided, a tuple (X, y, meta)

Return type

tuple

sklweka.dataset.load_dataset(fname, loader=None, class_index=None, internal=False)

Loads the dataset using Weka’s converters. If no loader instance is provided, the extension of the file is used to determine a loader (using default options). The data can either be returned using mixed types or just numeric (using Weka’s internal representation).

Parameters
  • fname (str) – the path of the dataset to load

  • loader (Loader) – the customized Loader instance to use for loading the dataset, can be None

  • class_index (str) – the class index string to use (‘first’, ‘second’, ‘third’, ‘last-2’, ‘last-1’, ‘last’ or 1-based index)

  • internal (bool) – whether to return Weka’s internal format or mixed data types

Returns

the dataset tuple: (X) if no class index; (X,y) if class index

sklweka.dataset.parse_range(r, max_value, ordered=True, safe=True)

Parses a Weka range string (e.g. “first-last” or “1,3-5,7,10-last”) of 1-based indices and returns a list of 0-based integers. Besides integer strings, ‘first’ and ‘last’ are accepted; ‘-’ defines a range (low to high). The list can be returned ordered or as is.

Parameters
  • r (str) – the range string to parse

  • max_value (int) – the maximum value for the 1-based indices

  • ordered (bool) – whether to return the list ordered or as is

  • safe (bool) – whether to catch exceptions or not

Returns

the list of 0-based indices

Return type

list
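The conversion from a 1-based range string to 0-based indices can be sketched as below. This is an illustrative re-implementation under stated assumptions, not the library's code. The name parse_range_sketch is hypothetical, the safe flag is left out, and duplicates and out-of-range values are not handled.

```python
def parse_range_sketch(r, max_value, ordered=True):
    """Turn a 1-based Weka-style range string into 0-based indices."""

    def to_index(token):
        # Convert a single 1-based token into a 0-based index.
        token = token.strip().lower()
        if token == "first":
            return 0
        if token == "last":
            return max_value - 1
        return int(token) - 1

    result = []
    for part in r.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # 'low-high' expands to every index from low to high inclusive.
            low, high = part.split("-", 1)
            result.extend(range(to_index(low), to_index(high) + 1))
        else:
            result.append(to_index(part))
    return sorted(result) if ordered else result
```

With max_value=12, the string “1,3-5,7,10-last” expands to the 0-based indices 0, 2, 3, 4, 6, 9, 10 and 11.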

sklweka.dataset.split_off_class(data, class_index)

Splits off the class attribute from the data matrix. The class index can be either a 0-based int or a 1-based string (‘first’, ‘second’, ‘last’ and ‘last-1’ are accepted as well).

Parameters
  • data (ndarray) – the 2D matrix to process

  • class_index (int or str) – the position of the class attribute to split off

Returns

the input variables (2D matrix) and the output variable (1D)
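The two index conventions (0-based int, 1-based string) can be illustrated on plain Python lists. The helpers resolve_class_index and split_off_class_sketch are hypothetical and only mirror the behaviour described above. The real function operates on ndarrays.

```python
def resolve_class_index(class_index, num_columns):
    """Map a 0-based int or a 1-based string to a 0-based column position."""
    if isinstance(class_index, int):
        return class_index  # ints are already 0-based
    names = {
        "first": 0,
        "second": 1,
        "last": num_columns - 1,
        "last-1": num_columns - 2,
    }
    key = class_index.strip().lower()
    if key in names:
        return names[key]
    return int(key) - 1  # numeric strings are 1-based


def split_off_class_sketch(rows, class_index):
    """Split a row-major 2D list into inputs X and class column y."""
    if not rows:
        return [], []
    idx = resolve_class_index(class_index, len(rows[0]))
    X = [row[:idx] + row[idx + 1:] for row in rows]
    y = [row[idx] for row in rows]
    return X, y
```

Note that the int 0 and the string “1” both refer to the first column, which is an easy source of off-by-one mistakes.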

sklweka.dataset.to_array(data)

Turns the Instances object into ndarrays for X and y. If no class is present, then y will be None.

Parameters

data (Instances) – the data to convert

Returns

the generated arrays for X and y

Return type

tuple

sklweka.dataset.to_instance(header, x, y=None, weight=1.0)

Generates an Instance from the data.

Parameters
  • header (Instances) – the data structure to adhere to

  • x (ndarray) – the 1D vector with input variables

  • y (object) – the optional class value

  • weight (float) – the weight for the Instance

Returns

the generated Instance

Return type

Instance

sklweka.dataset.to_instances(X, y=None, att_names=None, att_types=None, class_name=None, class_type=None, relation_name=None, num_nominal_labels=None, num_class_labels=None)

Turns the 2D matrix and the optional 1D class vector into an Instances object.

Parameters
  • X (ndarray) – the input variables, 2D matrix

  • y (ndarray) – the optional class value column, 1D vector

  • att_names (list) – the list of attribute names

  • att_types – the list of attribute types (C=categorical, N=numeric), assumes numeric by default if not provided

  • class_name (str) – the name of the class attribute

  • class_type (str) – the type of the class attribute (C=categorical, N=numeric)

  • relation_name (str) – the name for the dataset

  • num_nominal_labels (dict) – the dictionary with the number of labels (key is 0-based attribute index)

  • num_class_labels (int) – the number of labels in the class attribute

Returns

the generated Instances object

Return type

Instances

sklweka.dataset.to_nominal_attributes(X, indices)

Converts the numeric columns at the specified indices into string columns.

Parameters
  • X (ndarray) – the 2D matrix to convert

  • indices (list or str) – the list of 0-based indices of attributes to convert to nominal or range string with 1-based indices

Returns

the converted matrix

Return type

ndarray

sklweka.dataset.to_nominal_labels(y)

Turns the numeric column vector into a string vector.

Parameters

y (list or ndarray) – the vector to convert

Returns

the converted vector

Return type

ndarray

sklweka.preprocessing module

class sklweka.preprocessing.MakeNominal(*args: Any, **kwargs: Any)

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

Converts numeric columns to nominal ones (i.e. string labels).

fit(X, y)

Trains the estimator.

Parameters
  • X (ndarray) – the input variables as matrix, array-like of shape (n_samples, n_features)

  • y (ndarray) – the optional class attribute column, array-like of shape (n_samples,)

Returns

itself

Return type

MakeNominal

get_params(deep=True)

Returns the parameters for this transformer, basically classname and options list.

Parameters

deep (bool) – ignored

Returns

the dictionary with options

Return type

dict

property input_vars

Returns the 0-based indices or range string with 1-based indices of the input variables to convert.

Returns

the indices or range string, can be None

Return type

list or str

property output_vars

Returns whether the output variable gets converted as well.

Returns

True if the output variable gets converted

Return type

bool

set_params(**params)

Sets the options for the transformer, expects ‘classname’ and ‘options’.

Parameters

params (dict) – the parameter dictionary

transform(X, y=None)

Filters the data.

Parameters
  • X (ndarray) – the data to filter, array-like of shape (n_samples, n_features)

  • y (ndarray) – the optional class attribute column, array-like of shape (n_samples,)

Returns

the filtered data, X if no targets or (X, y) if targets provided

Return type

ndarray or tuple

class sklweka.preprocessing.WekaTransformer(*args: Any, **kwargs: Any)

Bases: sklearn.base.BaseEstimator, weka.core.classes.OptionHandler, sklearn.base.TransformerMixin

Wraps a Weka filter within the scikit-learn framework.

property filter

Returns the underlying filter object, if any.

Returns

the filter object

Return type

Filter

fit(X, y)

Trains the estimator.

Parameters
  • X (ndarray) – the input variables as matrix, array-like of shape (n_samples, n_features)

  • y (ndarray) – the optional class attribute column, array-like of shape (n_samples,)

Returns

itself

Return type

WekaTransformer

get_params(deep=True)

Returns the parameters for this filter: the classname and the options list.

Parameters

deep (bool) – ignored

Returns

the dictionary with options

Return type

dict

property header

Returns the underlying dataset header, if any.

Returns

the dataset structure

Return type

Instances

set_params(**params)

Sets the options for the filter, expects ‘classname’ and ‘options’.

Parameters

params (dict) – the parameter dictionary

transform(X, y=None)

Filters the data.

Parameters
  • X (ndarray) – the data to filter, array-like of shape (n_samples, n_features)

  • y (ndarray) – the optional class attribute column, array-like of shape (n_samples,)

Returns

the filtered data, X if no targets or (X, y) if targets provided

Return type

ndarray or tuple