sklweka package
sklweka.classifiers module
- class sklweka.classifiers.WekaEstimator(*args: Any, **kwargs: Any)
Bases: sklearn.base.BaseEstimator, weka.core.classes.OptionHandler, sklearn.base.RegressorMixin, sklearn.base.ClassifierMixin
Wraps a Weka classifier (for classification or regression) within the scikit-learn framework.
- property classifier
Returns the underlying classifier object, if any.
- Returns
the classifier object
- Return type
Classifier
- fit(X, y, sample_weight=None)
Trains the estimator.
- Parameters
X (ndarray) – the input variables as matrix, array-like of shape (n_samples, n_features)
y (ndarray) – the class attribute column, array-like of shape (n_samples,)
sample_weight (array-like of shape (n_samples,), default=None) – Sample weights. If None, then samples are equally weighted. Currently ignored (not yet implemented).
- Returns
itself
- Return type
WekaEstimator
- get_params(deep=True)
Returns the parameters for this classifier, namely the classname and the options list.
- Parameters
deep (bool) – ignored
- Returns
the dictionary with options
- Return type
dict
- property header
Returns the underlying dataset header, if any.
- Returns
the dataset structure
- Return type
Instances
- predict(X)
Performs predictions with the trained classifier.
- Parameters
X (ndarray) – the data matrix to generate predictions for, array-like of shape (n_samples, n_features)
- Returns
the predicted values, one per row of X
- Return type
ndarray
- predict_proba(X)
Performs predictions and returns class probabilities.
- Parameters
X (ndarray) – the data matrix to generate predictions for, array-like of shape (n_samples, n_features)
- Returns
the class probabilities, array-like of shape (n_samples, n_classes)
- score(X, y, sample_weight=None)
For classification, returns the mean accuracy on the given test data and labels; for regression, returns the coefficient of determination (R²) of the prediction.
- Parameters
X (ndarray) – the input variables as matrix, array-like of shape (n_samples, n_features)
y (ndarray) – the class attribute column, array-like of shape (n_samples,)
sample_weight (array-like of shape (n_samples,), default=None) – Sample weights. If None, then samples are equally weighted.
- Returns
the score
- Return type
float
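The two scoring rules named above are the ones scikit-learn's ClassifierMixin (mean accuracy) and RegressorMixin (R²) apply. The plain-Python sketch below shows how each is computed and where sample_weight enters; the helper names weighted_accuracy and weighted_r2 are hypothetical and this is an illustration, not sklweka's code.

```python
def weighted_accuracy(y_true, y_pred, sample_weight=None):
    """Mean accuracy: (weighted) fraction of exactly matching labels."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y_true)
    correct = sum(w for t, p, w in zip(y_true, y_pred, sample_weight) if t == p)
    return correct / sum(sample_weight)


def weighted_r2(y_true, y_pred, sample_weight=None):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y_true)
    total_w = sum(sample_weight)
    # weighted mean of the true values
    mean = sum(w * t for t, w in zip(y_true, sample_weight)) / total_w
    ss_res = sum(w * (t - p) ** 2 for t, p, w in zip(y_true, y_pred, sample_weight))
    ss_tot = sum(w * (t - mean) ** 2 for t, w in zip(y_true, sample_weight))
    return 1.0 - ss_res / ss_tot


print(weighted_accuracy(["yes", "no", "yes", "no"], ["yes", "yes", "yes", "no"]))  # 0.75
print(weighted_r2([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))  # 0.5
```

Note that the sketch does not guard against a constant y_true (SS_tot of zero), which scikit-learn handles as a special case.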
- set_params(**params)
Sets the options for the classifier; expects the keys ‘classname’ and ‘options’.
- Parameters
params (dict) – the parameter dictionary
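get_params and set_params exchange a plain dictionary with the keys ‘classname’ and ‘options’. The class below is a hypothetical stand-in (ParamsSketch is not part of sklweka) that mimics this contract, so the round trip scikit-learn relies on when cloning an estimator can be shown without a JVM. The J48 classname and its -C/-M options serve only as sample values.

```python
class ParamsSketch:
    """Hypothetical stand-in mimicking the classname/options parameter contract."""

    def __init__(self, classname=None, options=None):
        self.classname = classname
        # copy the list so callers cannot mutate our state through their reference
        self.options = list(options) if options is not None else None

    def get_params(self, deep=True):
        # 'deep' is accepted for scikit-learn compatibility but ignored
        options = list(self.options) if self.options is not None else None
        return {"classname": self.classname, "options": options}

    def set_params(self, **params):
        if "classname" in params:
            self.classname = params["classname"]
        if "options" in params:
            opts = params["options"]
            self.options = list(opts) if opts is not None else None
        return self


est = ParamsSketch(classname="weka.classifiers.trees.J48", options=["-C", "0.25", "-M", "2"])
params = est.get_params()
copy = ParamsSketch(**params)  # roughly what sklearn.base.clone does with get_params()
print(copy.get_params() == params)  # True
```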
sklweka.clusters module
- class sklweka.clusters.WekaCluster(jobject=None, cluster=None, classname=None, options=None, nominal_input_vars=None, num_nominal_input_labels=None)
Bases: sklearn.base.BaseEstimator, weka.core.classes.OptionHandler, sklearn.base.ClusterMixin
Wraps a Weka clusterer within the scikit-learn framework.
- property cluster
Returns the underlying clusterer object, if any.
- Returns
the clusterer object
- Return type
Clusterer
- fit(X, y=None, sample_weight=None)
Trains the clusterer.
- Parameters
X (ndarray) – the input variables as matrix, array-like of shape (n_samples, n_features)
y (ndarray) – ignored
sample_weight (array-like of shape (n_samples,), default=None) – Sample weights. If None, then samples are equally weighted. Currently ignored (not yet implemented).
- Returns
itself
- Return type
WekaCluster
- fit_predict(X, y=None, sample_weight=None)
Trains the clusterer and returns the cluster labels.
- Parameters
X (ndarray) – the input variables as matrix, array-like of shape (n_samples, n_features)
y (ndarray) – ignored
sample_weight (array-like of shape (n_samples,), default=None) – Sample weights. If None, then samples are equally weighted. Currently ignored (not yet implemented).
- Returns
the cluster labels (of type int)
- Return type
ndarray
- get_params(deep=True)
Returns the parameters for this clusterer, namely the classname and the options list.
- Parameters
deep (bool) – ignored
- Returns
the dictionary with options
- Return type
dict
- property header
Returns the underlying dataset header, if any.
- Returns
the dataset structure
- Return type
Instances
- predict(X, y=None)
Predicts cluster labels.
- Parameters
X (ndarray) – the input variables as matrix, array-like of shape (n_samples, n_features)
y (ndarray) – ignored
- Returns
the cluster labels (of type int)
- Return type
ndarray
- set_params(**params)
Sets the options for the clusterer; expects the keys ‘classname’ and ‘options’.
- Parameters
params (dict) – the parameter dictionary
sklweka.dataset module
- sklweka.dataset.determine_attribute_type(y)
Determines the type of the column.
- Parameters
y (ndarray) – the 1D vector to determine the type for
- Returns
the type (C=categorical, N=numeric)
- Return type
str
- sklweka.dataset.determine_attribute_types(X)
Determines the type of the columns.
- Parameters
X (ndarray) – the 2D data to determine the column types for
- Returns
the list of types (C=categorical, N=numeric)
- Return type
list
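One plausible way to make the C/N decision is to treat a column as numeric when every value converts to float and as categorical otherwise. The functions below are a hedged sketch of that assumed rule (the _sketch suffix marks them as hypothetical); sklweka's own heuristic may differ, for example by inspecting the NumPy dtype instead.

```python
import numpy as np


def determine_attribute_type_sketch(y):
    """Return 'N' if every value in the 1D vector converts to float, else 'C'."""
    for value in y:
        try:
            float(value)
        except (TypeError, ValueError):
            return "C"
    return "N"


def determine_attribute_types_sketch(X):
    """Apply the per-column rule to each column of a 2D matrix."""
    X = np.asarray(X, dtype=object)
    return [determine_attribute_type_sketch(X[:, i]) for i in range(X.shape[1])]


X = np.array([[1.5, "red"], [2.0, "blue"], [0.5, "red"]], dtype=object)
print(determine_attribute_types_sketch(X))  # ['N', 'C']
```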
- sklweka.dataset.load_arff(fname, class_index=None)
Loads the specified ARFF file. If a class index is provided (either a 0-based int or a 1-based string; ‘first’, ‘second’, ‘last’ and ‘last-1’ are accepted as well), the data is split into input variables and class attribute.
- Parameters
fname (str) – the path of the ARFF file to load
class_index (int or str) – the class index, either int or str
- Returns
the tuple (X, meta), or (X, y, meta) if a valid class index was provided
- Return type
tuple
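The class index accepted by load_arff (and split_off_class) comes in two forms: a 0-based int, or a 1-based string that may also be one of the keywords ‘first’, ‘second’, ‘last’ and ‘last-1’. The helper below is a hypothetical sketch (resolve_class_index is not an sklweka function) that maps both forms to a 0-based column position; the range check is an added assumption.

```python
def resolve_class_index(class_index, num_columns):
    """Map a 0-based int or a 1-based string/keyword to a 0-based column index."""
    if isinstance(class_index, int):
        index = class_index  # ints are already 0-based
    else:
        keywords = {
            "first": 0,
            "second": 1,
            "last": num_columns - 1,
            "last-1": num_columns - 2,
        }
        if class_index in keywords:
            index = keywords[class_index]
        else:
            index = int(class_index) - 1  # other strings are 1-based numbers
    if not 0 <= index < num_columns:
        raise ValueError(f"class index {class_index!r} out of range for {num_columns} columns")
    return index


print(resolve_class_index("last", 5))  # 4
print(resolve_class_index("3", 5))     # 2
print(resolve_class_index(0, 5))       # 0
```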
- sklweka.dataset.load_dataset(fname, loader=None, class_index=None, internal=False)
Loads the dataset using Weka’s converters. If no loader instance is provided, the file extension is used to determine a loader (with default options). The data can be returned either with mixed types or purely numeric (Weka’s internal representation).
- Parameters
fname (str) – the path of the dataset to load
loader (Loader) – the customized Loader instance to use for loading the dataset, can be None
class_index (str) – the class index string to use (‘first’, ‘second’, ‘third’, ‘last-2’, ‘last-1’, ‘last’ or 1-based index)
internal (bool) – whether to return Weka’s internal format or mixed data types
- Returns
the dataset: (X) if no class index is given, (X, y) otherwise
- sklweka.dataset.parse_range(r, max_value, ordered=True, safe=True)
Parses a Weka range string (e.g., “first-last” or “1,3-5,7,10-last”) of 1-based indices and returns a list of 0-based integers. Besides integer strings, ‘first’ and ‘last’ are accepted; ‘-’ defines a range (low to high). The list can be returned ordered or as is.
- Parameters
r (str) – the range string to parse
max_value (int) – the maximum value for the 1-based indices
ordered (bool) – whether to return the list ordered or as is
safe (bool) – whether to catch exceptions or not
- Returns
the list of 0-based indices
- Return type
list
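The range syntax can be illustrated with a small pure-Python re-implementation of the documented semantics: comma-separated items, the ‘first’/‘last’ keywords, inclusive ‘low-high’ spans, 1-based input and 0-based output. parse_range_sketch is a hypothetical name, "ordered" is assumed to mean sorted, and the safe flag is not modelled.

```python
def parse_range_sketch(r, max_value, ordered=True):
    """Parse a Weka-style range of 1-based indices into 0-based ints."""

    def to_int(token):
        token = token.strip()
        if token == "first":
            return 1
        if token == "last":
            return max_value
        return int(token)

    result = []
    for item in r.split(","):
        if "-" in item:
            low, high = (to_int(t) for t in item.split("-", 1))
            result.extend(range(low - 1, high))  # inclusive 1-based span -> 0-based
        else:
            result.append(to_int(item) - 1)
    return sorted(result) if ordered else result


print(parse_range_sketch("1,3-5,7,10-last", 12))  # [0, 2, 3, 4, 6, 9, 10, 11]
```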
- sklweka.dataset.split_off_class(data, class_index)
Splits off the class attribute from the data matrix. The class index can either be a 0-based int or a 1-based string (‘first’, ‘second’, ‘last’ and ‘last-1’ are accepted as well).
- Parameters
data (ndarray) – the 2D matrix to process
class_index (int or str) – the position of the class attribute to split off
- Returns
the input variables (2D matrix) and the output variable (1D)
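Once the class index is a 0-based int, splitting off the class column amounts to one column selection and one np.delete call. A minimal NumPy sketch under that assumption (string indices would first be resolved to 0-based positions); split_off_class_sketch is a hypothetical name.

```python
import numpy as np


def split_off_class_sketch(data, class_index):
    """Return (X, y): data without column class_index, and that column as 1D."""
    data = np.asarray(data)
    y = data[:, class_index]
    X = np.delete(data, class_index, axis=1)
    return X, y


data = np.array([[1.0, 2.0, 0.0],
                 [3.0, 4.0, 1.0]])
X, y = split_off_class_sketch(data, 2)
print(X.shape, y)  # (2, 2) [0. 1.]
```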
- sklweka.dataset.to_array(data)
Turns the Instances object into ndarrays for X and y. If no class is present, then y will be None.
- Parameters
data (Instances) – the data to convert
- Returns
the generated arrays for X and y
- Return type
tuple
- sklweka.dataset.to_instance(header, x, y=None, weight=1.0)
Generates an Instance from the data.
- Parameters
header (Instances) – the data structure to adhere to
x (ndarray) – the 1D vector with input variables
y (object) – the optional class value
weight (float) – the weight for the Instance
- Returns
the generated Instance
- Return type
Instance
- sklweka.dataset.to_instances(X, y=None, att_names=None, att_types=None, class_name=None, class_type=None, relation_name=None, num_nominal_labels=None, num_class_labels=None)
Turns the 2D matrix and the optional 1D class vector into an Instances object.
- Parameters
X (ndarray) – the input variables, 2D matrix
y (ndarray) – the optional class value column, 1D vector
att_names (list) – the list of attribute names
att_types (list) – the list of attribute types (C=categorical, N=numeric); if not provided, all attributes are assumed to be numeric
class_name (str) – the name of the class attribute
class_type (str) – the type of the class attribute (C=categorical, N=numeric)
relation_name (str) – the name for the dataset
num_nominal_labels (dict) – the number of labels per nominal attribute (keys are 0-based attribute indices)
num_class_labels (int) – the number of labels in the class attribute
- Returns
the generated Instances object
- Return type
Instances
- sklweka.dataset.to_nominal_attributes(X, indices)
Converts the numeric columns at the specified indices into string (nominal) columns.
- Parameters
X (ndarray) – the 2D matrix to convert
indices (list or str) – the attributes to convert to nominal, either as a list of 0-based indices or as a range string with 1-based indices
- Returns
the converted matrix
- Return type
ndarray
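Converting selected numeric columns to nominal ones essentially means replacing their values with string labels while leaving the other columns untouched. The sketch below assumes a list of 0-based indices and a plain str() conversion per value; the exact label format sklweka produces may differ, and to_nominal_attributes_sketch is a hypothetical name.

```python
import numpy as np


def to_nominal_attributes_sketch(X, indices):
    """Return an object array in which the given columns hold string labels."""
    result = np.array(X, dtype=object)  # copy, so the input is not modified
    for i in indices:
        result[:, i] = [str(v) for v in result[:, i]]
    return result


print(to_nominal_attributes_sketch([[1, 0.5], [2, 1.5]], [0]).tolist())
# [['1', 0.5], ['2', 1.5]]
```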
- sklweka.dataset.to_nominal_labels(y)
Turns the numeric column vector into a string vector.
- Parameters
y (list or ndarray) – the vector to convert
- Returns
the converted vector
- Return type
ndarray
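For a class vector, the conversion accepts either a list or an ndarray and yields an ndarray of strings whose distinct values become the nominal labels. A one-function sketch, again assuming str() as the label format (sklweka may format labels differently); to_nominal_labels_sketch is hypothetical.

```python
import numpy as np


def to_nominal_labels_sketch(y):
    """Turn a numeric list/ndarray into an ndarray of string labels."""
    return np.array([str(v) for v in np.asarray(y).tolist()])


labels = to_nominal_labels_sketch([0, 1, 1, 2])
print(labels)                          # ['0' '1' '1' '2']
print(sorted(set(labels.tolist())))    # ['0', '1', '2'] -> the distinct nominal labels
```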
sklweka.preprocessing module
- class sklweka.preprocessing.MakeNominal(*args: Any, **kwargs: Any)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Converts numeric columns to nominal ones (i.e., string labels).
- fit(X, y)
Trains the estimator.
- Parameters
X (ndarray) – the input variables as matrix, array-like of shape (n_samples, n_features)
y (ndarray) – the optional class attribute column, array-like of shape (n_samples,)
- Returns
itself
- Return type
MakeNominal
- get_params(deep=True)
Returns the parameters for this transformer, namely ‘input_vars’ and ‘output_vars’.
- Parameters
deep (bool) – ignored
- Returns
the dictionary with options
- Return type
dict
- property input_vars
Returns the input variables to convert, either as 0-based indices or as a range string with 1-based indices.
- Returns
the indices or range string, can be None
- Return type
list or str
- property output_vars
Returns whether the output variable gets converted as well.
- Returns
True if the output variable gets converted
- Return type
bool
- set_params(**params)
Sets the parameters for the transformer; expects the keys ‘input_vars’ and ‘output_vars’.
- Parameters
params (dict) – the parameter dictionary
- transform(X, y=None)
Filters the data.
- Parameters
X (ndarray) – the data to filter, array-like of shape (n_samples, n_features)
y (ndarray) – the optional class attribute column, array-like of shape (n_samples,)
- Returns
the filtered data: X if no targets were provided, otherwise the tuple (X, y)
- Return type
ndarray or tuple
- class sklweka.preprocessing.WekaTransformer(*args: Any, **kwargs: Any)
Bases: sklearn.base.BaseEstimator, weka.core.classes.OptionHandler, sklearn.base.TransformerMixin
Wraps a Weka filter within the scikit-learn framework.
- property filter
Returns the underlying filter object, if any.
- Returns
the filter object
- Return type
Filter
- fit(X, y)
Trains the estimator.
- Parameters
X (ndarray) – the input variables as matrix, array-like of shape (n_samples, n_features)
y (ndarray) – the optional class attribute column, array-like of shape (n_samples,)
- Returns
itself
- Return type
WekaTransformer
- get_params(deep=True)
Returns the parameters for this filter, namely the classname and the options list.
- Parameters
deep (bool) – ignored
- Returns
the dictionary with options
- Return type
dict
- property header
Returns the underlying dataset header, if any.
- Returns
the dataset structure
- Return type
Instances
- set_params(**params)
Sets the options for the filter; expects the keys ‘classname’ and ‘options’.
- Parameters
params (dict) – the parameter dictionary
- transform(X, y=None)
Filters the data.
- Parameters
X (ndarray) – the data to filter, array-like of shape (n_samples, n_features)
y (ndarray) – the optional class attribute column, array-like of shape (n_samples,)
- Returns
the filtered data: X if no targets were provided, otherwise the tuple (X, y)
- Return type
ndarray or tuple