weka.plot package

Submodules

weka.plot.classifiers module

weka.plot.classifiers.generate_thresholdcurve_data(evaluation, class_index)

Generates the threshold curve data from the evaluation object’s predictions.

Parameters:
  • evaluation (Evaluation) – the evaluation to obtain the predictions from
  • class_index (int) – the 0-based index of the class-label to create the plot for
Returns:

the generated threshold curve data

Return type:

Instances

weka.plot.classifiers.get_auc(data)

Calculates the area under the ROC curve (AUC).

Parameters:
  • data (Instances) – the threshold curve data
Returns:

the area under the ROC curve

Return type:

float
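
To illustrate what the returned area represents, the sketch below integrates a curve given as x/y points with the trapezoidal rule. It is plain Python and does not call the library; `trapezoid_area` is a hypothetical helper, and Weka's own computation may differ in detail.

```python
def trapezoid_area(xs, ys):
    """Area under the piecewise-linear curve through (xs[i], ys[i])."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    # Sort by x so the points can be supplied in any order.
    points = sorted(zip(xs, ys))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# False positive rate (x) against true positive rate (y), made-up values.
diagonal = trapezoid_area([0.0, 0.5, 1.0], [0.0, 0.5, 1.0])  # chance-level curve
perfect = trapezoid_area([0.0, 0.0, 1.0], [0.0, 1.0, 1.0])   # ideal curve
```

Under this rule a curve along the diagonal covers half of the unit square, and a curve that rises straight to the top-left corner covers all of it.
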
weka.plot.classifiers.get_prc(data)

Calculates the area under the precision-recall curve (PRC).

Parameters:
  • data (Instances) – the threshold curve data
Returns:

the area under the precision-recall curve

Return type:

float
weka.plot.classifiers.get_thresholdcurve_data(data, xname, yname)

Retrieves the x and y columns from the data generated by the weka.classifiers.evaluation.ThresholdCurve class.

Parameters:
  • data (Instances) – the threshold curve data
  • xname (str) – the name of the X column
  • yname (str) – the name of the Y column
Returns:

tuple of x and y arrays

Return type:

tuple
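
The return value pairs up two named columns of the threshold curve data. The sketch below mimics that on a plain list of dictionaries rather than an Instances object. `columns_as_xy` and the sample rows are hypothetical, and the column names used here are assumed to match those that ThresholdCurve produces.

```python
def columns_as_xy(rows, xname, yname):
    """Return (x values, y values) for two named columns of a row-oriented table."""
    xs = [row[xname] for row in rows]
    ys = [row[yname] for row in rows]
    return xs, ys


# Made-up rows standing in for threshold curve data.
rows = [
    {"False Positive Rate": 1.0, "True Positive Rate": 1.0},
    {"False Positive Rate": 0.2, "True Positive Rate": 0.8},
    {"False Positive Rate": 0.0, "True Positive Rate": 0.0},
]
x, y = columns_as_xy(rows, "False Positive Rate", "True Positive Rate")
```
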

weka.plot.classifiers.plot_classifier_errors(predictions, absolute=True, max_relative_size=50, absolute_size=50, title=None, outfile=None, wait=True, key_loc='lower center')

Plots the classifier errors for the given list of predictions.

TODO: click events http://matplotlib.org/examples/event_handling/data_browser.html

Parameters:
  • predictions (list or dict) – the predictions to plot, use a dict to plot predictions of multiple classifiers (keys are used as prefixes for plots)
  • absolute (bool) – whether to use absolute errors as size or relative ones
  • max_relative_size (int) – the maximum size in points in case of relative mode
  • absolute_size (int) – the size in points in case of absolute mode
  • title (str) – an optional title
  • outfile (str) – the output file, ignored if None
  • wait (bool) – whether to wait for the user to close the plot
  • key_loc (str) – the location string for the key
weka.plot.classifiers.plot_learning_curve(classifiers, train, test=None, increments=100, metric='percent_correct', title='Learning curve', label_template='[#] @ $', key_loc='lower right', outfile=None, wait=True)

Plots a learning curve.

Parameters:
  • classifiers (list of Classifier) – list of Classifier template objects
  • train (Instances) – dataset to use for building the classifier; also used for evaluating it if the test set is None
  • test (list or Instances) – optional dataset (or list of datasets) to use for testing the built classifiers
  • increments (float) – the increments (>= 1: number of instances, < 1: fraction of the dataset)
  • metric (str) – the name of the numeric metric to plot (Evaluation.<metric>)
  • title (str) – the title for the plot
  • label_template (str) – the template for the label in the plot (#: 1-based index of classifier, @: full classname, !: simple classname, $: options, *: 1-based index of test set)
  • key_loc (str) – the location string for the key
  • outfile (str) – the output file, ignored if None
  • wait (bool) – whether to wait for the user to close the plot
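
The increments and label_template parameters each encode a small convention. The sketch below spells both out in plain Python, assuming that increments below 1 are read as a fraction of the training set and that each placeholder is replaced exactly once. `step_size` and `expand_label` are hypothetical helpers, not part of the library, and the library's own handling may differ.

```python
import re


def step_size(num_instances, increments):
    """Turn the increments argument into a number of instances per step."""
    if increments >= 1:
        return int(increments)
    # Values below 1 are read as a fraction of the dataset; never step by zero.
    return max(1, round(num_instances * increments))


def expand_label(template, cls_index, classname, options, test_index):
    """Fill the placeholders of a label template in a single pass."""
    values = {
        "#": str(cls_index),                # 1-based index of the classifier
        "@": classname,                     # full class name
        "!": classname.rsplit(".", 1)[-1],  # simple class name
        "$": options,                       # options string
        "*": str(test_index),               # 1-based index of the test set
    }
    # One regex pass, so characters inside inserted values are not expanded again.
    return re.sub(r"[#@!$*]", lambda match: values[match.group()], template)


label = expand_label("[#] @ $", 1, "weka.classifiers.trees.J48", "-C 0.25 -M 2", 1)
short = expand_label("! on test set *", 2, "weka.classifiers.bayes.NaiveBayes", "", 3)
```

With the default template, `label` combines the classifier index, the full class name and the options, while `short` shows the simple class name and the test set index.
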
weka.plot.classifiers.plot_prc(evaluation, class_index=None, title=None, key_loc='lower center', outfile=None, wait=True)

Plots the PRC (precision-recall) curve for the given predictions.

TODO: click events http://matplotlib.org/examples/event_handling/data_browser.html

Parameters:
  • evaluation (Evaluation) – the evaluation to obtain the predictions from
  • class_index (list) – the list of 0-based indices of the class-labels to create the plot for
  • title (str) – an optional title
  • key_loc (str) – the location string for the key
  • outfile (str) – the output file, ignored if None
  • wait (bool) – whether to wait for the user to close the plot
weka.plot.classifiers.plot_prcs(evaluations, class_index=0, title=None, key_loc='lower center', outfile=None, wait=True)

Plots the PRC (precision-recall) curve for the predictions of multiple classifiers on the same dataset.

TODO: click events http://matplotlib.org/examples/event_handling/data_browser.html

Parameters:
  • evaluations (dict) – the dictionary of Evaluation objects to obtain the predictions from, the key is used in the plot key as prefix
  • class_index (int) – the 0-based index of the class-label to create the plot for
  • title (str) – an optional title
  • key_loc (str) – the location string for the key
  • outfile (str) – the output file, ignored if None
  • wait (bool) – whether to wait for the user to close the plot
weka.plot.classifiers.plot_roc(evaluation, class_index=None, title=None, key_loc='lower right', outfile=None, wait=True)

Plots the ROC (receiver operating characteristic) curve for the given predictions.

TODO: click events http://matplotlib.org/examples/event_handling/data_browser.html

Parameters:
  • evaluation (Evaluation) – the evaluation to obtain the predictions from
  • class_index (list) – the list of 0-based indices of the class-labels to create the plot for
  • title (str) – an optional title
  • key_loc (str) – the position string for the key
  • outfile (str) – the output file, ignored if None
  • wait (bool) – whether to wait for the user to close the plot
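
A typical call might look like the sketch below, which cross-validates a classifier and plots ROC curves for the first two class labels. Only plot_roc is documented in this section. The JVM handling, Loader, Classifier, Evaluation and Random are assumed from the wider python-weka-wrapper package, and `plot_naive_bayes_roc` with its arguments is a hypothetical wrapper. The imports are local so that defining the function requires neither the package nor a JVM.

```python
def plot_naive_bayes_roc(arff_path, outfile=None):
    """Cross-validate NaiveBayes on an ARFF file and plot ROC curves for class labels 0 and 1."""
    # Assumed modules from python-weka-wrapper, imported lazily.
    import weka.core.jvm as jvm
    from weka.core.converters import Loader
    from weka.core.classes import Random
    from weka.classifiers import Classifier, Evaluation
    import weka.plot.classifiers as plcls

    jvm.start()
    try:
        loader = Loader(classname="weka.core.converters.ArffLoader")
        data = loader.load_file(arff_path)
        data.class_is_last()  # treat the last attribute as the class

        cls = Classifier(classname="weka.classifiers.bayes.NaiveBayes")
        evl = Evaluation(data)
        evl.crossvalidate_model(cls, data, 10, Random(1))

        # class_index is a list here: one curve per listed class label.
        # Only block on the plot window when nothing is written to disk.
        plcls.plot_roc(evl, class_index=[0, 1], outfile=outfile, wait=outfile is None)
    finally:
        jvm.stop()
```

Calling the function requires a working installation of the package, a Java runtime and a dataset with a nominal class attribute that has at least two labels.
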
weka.plot.classifiers.plot_rocs(evaluations, class_index=0, title=None, key_loc='lower right', outfile=None, wait=True)

Plots the ROC (receiver operating characteristic) curve for the predictions of multiple classifiers on the same dataset.

TODO: click events http://matplotlib.org/examples/event_handling/data_browser.html

Parameters:
  • evaluations (dict) – the dictionary of Evaluation objects to obtain the predictions from, the key is used in the plot key as prefix
  • class_index (int) – the 0-based index of the class-label to create the plot for
  • title (str) – an optional title
  • key_loc (str) – the position string for the key
  • outfile (str) – the output file, ignored if None
  • wait (bool) – whether to wait for the user to close the plot

weka.plot.clusterers module

weka.plot.clusterers.plot_cluster_assignments(evl, data, atts=None, inst_no=False, size=10, title=None, outfile=None, wait=True)

Plots the cluster assignments against the specified attributes.

TODO: click events http://matplotlib.org/examples/event_handling/data_browser.html

Parameters:
  • evl (ClusterEvaluation) – the cluster evaluation to obtain the cluster assignments from
  • data (Instances) – the dataset the clusterer was evaluated against
  • atts (list) – the list of attribute indices to plot, None for all
  • inst_no (bool) – whether to include a fake attribute with the instance number
  • size (int) – the size of the circles in points
  • title (str) – an optional title
  • outfile (str) – the (optional) file to save the generated plot to. The extension determines the file format.
  • wait (bool) – whether to wait for the user to close the plot

weka.plot.dataset module

weka.plot.dataset.line_plot(data, atts=None, percent=100.0, seed=1, title=None, outfile=None, wait=True)

Uses the internal format to plot the dataset, one line per instance.

Parameters:
  • data (Instances) – the dataset
  • atts (list) – the list of 0-based attribute indices of attributes to plot
  • percent (float) – the percentage of the dataset to use for plotting
  • seed (int) – the seed value to use for subsampling
  • title (str) – an optional title
  • outfile (str) – the (optional) file to save the generated plot to. The extension determines the file format.
  • wait (bool) – whether to wait for the user to close the plot
weka.plot.dataset.matrix_plot(data, percent=100.0, seed=1, size=10, title=None, outfile=None, wait=True)

Plots all attributes against each other.

TODO: click events http://matplotlib.org/examples/event_handling/data_browser.html

Parameters:
  • data (Instances) – the dataset
  • percent (float) – the percentage of the dataset to use for plotting
  • seed (int) – the seed value to use for subsampling
  • size (int) – the size of the circles in points
  • title (str) – an optional title
  • outfile (str) – the (optional) file to save the generated plot to. The extension determines the file format.
  • wait (bool) – whether to wait for the user to close the plot
weka.plot.dataset.scatter_plot(data, index_x, index_y, percent=100.0, seed=1, size=50, title=None, outfile=None, wait=True)

Plots two attributes against each other.

TODO: click events http://matplotlib.org/examples/event_handling/data_browser.html

Parameters:
  • data (Instances) – the dataset
  • index_x (int) – the 0-based index of the attribute on the x axis
  • index_y (int) – the 0-based index of the attribute on the y axis
  • percent (float) – the percentage of the dataset to use for plotting
  • seed (int) – the seed value to use for subsampling
  • size (int) – the size of the circles in points
  • title (str) – an optional title
  • outfile (str) – the (optional) file to save the generated plot to. The extension determines the file format.
  • wait (bool) – whether to wait for the user to close the plot

weka.plot.experiments module

weka.plot.experiments.plot_experiment(mat, title='Experiment', axes_swapped=False, measure='Statistic', show_stdev=False, key_loc='lower right', outfile=None, wait=True)

Plots the results from an experiment.

Parameters:
  • mat (ResultMatrix) – the result matrix to plot
  • title (str) – the title for the experiment
  • axes_swapped (bool) – whether the axes were swapped (“sets x cls” or “cls x sets”)
  • measure (str) – the measure that is being displayed
  • show_stdev (bool) – whether to show the standard deviation as error bar
  • key_loc (str) – the location string for the key
  • outfile (str) – the output file, ignored if None
  • wait (bool) – whether to wait for the user to close the plot

weka.plot.graph module

weka.plot.graph.plot_dot_graph(graph, filename=None)

Plots a graph in graphviz dot notation.

Parameters:
  • graph (str) – the dot notation graph
  • filename (str) – the (optional) file to save the generated plot to. The extension determines the file format.
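
For readers unfamiliar with dot notation, the sketch below builds a small directed graph as a string. `edges_to_dot` and the node names are hypothetical and serve only to show the kind of text the graph parameter expects.

```python
def edges_to_dot(edges, name="G"):
    """Build a directed graph in dot notation from (source, target) pairs."""
    lines = ["digraph %s {" % name]
    for source, target in edges:
        # Quote node names so that spaces and punctuation are accepted.
        lines.append('  "%s" -> "%s";' % (source, target))
    lines.append("}")
    return "\n".join(lines)


dot = edges_to_dot([("outlook", "humidity"), ("outlook", "windy")])
```

A string of this form could then be passed as the graph argument.
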

Module contents

weka.plot.create_subsample(data, percent, seed=1)

Generates a subsample of the dataset.

Parameters:
  • data (Instances) – the data to create the subsample from
  • percent (float) – the percentage (0-100)
  • seed (int) – the seed value to use
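
The roles of percent and seed can be sketched on a plain Python list. This is an illustration only: `subsample` is a hypothetical helper and it samples without replacement, which is an assumption and may not match how the library draws its sample.

```python
import random


def subsample(rows, percent, seed=1):
    """Return a seeded random subset holding roughly `percent` percent of rows."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    size = round(len(rows) * percent / 100.0)
    rng = random.Random(seed)  # private generator, so the global state is untouched
    return rng.sample(list(rows), size)


rows = list(range(200))
first = subsample(rows, 25.0, seed=1)
second = subsample(rows, 25.0, seed=1)
```

The same seed yields the same subset, which is what makes plots of a subsampled dataset repeatable between runs.
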