Version: 25.4

snorkelflow.client.analyses.plot_scatterplot

snorkelflow.client.analyses.plot_scatterplot(node, df, x_axis_field, y_axis_field, label_field, drop_unknown=True, display_legend=True, scatter_kwargs=None, axes_kwargs=None, filter_flag_column=None, color_map='rainbow')

Plot the scatterplot of datapoints across two fields.

To find promising LFs that involve a combination of numeric features, use the plot_scatterplot() function to visualize datapoints across two fields. Running %matplotlib notebook in your notebook before calling this function opens the plot in interactive mode, which makes the data easier to explore.

If, while exploring a scatterplot in interactive mode, you find a view that would make a promising bounding-rectangle LF, you can use the add_scatterplot_lf function to create it. Calling that function in the next cell automatically generates an LF from your current view of the scatterplot. For example, we can examine network data and easily create a bounding LF for YouTube traffic as follows:

Examples

%matplotlib notebook
df = sf.get_dataset(node, split='dev')
axis_obj = sf.plot_scatterplot(node, df, 'Fwd.Packet.Length.Max', 'Avg.Fwd.Segment.Size', "ProtocolName")

# Run this cell after the view in the interactive scatterplot contains mostly YOUTUBE labels
sf.add_scatterplot_lf(node, axis_obj, 'YOUTUBE')
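
Conceptually, a bounding-rectangle LF votes for a label when a datapoint falls inside the x and y ranges currently shown in the plot. The sketch below illustrates that idea in plain Python. It is not Snorkel Flow's implementation: make_bounding_rect_lf, the ABSTAIN value, and the sample limits are hypothetical names and numbers chosen for illustration.

```python
from typing import Callable, Dict, Optional, Tuple

ABSTAIN = None  # hypothetical "no vote" value used only in this sketch


def make_bounding_rect_lf(
    x_field: str,
    y_field: str,
    x_limits: Tuple[float, float],
    y_limits: Tuple[float, float],
    label: str,
) -> Callable[[Dict[str, float]], Optional[str]]:
    """Return a function that votes `label` for rows inside the rectangle."""
    x_min, x_max = sorted(x_limits)  # tolerate limits given in reverse order
    y_min, y_max = sorted(y_limits)

    def lf(row: Dict[str, float]) -> Optional[str]:
        inside = (
            x_min <= row[x_field] <= x_max
            and y_min <= row[y_field] <= y_max
        )
        return label if inside else ABSTAIN

    return lf


# Illustrative limits; with matplotlib they could be read from an Axes
# object via get_xlim() and get_ylim().
youtube_lf = make_bounding_rect_lf(
    "Fwd.Packet.Length.Max",
    "Avg.Fwd.Segment.Size",
    x_limits=(0.0, 500.0),
    y_limits=(0.0, 120.0),
    label="YOUTUBE",
)
```

Calling youtube_lf on a row (here a plain dict keyed by field name) returns "YOUTUBE" when both values lie inside the rectangle and ABSTAIN otherwise.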

Parameters

  • node (int) – UID of the node.

  • df (DataFrame) – The data frame containing the numeric fields to plot.

  • x_axis_field (str) – Name of the field in df that contains the numeric values to plot on the x axis.

  • y_axis_field (str) – Name of the field in df that contains the numeric values to plot on the y axis.

  • label_field (str) – Name of the field containing labels for plotting on a per-class basis.

  • drop_unknown (bool, default True) – If True, remove examples with unknown ground truth (GT) when plotting on a per-class basis.

  • display_legend (bool, default True) – If True, display a legend of the plotted classes.

  • scatter_kwargs (Optional[Dict[str, Any]], default None) – Arguments to pass through to matplotlib.axes.Axes.scatter() (https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.scatter.html).

  • axes_kwargs (Optional[Dict[str, Any]], default None) – Arguments to pass through to matplotlib.pyplot.gca().set() (https://matplotlib.org/api/_as_gen/matplotlib.figure.Figure.html#matplotlib.figure.Figure.gca). Commonly used axes_kwargs:

      • title: Add a title to the plot.

      • xlabel: Edit the x-axis label (default is the x_axis_field value).

      • ylabel: Edit the y-axis label (default is the y_axis_field value).

  • filter_flag_column (Optional[str], default None) – Name of the column indicating whether datapoints have been filtered out or not (can be accessed via show_filtered_flag=True in task.dataset()).

  • color_map (str, default 'rainbow') – Name of the matplotlib color map to use when coloring classes (https://matplotlib.org/3.1.0/tutorials/colors/colormaps.html).
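
As an illustration of the pass-through parameters, the sketch below builds scatter_kwargs and axes_kwargs and forwards them to plot_scatterplot. The wrapper plot_styled_scatterplot is hypothetical, sf is assumed to be the same client object used in the example above, and the marker size, transparency, title, and colormap values are arbitrary choices. The keys s and alpha are standard arguments of matplotlib's Axes.scatter().

```python
from typing import Any, Dict

# Forwarded to matplotlib's Axes.scatter(): small, semi-transparent markers.
SCATTER_KWARGS: Dict[str, Any] = {"s": 10, "alpha": 0.5}

# Forwarded to the axes' set() call: plot title and axis labels.
AXES_KWARGS: Dict[str, Any] = {
    "title": "Forward packet length vs. segment size",
    "xlabel": "Max forward packet length",
    "ylabel": "Average forward segment size",
}


def plot_styled_scatterplot(sf: Any, node: int, df: Any) -> Any:
    """Hypothetical wrapper: call plot_scatterplot with custom styling."""
    return sf.plot_scatterplot(
        node,
        df,
        "Fwd.Packet.Length.Max",
        "Avg.Fwd.Segment.Size",
        "ProtocolName",
        scatter_kwargs=SCATTER_KWARGS,
        axes_kwargs=AXES_KWARGS,
        color_map="viridis",  # any matplotlib colormap name
    )
```

Keeping the styling in module-level dictionaries makes it easy to reuse the same look across several plots.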

Raises

  • ValueError – If x_axis_field is not one of the columns in the given DataFrame

  • ValueError – If y_axis_field is not one of the columns in the given DataFrame

  • ValueError – If label_field is not one of the columns in the given DataFrame

  • ValueError – If filter_flag_column is not one of the columns in the given DataFrame
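
Because each of these errors comes from a missing column, it can help to check the column names before plotting. The following is a minimal sketch of such a check, not the library's own validation code. check_plot_columns is a hypothetical helper, and columns stands for something like list(df.columns).

```python
from typing import Iterable, List, Optional


def check_plot_columns(
    columns: Iterable[str],
    x_axis_field: str,
    y_axis_field: str,
    label_field: str,
    filter_flag_column: Optional[str] = None,
) -> None:
    """Raise ValueError naming every requested field that is not a column."""
    available = set(columns)
    requested = [x_axis_field, y_axis_field, label_field]
    # filter_flag_column is optional, so only check it when it is given.
    if filter_flag_column is not None:
        requested.append(filter_flag_column)
    missing: List[str] = [name for name in requested if name not in available]
    if missing:
        raise ValueError(f"Fields not found in DataFrame columns: {missing}")
```

Unlike separate per-field errors, this helper reports all missing fields at once, which can save a round trip when several names are misspelled.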

Returns

An axis object containing information about the plot, which can be passed to add_scatterplot_lf to quickly generate a bounding-rectangle LF.

Return type

matplotlib.pyplot.axis