snorkelflow.client.analyses.plot_scatterplot
- snorkelflow.client.analyses.plot_scatterplot(node, df, x_axis_field, y_axis_field, label_field, drop_unknown=True, display_legend=True, scatter_kwargs=None, axes_kwargs=None, filter_flag_column=None, color_map='rainbow')
Plot the scatterplot of datapoints across two fields.
To find promising LFs involving a combination of numeric features, use the plot_scatterplot() function to visualize datapoints. Running %matplotlib notebook in your notebook before calling this function opens the plot in interactive mode, which makes the data easier to explore.
If, while exploring a scatterplot in interactive mode, you find a view that would make a promising bounding rectangle LF, you can use the add_scatterplot_lf function to create it. Calling that function in the next cell automatically generates an LF from your current view of the scatterplot. For example, we can examine network data and create a bounding LF for YouTube traffic as follows:
Examples
%matplotlib notebook
df = sf.get_dataset(node, split='dev')
axis_obj = sf.plot_scatterplot(node, df, 'Fwd.Packet.Length.Max', 'Avg.Fwd.Segment.Size', "ProtocolName")
# Run this cell after the view in the interactive scatterplot contains mostly YOUTUBE labels
sf.add_scatterplot_lf(node, axis_obj, 'YOUTUBE')
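To make the idea of a bounding rectangle LF concrete, here is a minimal sketch in plain Python. It is not the implementation of add_scatterplot_lf: the function name make_bounding_rect_lf, the ABSTAIN sentinel, the dictionary-based rows, and the numeric ranges are all hypothetical and chosen only for illustration. It assumes the LF votes for a label when a datapoint falls inside the visible x and y ranges of the plot and abstains otherwise.

```python
from typing import Any, Callable, Dict, Tuple

ABSTAIN = None  # hypothetical sentinel meaning "no vote"


def make_bounding_rect_lf(
    x_field: str,
    y_field: str,
    x_range: Tuple[float, float],
    y_range: Tuple[float, float],
    label: str,
) -> Callable[[Dict[str, Any]], Any]:
    """Build a function that votes `label` for rows inside the rectangle."""
    # Sort each range so the bounds work even if an axis is inverted.
    x_min, x_max = sorted(x_range)
    y_min, y_max = sorted(y_range)

    def lf(row: Dict[str, Any]) -> Any:
        inside = (
            x_min <= row[x_field] <= x_max
            and y_min <= row[y_field] <= y_max
        )
        return label if inside else ABSTAIN

    return lf


# The ranges below are made-up stand-ins for the axis limits of an
# interactive view (for a matplotlib axes, ax.get_xlim() and ax.get_ylim()).
youtube_lf = make_bounding_rect_lf(
    "Fwd.Packet.Length.Max",
    "Avg.Fwd.Segment.Size",
    x_range=(0.0, 500.0),
    y_range=(0.0, 200.0),
    label="YOUTUBE",
)
```

Calling youtube_lf on a row whose two fields fall inside both ranges returns the label; any other row returns ABSTAIN.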
Parameters
- Commonly used axes_kwargs:
  - title: Add a title to the plot
  - xlabel: Edit the x-axis label (default is the x_axis_field value)
  - ylabel: Edit the y-axis label (default is the y_axis_field value)
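As a rough illustration of how the two pass-through dictionaries behave, the sketch below uses plain matplotlib rather than Snorkel Flow. It assumes scatter_kwargs is forwarded to Axes.scatter() and axes_kwargs to Axes.set(), as the parameter descriptions state; the helper name scatter_with_kwargs and the data values are made up for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt


def scatter_with_kwargs(xs, ys, scatter_kwargs=None, axes_kwargs=None):
    """Hypothetical helper: plot points and forward both kwargs dictionaries."""
    fig, ax = plt.subplots()
    # Marker options such as size ("s") and transparency ("alpha") go to scatter().
    ax.scatter(xs, ys, **(scatter_kwargs or {}))
    # Axes properties such as title, xlabel and ylabel go to set().
    ax.set(**(axes_kwargs or {}))
    return ax


ax = scatter_with_kwargs(
    [1.0, 2.0, 3.0],
    [2.0, 1.0, 4.0],
    scatter_kwargs={"s": 12, "alpha": 0.5},
    axes_kwargs={"title": "Packet sizes", "xlabel": "max length", "ylabel": "avg size"},
)
```

With plot_scatterplot itself, the same two dictionaries would be supplied through its scatter_kwargs and axes_kwargs arguments.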
Raises
ValueError – If x_axis_field is not one of the columns in the given DataFrame
ValueError – If y_axis_field is not one of the columns in the given DataFrame
ValueError – If label_field is not one of the columns in the given DataFrame
ValueError – If filter_flag_column is not one of the columns in the given DataFrame
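The four checks above amount to verifying column names before plotting. A minimal sketch of such a check follows, assuming only a list of column names (for a pandas DataFrame this would be list(df.columns)). The helper name check_fields is hypothetical, the error messages are not Snorkel Flow's actual wording, and skipping filter_flag_column when it is None is an assumption based on its default value.

```python
from typing import Iterable, Optional


def check_fields(
    columns: Iterable[str],
    x_axis_field: str,
    y_axis_field: str,
    label_field: str,
    filter_flag_column: Optional[str] = None,
) -> None:
    """Raise ValueError if any requested field is not a known column."""
    known = set(columns)
    requested = {
        "x_axis_field": x_axis_field,
        "y_axis_field": y_axis_field,
        "label_field": label_field,
        "filter_flag_column": filter_flag_column,
    }
    for arg_name, field in requested.items():
        if field is None:
            # Assumption: an unset optional column is simply not checked.
            continue
        if field not in known:
            raise ValueError(f"{arg_name}={field!r} is not a column in the DataFrame")


columns = ["Fwd.Packet.Length.Max", "Avg.Fwd.Segment.Size", "ProtocolName"]
check_fields(columns, "Fwd.Packet.Length.Max", "Avg.Fwd.Segment.Size", "ProtocolName")
```

Running a check like this before calling plot_scatterplot surfaces a misspelled field name early, with the offending argument named in the message.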
Returns
An axis object containing information about the plot that can be passed into add_scatterplot_lf to quickly generate a bounding rectangle LF.
Return type
matplotlib.pyplot.axis
| Name | Type | Default | Info |
| --- | --- | --- | --- |
| node | int | | UID of the node. |
| df | DataFrame | | The data frame with a numeric field over which to plot the scatterplot. |
| x_axis_field | str | | Name of the field in the df that contains numeric values to plot on the x axis. |
| y_axis_field | str | | Name of the field in the df that contains numeric values to plot on the y axis. |
| label_field | str | | Name of the field containing labels for plotting on a per-class basis. |
| drop_unknown | bool | True | If True, remove examples with unknown GT when plotting on a per-class basis. |
| display_legend | bool | True | If True, display a legend of the plotted classes. |
| scatter_kwargs | Optional[Dict[str, Any]] | None | Arguments to pass through to matplotlib.axes.Axes.scatter() (https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.scatter.html). |
| axes_kwargs | Optional[Dict[str, Any]] | None | Arguments to pass through to matplotlib.pyplot.gca().set() (https://matplotlib.org/api/_as_gen/matplotlib.figure.Figure.html#matplotlib.figure.Figure.gca). |
| filter_flag_column | Optional[str] | None | Name of the column describing whether datapoints have been filtered out or not (can be accessed via show_filtered_flag=True in task.dataset()). |
| color_map | str | 'rainbow' | Name of the matplotlib color map to use when coloring classes (https://matplotlib.org/3.1.0/tutorials/colors/colormaps.html). |