Version: 0.93

snorkelflow.client.nodes.get_dataset

snorkelflow.client.nodes.get_dataset(node, split='dev', batch_uid=None, combiner='AND', show_filtered_flag=False, gt_label=None, all_lfs_filter=None, no_lfs_filter=None, model_filters=None, lf_filters=None, training_set_filters=None, include_tag_uids=None, include_tag_type_uids=None, exclude_tag_uids=None, exclude_tag_type_uids=None)

Filter the dataset and return it with ground truth (GT) labels, using data point uids as the index.

You can also filter for only data points that have a certain ground truth label or a certain predicted label by a machine learning model. This filter is handy when you want to do your own analysis on an error bucket.

Multiple conditions are combined with AND semantics by default (see the combiner parameter).
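
To make the AND semantics concrete, here is a minimal plain-Python sketch of an "error bucket" filter: rows whose GT label is one class while a model predicts another. It does not call snorkelflow. The rows, field names (gt, model_1, lf_my_lf), and helper functions are hypothetical stand-ins used only to illustrate how conditions narrow the result when every one of them must hold.

```python
# Illustrative only: a plain-Python stand-in for AND-combined filters.
# The rows, field names, and helpers are hypothetical, not part of snorkelflow.
rows = [
    {"uid": "doc::0", "gt": "SPAM", "model_1": "SPAM", "lf_my_lf": "SPAM"},
    {"uid": "doc::1", "gt": "SPAM", "model_1": "HAM", "lf_my_lf": "SPAM"},
    {"uid": "doc::2", "gt": "HAM", "model_1": "HAM", "lf_my_lf": "UNKNOWN"},
]


def field_equals(field, label):
    """Build a predicate that checks one field against one label."""
    return lambda row: row[field] == label


def combine_and(predicates):
    """A row matches only if every predicate matches it."""
    return lambda row: all(p(row) for p in predicates)


# Error bucket: ground truth says SPAM, but model 1 predicted HAM.
conditions = [field_equals("gt", "SPAM"), field_equals("model_1", "HAM")]
matches_all = combine_and(conditions)
matches = [row["uid"] for row in rows if matches_all(row)]
print(matches)
```

Adding a further condition to the list can only shrink the result under AND semantics, which is why stacking gt_label with model_filters is a natural way to isolate a single error bucket.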

Examples

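# Assumes the SDK client is imported as `import snorkelflow.client as sf`
# and that `node` holds the UID of an existing node.
# Keep only data points where all LFs abstain.
df = sf.get_dataset(node, all_lfs_filter="UNKNOWN")
# Keep only data points whose GT label is "LABEL".
df = sf.get_dataset(node, gt_label="LABEL")
# Combine several filters; a data point must satisfy all of them.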
df = sf.get_dataset(
    node,
    gt_label="LABEL",
    model_filters=[(1, "LABEL"), (2, "LABEL")],
    lf_filters=[("my_lf", "LABEL")],
    training_set_filters=[(1, "LABEL")],
    combiner="AND"
)

# Generate column statistics and summaries for the dataset
df = sf.get_dataset(node)
df.describe(include="all")

Returns: A pandas DataFrame for the given split, filtered according to the parameters.
Return type: pd.DataFrame

Parameters

Name	Type	Default	Info
node	`int`		UID of the node.
split	`str`	`'dev'`	The split to load. Only the `"dev"`, `"valid"`, and `"test"` splits are allowed. Defaults to `"dev"`.
batch_uid	`Optional[int]`	`None`	UID of the annotation batch to filter.
combiner	`str`	`'AND'`	Combiner to apply across all filters. Defaults to `"AND"`.
show_filtered_flag	`bool`	`False`	If True, return all data points along with a column called "filtered_flag" whose "True" or "False" values indicate whether each data point matches the provided filters.
gt_label	`Optional[str]`	`None`	If set, include only data points where ground truth is this label string.
all_lfs_filter	`Optional[str]`	`None`	If set, include only data points where all LFs vote for the label string passed.
no_lfs_filter	`Optional[str]`	`None`	If set, include only data points where no LFs vote for the label string passed.
model_filters	`Optional[List[Tuple[int, str]]]`	`None`	List of tuples where first value in tuple is the model_id and second value is predicted label or voting pattern. If provided, include only data points that match this pattern.
lf_filters	`Optional[List[Tuple[str, str]]]`	`None`	List of tuples where first value in tuple is LF name and second value is assigned label or voting pattern. If provided, include only data points that match this pattern.
training_set_filters	`Optional[List[Tuple[int, str]]]`	`None`	List of tuples where first value in tuple is training set ID and second value is assigned label or voting pattern. If provided, include only data points that match this pattern.
include_tag_uids	`Optional[List[int]]`	`None`	List of tag UIDs. Only data points that have these tags are included.
include_tag_type_uids	`Optional[List[int]]`	`None`	List of tag type UIDs. Only data points that have tags of these types are included.
exclude_tag_uids	`Optional[List[int]]`	`None`	List of tag UIDs. Only data points that do not have these tags are included. Exclusion supersedes inclusion in case of collision.
exclude_tag_type_uids	`Optional[List[int]]`	`None`	List of tag type UIDs. Only data points that do not have tags of these types are included. Exclusion supersedes inclusion in case of collision.
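
Two behaviors in the table are easier to see in a small example: exclusion winning over inclusion when a data point has both kinds of tag, and show_filtered_flag returning every data point with a flag instead of dropping non-matches. The following is a plain-Python sketch, not the library's implementation. The tag_filter function and the row layout are hypothetical, and it assumes that a data point counts as included when it has at least one of the listed tags, which the reference does not state.

```python
# Illustrative only: `tag_filter` and the row layout are hypothetical.
# Assumption: a data point is "included" if it has ANY of the include tags.
def tag_filter(rows, include_tags=None, exclude_tags=None, show_filtered_flag=False):
    include = set(include_tags or [])
    exclude = set(exclude_tags or [])
    result = []
    for row in rows:
        tags = set(row["tags"])
        included = not include or bool(tags & include)
        excluded = bool(tags & exclude)
        # Exclusion supersedes inclusion when a data point has both.
        keep = included and not excluded
        if show_filtered_flag:
            # Return every row, annotated with whether it matched.
            result.append({**row, "filtered_flag": keep})
        elif keep:
            result.append(row)
    return result


rows = [
    {"uid": 0, "tags": [1]},     # has the included tag only
    {"uid": 1, "tags": [1, 2]},  # has both included and excluded tags
    {"uid": 2, "tags": []},      # has neither
]

kept = tag_filter(rows, include_tags=[1], exclude_tags=[2])
flagged = tag_filter(rows, include_tags=[1], exclude_tags=[2], show_filtered_flag=True)
print([r["uid"] for r in kept])
print([(r["uid"], r["filtered_flag"]) for r in flagged])
```

The flagged form is useful when you want to compare matching and non-matching data points side by side without making two separate calls.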
