Version: 0.95

snorkelflow.client.nodes.resample_split

snorkelflow.client.nodes.resample_split(node, split='dev', max_total=None, max_labeled=None, min_per_class=1, sample_by_docs=False, seed=123, split_datasources=None, x_uids=None, context_uids=None)

Resample a split of the dataset.

Raises: ValueError – If split is not in the set of valid splits.
Return type: None

Parameters

Name	Type	Default	Info
node	`int`		UID of the node.
split	`str`	`'dev'`	Split to resample.
max_total	`Optional[int]`	`None`	The maximum total number of examples in the resampled split. This is approximate for extraction tasks, where examples are fetched by whole context only.
max_labeled	`Optional[int]`	`None`	The maximum number of labeled examples to include in the sampled split.
min_per_class	`Optional[int]`	`1`	The minimum number of labeled examples per class to include in the sampled split.
sample_by_docs	`bool`	`False`	If True, max_total and max_labeled are interpreted as numbers of documents rather than numbers of candidates.
seed	`Optional[int]`	`123`	A random seed that makes sampling deterministic for a given dataset.
split_datasources	`Optional[List[int]]`	`None`	A list of the datasources to resample from. If None, sample from all datasources.
x_uids	`Optional[List[str]]`	`None`	A list of x_uids. If provided, overrides all other keyword arguments and pulls in the provided x_uids instead. This can only be used for sampling datapoints into the dev split.
context_uids	`Optional[List[int]]`	`None`	A list of document-level int uids to sample. If provided, overrides all other keyword arguments and pulls in the x_uids corresponding to the document-level uids instead. This can only be used for sampling datapoints into the dev split.
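
The note that max_total is approximate when examples are fetched by whole context, and the role of seed, can be illustrated with a small standalone sketch. It does not call Snorkel Flow and is not the library's implementation: `sample_by_context` and `candidates_by_doc` are hypothetical names, and the stopping rule (add whole documents until the budget is reached) is an assumption made purely for illustration.

```python
import random
from typing import Dict, List, Optional


def sample_by_context(
    candidates_by_doc: Dict[int, List[str]],
    max_total: Optional[int] = None,
    seed: int = 123,
) -> List[str]:
    """Hypothetical sketch: sample whole documents in a seeded random order.

    Assumption: documents are added whole, so the result can overshoot
    max_total by up to one document's worth of candidates.
    """
    rng = random.Random(seed)            # fixed seed -> reproducible order
    doc_ids = sorted(candidates_by_doc)  # sort first so input order does not matter
    rng.shuffle(doc_ids)

    sampled: List[str] = []
    for doc_id in doc_ids:
        if max_total is not None and len(sampled) >= max_total:
            break
        sampled.extend(candidates_by_doc[doc_id])  # take the whole context
    return sampled


docs = {
    1: ["span::0", "span::1"],
    2: ["span::2"],
    3: ["span::3", "span::4", "span::5"],
}
first = sample_by_context(docs, max_total=3, seed=123)
second = sample_by_context(docs, max_total=3, seed=123)
```

With the same seed and the same data, `first` and `second` are identical, and the sample may contain slightly more than max_total candidates because documents are never split.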

Examples

# These examples assume `import snorkelflow.client as sf` and that `node` holds the UID of an existing node.

# Resample the valid split
sf.resample_split(node, split="valid", seed=123)

# Sample specific datapoints into the dev split by x_uid
sf.resample_split(node, x_uids=["span::5", "span::7"])

# Sample specific documents into the dev split by document-level UID
sf.resample_split(node, context_uids=[1, 2])
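
Because x_uids and context_uids can only be used to sample into the dev split, it may help to check the arguments before making the call. The helper below is a minimal sketch, not part of the Snorkel Flow SDK: `build_uid_resample_kwargs` is a hypothetical name, and the check covers only the dev-split restriction stated in the parameter table.

```python
from typing import Any, Dict, List, Optional


def build_uid_resample_kwargs(
    split: str = "dev",
    x_uids: Optional[List[str]] = None,
    context_uids: Optional[List[int]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: validate UID-based arguments for resample_split.

    Only enforces the documented rule that x_uids and context_uids
    are valid for the dev split; it does not contact Snorkel Flow.
    """
    uses_uids = x_uids is not None or context_uids is not None
    if uses_uids and split != "dev":
        raise ValueError(
            "x_uids and context_uids can only be used with split='dev', "
            f"got split={split!r}"
        )

    kwargs: Dict[str, Any] = {"split": split}
    if x_uids is not None:
        kwargs["x_uids"] = list(x_uids)
    if context_uids is not None:
        kwargs["context_uids"] = list(context_uids)
    return kwargs


kwargs = build_uid_resample_kwargs(x_uids=["span::5", "span::7"])
# The validated arguments could then be passed on, for example:
# sf.resample_split(node, **kwargs)
```

Failing locally with a clear message avoids sending a request that the documentation says is unsupported.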
