Version: 0.95

snorkelflow.client.nodes.resample_split

snorkelflow.client.nodes.resample_split(node, split='dev', max_total=None, max_labeled=None, min_per_class=1, sample_by_docs=False, seed=123, split_datasources=None, x_uids=None, context_uids=None)

Resample a split of the dataset.

Parameters

| Name | Type | Default | Info |
| --- | --- | --- | --- |
| node | int | | UID of the node. |
| split | str | 'dev' | Split to resample. |
| max_total | Optional[int] | None | The maximum total number of examples in the resampled split. This is approximate for extraction tasks, where examples are fetched by whole context only. |
| max_labeled | Optional[int] | None | The maximum number of labeled examples to include in the sampled split. |
| min_per_class | Optional[int] | 1 | The minimum number of labeled examples per class to include in the sampled split. |
| sample_by_docs | bool | False | If True, the numbers (max_total, max_labeled) are interpreted as numbers of docs rather than numbers of candidates. |
| seed | Optional[int] | 123 | A random seed to make sampling deterministic for a given dataset. |
| split_datasources | Optional[List[int]] | None | A list of the datasources to resample from. If None, sample from all datasources. |
| x_uids | Optional[List[str]] | None | A list of x_uids. If provided, overrides all other keyword arguments and pulls in the provided x_uids instead. This can only be used for sampling datapoints into the dev split. |
| context_uids | Optional[List[int]] | None | A list of document-level int uids to sample. If provided, overrides all other keyword arguments and pulls in the x_uids corresponding to the document-level uids instead. This can only be used for sampling datapoints into the dev split. |
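
The interaction between x_uids / context_uids and the other keyword arguments is easy to get wrong, so a small client-side pre-check can help. The sketch below is hypothetical and not part of snorkelflow: the helper name check_resample_args and the OVERRIDDEN_BY_UIDS tuple are assumptions made for illustration. It only encodes the two rules stated in the table (explicit uids require the dev split, and they override the other sampling arguments). The table does not say what happens when both x_uids and context_uids are passed, so the sketch does not decide that case.

```python
from typing import Any, List

# Sampling arguments that, per the parameter table, are overridden when
# x_uids or context_uids is supplied.
OVERRIDDEN_BY_UIDS = (
    "max_total",
    "max_labeled",
    "min_per_class",
    "sample_by_docs",
    "seed",
    "split_datasources",
)


def check_resample_args(split: str = "dev", **kwargs: Any) -> List[str]:
    """Hypothetical pre-check; not part of the snorkelflow SDK.

    Returns the names of explicitly passed arguments that would be ignored,
    and raises ValueError if explicit uids are combined with a non-dev split.
    """
    uid_args = [k for k in ("x_uids", "context_uids") if kwargs.get(k) is not None]
    if not uid_args:
        # Ordinary resampling: nothing is overridden.
        return []
    if split != "dev":
        raise ValueError(
            f"{uid_args[0]} can only be used with split='dev', got {split!r}"
        )
    # Report every sampling argument the caller passed that will be ignored.
    return sorted(k for k in OVERRIDDEN_BY_UIDS if k in kwargs)


ignored = check_resample_args(x_uids=["span::5", "span::7"], max_total=100, seed=7)
print(ignored)  # ['max_total', 'seed']
```

Running a check like this before the real call surfaces silently ignored arguments early, rather than leaving the caller to wonder why max_total had no effect.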

Raises

ValueError – If split is not in the set of valid splits.

Return type

None
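
Because the function returns None and signals a bad split only through ValueError, callers that take the split name from user input may want to wrap the call. The following is a minimal sketch under stated assumptions: try_resample and fake_resample_split are hypothetical names, and the stand-in accepts only the two splits that appear in this page's examples, since the full set of valid splits is not listed here. In real use, sf.resample_split would be passed in place of the stand-in.

```python
from typing import Any, Callable

# Assumption: only the splits used in this page's examples. The real set of
# valid splits is not listed in this reference.
_ASSUMED_SPLITS = {"dev", "valid"}


def fake_resample_split(node: int, split: str = "dev", **kwargs: Any) -> None:
    """Stand-in for sf.resample_split so the sketch runs without a server."""
    if split not in _ASSUMED_SPLITS:
        raise ValueError(f"Invalid split: {split!r}")


def try_resample(resample_fn: Callable[..., None], node: int, **kwargs: Any) -> bool:
    """Return True if the call succeeded, False if it raised ValueError."""
    try:
        resample_fn(node, **kwargs)  # returns None on success
    except ValueError as exc:
        print(f"Resample rejected: {exc}")
        return False
    return True


ok = try_resample(fake_resample_split, 1, split="valid", seed=123)
bad = try_resample(fake_resample_split, 1, split="not-a-split")
```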

Examples

import snorkelflow.client as sf

# `node` is the integer UID of the node to resample.

# Resample the valid split
sf.resample_split(node, split="valid", seed=123)

# Sample specific datapoints into the dev split by x_uid
sf.resample_split(node, x_uids=["span::5", "span::7"])

# Sample specific documents into the dev split by document-level uid
sf.resample_split(node, context_uids=[1, 2])
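
The seed argument in the first example is what makes a resample repeatable. The toy sketch below illustrates that general idea with Python's standard random module. It is not Snorkel Flow's sampling algorithm, and sample_uids is a hypothetical helper that mimics only the max_total cap and the seed.

```python
import random
from typing import List, Optional, Sequence


def sample_uids(
    uids: Sequence[str], max_total: Optional[int], seed: Optional[int] = 123
) -> List[str]:
    """Toy stand-in for seeded sampling; not Snorkel Flow's algorithm."""
    # A private generator keeps the result independent of global random state.
    rng = random.Random(seed)
    # Cap the sample size at max_total, or take everything if it is None.
    k = len(uids) if max_total is None else min(max_total, len(uids))
    return rng.sample(list(uids), k)


uids = [f"span::{i}" for i in range(20)]
first = sample_uids(uids, max_total=5, seed=123)
second = sample_uids(uids, max_total=5, seed=123)
print(first == second)  # True: same seed and same input give the same sample
```

In this sketch repeatability holds only while the input list is unchanged, which matches the table's wording that the seed makes sampling deterministic "for a given dataset".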