snorkelflow.client.nodes.resample_split
- snorkelflow.client.nodes.resample_split(node, split='dev', max_total=None, max_labeled=None, min_per_class=1, sample_by_docs=False, seed=123, split_datasources=None, x_uids=None, context_uids=None)
Resample a split of the dataset.
Parameters
- node (int): UID of the node.
- split (str, default 'dev'): Split to resample.
- max_total (Optional[int], default None): The maximum total number of examples in the resampled split. This will be approximate for extraction tasks where we fetch examples by whole context only.
- max_labeled (Optional[int], default None): The maximum number of labeled examples to include in sampled split.
- min_per_class (Optional[int], default 1): The minimum number of labeled examples per class to include in sampled split.
- sample_by_docs (bool, default False): If True, then the numbers (max_total, max_labeled) are interpreted as number of docs (rather than number of candidates).
- seed (Optional[int], default 123): A random seed to make sampling deterministic for a given dataset.
- split_datasources (Optional[List[int]], default None): A list of the datasources that we want to resample from. If None, sample from all datasources.
- x_uids (Optional[List[str]], default None): A list of x_uids. If provided, overrides all other keyword arguments and pulls in the provided x_uids instead. This can only be used for sampling datapoints into the dev split.
- context_uids (Optional[List[int]], default None): A list of document-level int uids to sample. If provided, overrides all other keyword arguments and pulls in the x_uids corresponding to the document-level uids instead. This can only be used for sampling datapoints into the dev split.
Raises
ValueError – If split is not in the set of valid splits.
Return type
None
Examples
# Resample the valid split
sf.resample_split(node, split="valid", seed=123)
# Sample specific datapoints into the dev split using the index
sf.resample_split(node, x_uids=["span::5", "span::7"])
# Sample specific documents into the dev split using the input index
sf.resample_split(node, context_uids=[1, 2])
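The parameter notes say that x_uids and context_uids override all other keyword arguments and can only be used to sample into the dev split. A minimal sketch of a pre-flight check that encodes those two rules follows. The helper check_uid_resample_args is hypothetical and is not part of snorkelflow; it only inspects arguments and makes no SDK calls, and the SDK may enforce these rules differently.

```python
from typing import List, Optional


def check_uid_resample_args(
    split: str = "dev",
    x_uids: Optional[List[str]] = None,
    context_uids: Optional[List[int]] = None,
    **sampling_kwargs,
) -> List[str]:
    """Hypothetical helper (an assumption, not a snorkelflow function).

    Returns the names of sampling keyword arguments that would be ignored
    because explicit uids were supplied. Raises ValueError if explicit uids
    are combined with a split other than 'dev'.
    """
    explicit_uids = x_uids is not None or context_uids is not None
    if not explicit_uids:
        # Regular resampling: nothing is overridden.
        return []
    if split != "dev":
        raise ValueError(
            f"x_uids/context_uids can only sample into the 'dev' split, got {split!r}"
        )
    # Per the parameter notes, explicit uids override all other keyword arguments,
    # so report every non-None sampling argument as ignored.
    return sorted(name for name, value in sampling_kwargs.items() if value is not None)


if __name__ == "__main__":
    ignored = check_uid_resample_args(
        split="dev", x_uids=["span::5", "span::7"], max_total=100, seed=123
    )
    print(ignored)  # ['max_total', 'seed']
```

Running a check like this before calling resample_split can flag arguments such as max_total or seed that would have no effect once explicit uids are passed.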