snorkelflow.client.nodes.resample_split
- snorkelflow.client.nodes.resample_split(node, split='dev', max_total=None, max_labeled=None, min_per_class=1, sample_by_docs=False, seed=123, split_datasources=None, x_uids=None, context_uids=None)
Resample a split of the dataset.
Parameters
node (int)
    UID of the node.
split (str, default 'dev')
    Split to resample.
max_total (Optional[int], default None)
    The maximum total number of examples in the resampled split. This is approximate for extraction tasks, where examples are fetched by whole context only.
max_labeled (Optional[int], default None)
    The maximum number of labeled examples to include in the sampled split.
min_per_class (Optional[int], default 1)
    The minimum number of labeled examples per class to include in the sampled split.
sample_by_docs (bool, default False)
    If True, the limits (max_total, max_labeled) are interpreted as numbers of documents rather than numbers of candidates.
seed (Optional[int], default 123)
    A random seed to make sampling deterministic for a given dataset.
split_datasources (Optional[List[int]], default None)
    A list of the datasources to resample from. If None, sample from all datasources.
x_uids (Optional[List[str]], default None)
    A list of x_uids. If provided, overrides all other keyword arguments and pulls in the provided x_uids instead. This can only be used for sampling datapoints into the dev split.
context_uids (Optional[List[int]], default None)
    A list of document-level int uids to sample. If provided, overrides all other keyword arguments and pulls in the x_uids corresponding to the document-level uids instead. This can only be used for sampling datapoints into the dev split.
Raises
ValueError – If split is not in the set of valid splits.
Return type
None
Examples
import snorkelflow.client as sf

# `node` is the integer UID of the node whose split is resampled.

# Resample the valid split
sf.resample_split(node, split="valid", seed=123)
# Sample specific datapoints into the dev split by x_uid
sf.resample_split(node, x_uids=["span::5", "span::7"])
# Sample specific documents into the dev split by document-level uid
sf.resample_split(node, context_uids=[1, 2])
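Because x_uids and context_uids override every other keyword argument and only apply to the dev split, it can help to check a call's arguments before sending it. The following is a minimal sketch of such a check, not part of the SDK: build_resample_kwargs is a hypothetical helper, and rejecting a call that passes both uid lists is an assumption, since the reference above does not say which list would take precedence.

```python
from typing import Any, Dict, List, Optional


def build_resample_kwargs(
    split: str = "dev",
    x_uids: Optional[List[str]] = None,
    context_uids: Optional[List[int]] = None,
    **sampling_kwargs: Any,
) -> Dict[str, Any]:
    """Hypothetical helper: validate arguments destined for resample_split."""
    if x_uids is not None and context_uids is not None:
        # Assumption: precedence is undocumented, so refuse both at once.
        raise ValueError("Pass x_uids or context_uids, not both.")

    uid_key = None
    if x_uids is not None:
        uid_key, uid_values = "x_uids", x_uids
    elif context_uids is not None:
        uid_key, uid_values = "context_uids", context_uids

    if uid_key is None:
        # Ordinary resampling: forward the sampling options unchanged.
        return {"split": split, **sampling_kwargs}

    if split != "dev":
        # Documented restriction: explicit uids only go into the dev split.
        raise ValueError(f"{uid_key} can only be used with split='dev'.")
    if sampling_kwargs:
        # Explicit uids override these options, so flag them instead of
        # letting them be silently ignored.
        raise ValueError(
            f"Overridden by {uid_key}: {sorted(sampling_kwargs)}"
        )
    return {"split": split, uid_key: list(uid_values)}


kwargs = build_resample_kwargs(x_uids=["span::5", "span::7"])
# With a configured client, the call would then be:
#     sf.resample_split(node, **kwargs)
```

Raising on overridden options is a design choice made here for visibility; according to the parameter descriptions, the function itself simply lets the explicit uids take over.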