Version: 0.91

snorkelflow.client.datasources.prep_and_ingest_datasource

snorkelflow.client.datasources.prep_and_ingest_datasource(dataset, paths, input_type, split, run_datasource_checks=True)

Create a data source.

Returns: UID of the created data source if sync mode is used.
Return type: `Optional[int]`

Parameters

Name	Type	Default	Info
dataset	`Union[str, int]`		Name or UID of the dataset to create the data source in.
paths	`List[str]`		List of paths to the data source (e.g. MinIO, S3).
input_type	`str`		Type of the files in the folder to be processed. The supported types are `pdf`, `image`, and `hocr`.
split	`str`		Split of the dataset to add the data source to (`train`, `valid`, or `test`).
run_datasource_checks	`bool`	`True`	Whether to run data source checks before ingestion.
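
The constraints listed in the table can be checked locally before the SDK is called. The following is a minimal sketch, not part of the Snorkel Flow SDK: `build_ingest_kwargs` is a hypothetical helper, the allowed values are copied from the parameter descriptions above, and the non-empty check on `paths` is an assumption rather than a documented rule.

```python
from typing import Any, Dict, List, Union

# Allowed values as listed in the parameter table above.
SUPPORTED_INPUT_TYPES = {"pdf", "image", "hocr"}
SUPPORTED_SPLITS = {"train", "valid", "test"}


def build_ingest_kwargs(
    dataset: Union[str, int],
    paths: List[str],
    input_type: str,
    split: str,
    run_datasource_checks: bool = True,
) -> Dict[str, Any]:
    """Hypothetical helper: validate the arguments and return them as a dict."""
    # Assumption: an empty path list is never useful, so reject it early.
    if not paths:
        raise ValueError("paths must contain at least one path")
    if input_type not in SUPPORTED_INPUT_TYPES:
        raise ValueError(f"unsupported input_type: {input_type!r}")
    if split not in SUPPORTED_SPLITS:
        raise ValueError(f"unsupported split: {split!r}")
    return {
        "dataset": dataset,
        "paths": list(paths),
        "input_type": input_type,
        "split": split,
        "run_datasource_checks": run_datasource_checks,
    }
```

The returned dictionary could then be unpacked into the call, for example `sf.prep_and_ingest_datasource(**kwargs)`.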

Examples

>>> import snorkelflow.client as sf
>>> sf.prep_and_ingest_datasource(
...     dataset="test-dataset",
...     paths=["minio://pdf-bucket/"],
...     input_type="pdf",
...     split="train",
... )
1
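
Because the return type is `Optional[int]`, callers may want to handle the case where no UID comes back. The sketch below assumes that `None` means the call was not run in sync mode; the reference only states that a UID is returned when sync mode is used. `describe_ingest_result` is a hypothetical helper, not an SDK function.

```python
from typing import Optional


def describe_ingest_result(uid: Optional[int]) -> str:
    """Hypothetical helper: turn the Optional[int] result into a readable message."""
    # Compare against None explicitly instead of relying on the truthiness of the int.
    if uid is None:
        return "No data source UID returned (assumed: call was not run in sync mode)."
    return f"Created data source with UID {uid}."
```

The result of `sf.prep_and_ingest_datasource(...)` could be passed straight to this helper for logging.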
