snorkelflow.client.datasources.prep_and_ingest_datasource
- snorkelflow.client.datasources.prep_and_ingest_datasource(dataset, paths, input_type, split, run_datasource_checks=True)
Create a data source.
Parameters
dataset (Union[str, int]) – Name or UID of the dataset to create the data source in.
paths (List[str]) – List of paths to the data source (e.g. MinIO, S3).
input_type (str) – Type of the files in the folder to be processed (e.g. pdf, image). The supported types are pdf, image, and hocr.
split (str) – Split of the dataset to add the data source to (train, valid, or test).
run_datasource_checks (bool) – Whether to run data source checks before ingestion (defaults to True).
Returns
UID of the created data source if sync mode is used
Return type
Optional[int]
Examples
>>> sf.prep_and_ingest_datasource(
...     dataset="test-dataset",
...     paths=["minio://pdf-bucket/"],
...     input_type="pdf",
...     split="train",
... )
1
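As a further illustration, a sketch of the same call pointed at an image bucket with pre-ingestion checks disabled; the dataset name and MinIO path below are hypothetical, and only the parameters documented above are used.
>>> sf.prep_and_ingest_datasource(
...     dataset="image-dataset",              # hypothetical dataset name
...     paths=["minio://image-bucket/"],      # hypothetical MinIO bucket of image files
...     input_type="image",                   # one of the supported types: pdf, image, hocr
...     split="valid",
...     run_datasource_checks=False,          # skip data source checks before ingestion
... )
In sync mode, the call returns the UID of the created data source, as in the example above.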