snorkelflow.client.datasources.create_datasource
snorkelflow.client.datasources.create_datasource(dataset, path, file_type, uid_col=None, split='train', datasource_ds=None, reader_kwargs=None, credential_kwargs=None, scheduler=None, load_to_model_nodes=False, sync=True)
Create a data source.

Parameters

| Name | Type | Default | Info |
| --- | --- | --- | --- |
| dataset | Union[str, int] | required | Name or UID of the dataset to create the data source in. |
| path | str | required | Path to the data source (e.g. minio, s3, http). |
| file_type | str | required | File type (csv or parquet). |
| uid_col | Optional[str] | None | Name of the UID column in the data source. The values in this column must be unique non-negative integers that are not duplicated across files. If not specified, a SnorkelFlow ID column is generated. |
| split | str | 'train' | Split of the dataset to add the data source to (train, valid, or test). |
| datasource_ds | Optional[str] | None | Datestamp of the data source in YYYY-MM-DD format. |
| reader_kwargs | Optional[Dict[str, Any]] | None | Dictionary of keyword arguments to pass to Dask read functions. |
| credential_kwargs | Optional[Dict[str, Any]] | None | Dictionary of credentials for specific data connectors. |
| scheduler | Optional[str] | None | Dask scheduler (threads, client, or group) to use. |
| load_to_model_nodes | bool | False | Whether to load the data source into all tasks in the dataset. |
| sync | bool | True | Whether to poll the job status and block until the job completes. |

Returns

UID of the created data source if sync mode is used; otherwise, the job_id.

Return type

Union[str, int]
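A minimal usage sketch, assuming the function is importable from the documented module path. The dataset name, file path, and UID column below are hypothetical placeholders, and the `reader_kwargs` entry is simply forwarded to the underlying Dask read function:

```python
from snorkelflow.client.datasources import create_datasource

# Hypothetical values: a "contracts" dataset, a tab-separated CSV in MinIO,
# and a "doc_uid" column of unique non-negative integers.
datasource_uid = create_datasource(
    dataset="contracts",
    path="minio://contracts/train.csv",
    file_type="csv",
    uid_col="doc_uid",
    split="train",
    reader_kwargs={"sep": "\t"},  # passed through to the Dask read function
    sync=True,                    # block until ingestion completes; returns the data source UID
)

# With sync=False the call returns immediately with a job_id instead of a UID.
job_id = create_datasource(
    dataset="contracts",
    path="minio://contracts/valid.csv",
    file_type="csv",
    uid_col="doc_uid",
    split="valid",
    sync=False,
)
```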