Uploading files to file collections
User files are the assets related to data points in a given data source. For example, if you have a PDF application, the user files would be the actual .pdf files. For a computer vision (CV) application, the user files would be the .jpg or .png image files.
PDF and CV applications display these files throughout Snorkel Flow. When files are uploaded via Snorkel Flow, they reside on the same server that hosts the Snorkel Flow application.
Permissions
Instance-level permissions are required for both ways of adding user files:
- Direct file uploads from the computer that is running Snorkel Flow.
- Remote file transfer from a cloud storage provider such as Amazon S3 or Google Cloud Storage.
An administrator may disable upload permissions for the instance you are working on, which could prevent you from uploading or transferring user files.
To upload user files
- Navigate to the Datasets page.
- Select the + Upload new files button to open the file upload modal.
- In the file upload modal, complete the fields to transfer remote files or upload local files:
- Use case: Select the file type you want to upload from the dropdown. This restricts the file types that can be uploaded or transferred. For image applications, .jpg, .jpeg, and .png are allowed. For pdf applications, only .pdf files are allowed.
- Import to: Name the folder to which you want to upload files. If you enter an unused name, a new folder is created where the files are uploaded. If you enter the name of an existing folder, files are added to the existing folder. For existing folders, you can select + Upload to folder in existing folder in the file collections list. See Managing file collections.
- Select the Remote storage or Local storage tab:
- Remote Storage
- Service: Select the cloud storage provider that hosts your files: Amazon S3 or Google Cloud Storage.
- Remote path: Enter the URL or path to the folder containing your remotely hosted files. For S3, you can include or exclude a protocol prefix with these acceptable inputs:
s3://my-bucket/path/to/files
my-bucket/path/to/files
- Use credentials: To authenticate access to your remote bucket, select Use credentials:
- S3: Input credentials in the Access key, Secret key, and/or Token fields.
- GCS: Upload a credentials file.
- Local Storage
- Choose Files: Select this input to open your operating system's file browser. Select individual or multiple files to upload.
For direct uploads from your computer, you are limited to a maximum of 1,000 files and/or a total upload size of 1 GB at a time. These are not limitations on the folder that you upload to. You can upload subsequent files to the same folder using + Upload to folder in existing folder in the file collections list. See Managing file collections.
You cannot select directories, but you can use shortcuts to select all files in a given directory. For example, on macOS, press CMD + A to select all files inside a given directory.
- Overwrite duplicates during import
- Checked: Any files you upload in the collection will replace existing files with the same name.
- Unchecked: Any files you upload with the same name as existing files in the collection are not uploaded.
- Select Submit.
- If you selected the Remote storage option, your files are transferred in the background. You can check the status of this transfer using the Jobs icon.
- If you selected the Local Storage option, your files are uploaded. A loading spinner appears, and the user interface is disabled.
IMPORTANT: Do not close the tab. Wait for an alert to confirm the upload finished successfully. It may take several minutes to finish uploading.
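Before a direct upload from your computer, it can help to check a local folder against the file type and size limits described in the steps above. The following is a minimal sketch and is not part of Snorkel Flow. The names `allowed_files` and `batch_for_upload` are hypothetical. It assumes that 1 GB means 1024**3 bytes and that file extensions are matched case-insensitively, neither of which this page confirms.

```python
from pathlib import Path

# File extensions this page lists for each use case.
ALLOWED_EXTENSIONS = {
    "image": {".jpg", ".jpeg", ".png"},
    "pdf": {".pdf"},
}
MAX_FILES_PER_UPLOAD = 1000
MAX_BYTES_PER_UPLOAD = 1024**3  # assumption: 1 GB interpreted as 1 GiB


def allowed_files(directory, use_case):
    """Return files directly inside `directory` whose extension fits the use case."""
    extensions = ALLOWED_EXTENSIONS[use_case]
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() in extensions
    )


def batch_for_upload(sized_files, max_files=MAX_FILES_PER_UPLOAD,
                     max_bytes=MAX_BYTES_PER_UPLOAD):
    """Greedily split (name, size_in_bytes) pairs into batches within both limits."""
    batches, current, current_bytes = [], [], 0
    for name, size in sized_files:
        if size > max_bytes:
            raise ValueError(f"{name} alone exceeds the per-upload size limit")
        # Start a new batch when adding this file would break either limit.
        if current and (len(current) >= max_files or current_bytes + size > max_bytes):
            batches.append(current)
            current, current_bytes = [], 0
        current.append(name)
        current_bytes += size
    if current:
        batches.append(current)
    return batches
```

For example, passing `[(p.name, p.stat().st_size) for p in allowed_files("./scans", "pdf")]` to `batch_for_upload` yields groups of file names that could each be selected in one Choose Files upload to the same folder. The `./scans` path is a placeholder.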
Once you have uploaded or transferred files to your Snorkel Flow instance, you can associate them with your data sources when uploading new datasets. For more information about viewing your uploaded files, see Manage file collections.
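The Remote path field accepts an S3 location with or without the s3:// prefix, and both forms point to the same place. As an illustration only, the sketch below splits either form into a bucket name and a key prefix. `split_s3_path` is a hypothetical helper, not a Snorkel Flow function, and how Snorkel Flow parses this field internally is not described on this page.

```python
def split_s3_path(remote_path):
    """Split 's3://bucket/prefix' or 'bucket/prefix' into (bucket, prefix)."""
    path = remote_path.strip()
    if path.startswith("s3://"):
        path = path[len("s3://"):]
    # Everything before the first slash is the bucket; the rest is the prefix.
    bucket, _, prefix = path.partition("/")
    if not bucket:
        raise ValueError(f"no bucket name found in {remote_path!r}")
    return bucket, prefix


# Both accepted inputs resolve to the same bucket and prefix.
print(split_s3_path("s3://my-bucket/path/to/files"))  # ('my-bucket', 'path/to/files')
print(split_s3_path("my-bucket/path/to/files"))       # ('my-bucket', 'path/to/files')
```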