Version: 0.95

Uploading files to file collections

User files are the assets related to data points in a given data source. For example, if you have a PDF application, the user files would be the actual .pdf files. For a computer vision (CV) application, the user files would be the .jpg or .png image files.

PDF and CV applications display these files throughout Snorkel Flow. When files are uploaded via Snorkel Flow, they reside on the same server that hosts the Snorkel Flow application.

Permissions

Instance-level permissions are required for both methods of uploading user files:

  • Direct file uploads from the computer that you use to access Snorkel Flow.
  • Remote file transfer from a cloud storage provider such as Amazon S3 or Google Cloud Storage.

An administrator may disable upload permissions for the instance you are working on, which could prevent you from uploading or transferring user files.

To upload user files

  1. Navigate to the Datasets page.
  2. Select the + Upload new files button to open the file upload modal.
  3. In the file upload modal, complete the fields to transfer remote files or upload local files:
    • Use case: Select the file type you want to upload from the dropdown. This restricts the file types that can be uploaded or transferred. For image applications, .jpg, .jpeg, and .png files are allowed. For PDF applications, only .pdf files are allowed.
    • Import to: Name the folder to which you want to upload files. If you enter an unused name, a new folder is created where the files are uploaded.
      If you enter the name of an existing folder, files are added to that folder. For existing folders, you can also select + Upload to folder for that folder in the file collections list. See Managing file collections.
    • Select either the Remote storage tab or the Local storage tab:
      • Remote Storage

        • Service: Select the cloud storage provider that hosts your files: Amazon S3 or Google Cloud Storage.
        • Remote path: Enter the URL or path to the folder containing your remotely hosted files. For S3, you can include or exclude a protocol prefix with these acceptable inputs:
          • s3://my-bucket/path/to/files
          • my-bucket/path/to/files
        • Use credentials: To authenticate access to your remote bucket, select Use credentials:
          • S3: Input credentials in the Access key, Secret key, and/or Token fields.
          • GCS: Upload a credentials file.
      • Local Storage

          • Choose Files: Select this input to open your operating system's file browser. Select individual or multiple files to upload.
            For direct uploads from your computer, each upload is limited to 1,000 files and a total size of 1 GB. These limits apply per upload, not to the folder that you upload to. To add more files to the same folder, use + Upload to folder for that folder in the file collections list. See Managing file collections.
            You cannot select directories, but you can use keyboard shortcuts to select all files in a given directory. For example, on macOS, press Cmd + A to select all files inside a directory.
    • Overwrite duplicates during import
      • Checked: Any files you upload in the collection will replace existing files with the same name.
      • Unchecked: Any files you upload with the same name as existing files in the collection are not uploaded.
  4. Select Submit.
    • If you selected the Remote storage option, your files are transferred in the background. You can check the status of this transfer using the Jobs icon.
    • If you selected the Local Storage option, your files are uploaded. A loading spinner appears, and the user interface is disabled.
      IMPORTANT: Do not close the tab. Wait for an alert to confirm the upload finished successfully. It may take several minutes to finish uploading.
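
Because direct uploads are capped at 1,000 files and 1 GB per upload, a large local directory has to be uploaded in several passes. The following is a minimal sketch of a local pre-flight helper that groups files into batches that fit both limits and filters out extensions not allowed for the chosen use case. It is not part of Snorkel Flow. The function name `plan_batches`, the use-case keys `"image"` and `"pdf"`, the reading of 1 GB as 10**9 bytes, and the case-insensitive extension check are all assumptions made for illustration.

```python
from pathlib import Path

# Limits taken from the Local Storage notes above.
MAX_FILES_PER_UPLOAD = 1_000
# Assumption: "1 GB" is read as 10**9 bytes; the source does not say GB vs. GiB.
MAX_BYTES_PER_UPLOAD = 10**9

# Allowed extensions per use case, as listed under "Use case".
# The dictionary keys are hypothetical labels, not Snorkel Flow identifiers.
ALLOWED_EXTENSIONS = {
    "image": {".jpg", ".jpeg", ".png"},
    "pdf": {".pdf"},
}


def plan_batches(files, use_case):
    """Group (name, size_in_bytes) pairs into batches that respect both limits.

    Returns (batches, rejected). `rejected` holds the names of files with a
    disallowed extension and of single files larger than the size limit.
    """
    allowed = ALLOWED_EXTENSIONS[use_case]
    batches, rejected = [], []
    current, current_bytes = [], 0
    for name, size in files:
        # Assumption: extensions are compared case-insensitively.
        if Path(name).suffix.lower() not in allowed or size > MAX_BYTES_PER_UPLOAD:
            rejected.append(name)
            continue
        too_many = len(current) >= MAX_FILES_PER_UPLOAD
        too_big = current_bytes + size > MAX_BYTES_PER_UPLOAD
        if current and (too_many or too_big):
            # Close the current batch before either limit would be exceeded.
            batches.append(current)
            current, current_bytes = [], 0
        current.append(name)
        current_bytes += size
    if current:
        batches.append(current)
    return batches, rejected
```

To apply this to a real directory, build the input with something like `[(p.name, p.stat().st_size) for p in Path("my_files").iterdir() if p.is_file()]`, where `my_files` is a placeholder path, and then upload each batch to the same folder in turn.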

Once you have uploaded or transferred files to your Snorkel Flow instance, you can associate them with your data sources when uploading new datasets. For more information about viewing your uploaded files, see Manage file collections.
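
The Remote path field accepts an S3 location with or without the `s3://` prefix, so both documented forms point to the same bucket and folder. As an illustration only, the sketch below shows how the two forms reduce to the same bucket and key prefix. The helper name `split_s3_path` is hypothetical, and this is not a description of how Snorkel Flow parses the field internally.

```python
def split_s3_path(remote_path):
    """Split an S3 remote path into (bucket, key_prefix).

    Accepts the path with or without a leading "s3://" prefix.
    """
    path = remote_path.strip()
    if path.startswith("s3://"):
        path = path[len("s3://"):]
    # Everything before the first "/" is the bucket; the rest is the key prefix.
    bucket, _, prefix = path.partition("/")
    if not bucket:
        raise ValueError(f"no bucket name in {remote_path!r}")
    return bucket, prefix


# Both documented input forms resolve to the same location.
with_prefix = split_s3_path("s3://my-bucket/path/to/files")
without_prefix = split_s3_path("my-bucket/path/to/files")
```

A check like this can help catch a mistyped bucket or folder before you submit a remote transfer.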