Version: 0.93

snorkelflow.client.object_storage.download_remote_object

snorkelflow.client.object_storage.download_remote_object(url, bucket, obj, access_key, secret_key, region='', sync=True)

Download a file from external S3-compatible storage (e.g., MinIO, GCS).

This can be useful for downloading a datasource, ground truth, or any other file for use in-platform.

Parameters:
  • url (str) – URL of the remote storage server (e.g. https://my-minio-server.domain.com); must include a URI scheme (http:// or https://)

  • bucket (str) – Bucket name for the object

  • obj (str) – Path of the object inside the bucket (e.g. enron_spam_test.csv)

  • access_key (str) – Access key for the remote storage server

  • secret_key (str) – Secret key for the remote storage server

  • region (str, default: '') – Region for the S3 service (ignored for MinIO and GCS)

  • sync (bool, default: True) – Poll job status and block until complete?
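Since the url argument must carry an http:// or https:// scheme, a quick standard-library check before calling can catch malformed URLs early. This is just a sketch; validate_storage_url is a hypothetical helper, not part of the SDK:

```python
from urllib.parse import urlparse

def validate_storage_url(url: str) -> str:
    """Raise if `url` lacks the http:// or https:// scheme required above."""
    scheme = urlparse(url).scheme
    if scheme not in ("http", "https"):
        raise ValueError(f"url must start with http:// or https://, got: {url!r}")
    return url

validate_storage_url("https://my-minio-server.domain.com")  # passes
```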

Returns:

The in-platform path of the downloaded object, if sync mode is used

Return type:

str

Examples

# Downloading from MinIO
path = sf.download_remote_object(
    "https://my-minio.domain.com/",  # remote host
    "training-data-bucket",          # bucket
    "dev-split.csv",                 # object
    "admin",                         # access key
    "password",                      # secret key
)

# Downloading from a GCS bucket. Use HMAC key authentication:
# https://cloud.google.com/storage/docs/authentication/hmackeys
gcs_path = sf.download_remote_object(
    "https://storage.googleapis.com",
    "gcs-bucket-name",
    "gcs-object.csv",
    "GCS_HMAC_KEY",
    "GCS_HMAC_SECRET",
)

# Downloading from S3
s3_path = sf.download_remote_object(
    "https://s3.amazonaws.com/",
    "some-s3-bucket",
    "dev-split.csv",
    "AWS_ACCESS_KEY",
    "AWS_SECRET_KEY",
    "us-west-2",  # region is required for S3
)
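The examples above pass arguments positionally; the call also accepts keyword arguments, and sync=False submits the download job without blocking. A sketch under those assumptions: download_dev_split is a hypothetical wrapper, and sf is assumed to be the initialized client object used in the examples above.

```python
# Hypothetical wrapper; `sf` is assumed to be the initialized client.
def download_dev_split(sf):
    # With sync=False the call submits the job and returns without
    # polling, so the in-platform path is not available here.
    return sf.download_remote_object(
        url="https://my-minio.domain.com/",
        bucket="training-data-bucket",
        obj="dev-split.csv",
        access_key="admin",
        secret_key="password",
        sync=False,
    )
```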