snorkelflow.client.object_storage.download_remote_object
- snorkelflow.client.object_storage.download_remote_object(url, bucket, obj, access_key, secret_key, region='', sync=True)
Download a file from external S3-compatible storage (e.g. MinIO, GCS).
This can be useful for downloading a datasource, ground truth, or any other file for use in-platform.
Parameters
Returns
The in-platform path of the downloaded object, if sync mode is used.
Return type
str
Name (Type, Default) – Info
url (str) – URL of the remote storage server (e.g. https://my-minio-server.domain.com); must include a URI scheme (http:// or https://).
bucket (str) – Bucket name for the object.
obj (str) – Path of the object inside the bucket (e.g. enron_spam_test.csv).
access_key (str) – Access key for the remote storage server.
secret_key (str) – Secret key for the remote storage server.
region (str, default '') – Region for the S3 service (ignored for MinIO and GCS).
sync (bool, default True) – Whether to poll job status and block until complete.
Examples
# Download from MinIO
path = sf.download_remote_object(
    "https://my-minio.domain.com/",  # remote host
    "training-data-bucket",          # bucket
    "dev-split.csv",                 # object
    "admin",                         # access key
    "password",                      # secret key
)

# Download from a GCS bucket. Use HMAC key authentication:
# https://cloud.google.com/storage/docs/authentication/hmackeys
gcs_path = sf.download_remote_object(
    "https://storage.googleapis.com",
    "gcs-bucket-name",
    "gcs-object.csv",
    "GCS_HMAC_KEY",
    "GCS_HMAC_SECRET",
)

# Download from S3
s3_path = sf.download_remote_object(
    "https://s3.amazonaws.com/",
    "some-s3-bucket",
    "dev-split.csv",
    "AWS_ACCESS_KEY",
    "AWS_SECRET_KEY",
    "us-west-2",  # region is required for S3
)