snorkelflow.client.object_storage.download_remote_object
- snorkelflow.client.object_storage.download_remote_object(url, bucket, obj, access_key, secret_key, region='', sync=True)
Download a file from an external S3-compatible storage service (e.g. MinIO, GCS).
This can be useful for downloading a datasource, ground truth, or any other file for use in-platform.
- Parameters:
  - url (str) – URL of the remote storage server (e.g. https://my-minio-server.domain.com); must include a URI scheme (http:// or https://)
  - bucket (str) – Bucket name for the object
  - obj (str) – Path of the object inside the bucket (e.g. enron_spam_test.csv)
  - access_key (str) – Access key for the remote storage server
  - secret_key (str) – Secret key for the remote storage server
  - region (str, default: '') – Region for the S3 service (ignored for MinIO and GCS)
  - sync (bool, default: True) – If True, poll job status and block until complete
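Because the url parameter must include a URI scheme, it can help to validate the URL before submitting a download job. A minimal sketch using only the Python standard library (the helper name is illustrative, not part of the SDK):

```python
from urllib.parse import urlparse

def has_uri_scheme(url: str) -> bool:
    """Return True if the URL starts with an http:// or https:// scheme."""
    return urlparse(url).scheme in ("http", "https")

has_uri_scheme("https://my-minio-server.domain.com")  # True
has_uri_scheme("my-minio-server.domain.com")          # False: scheme missing
```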
- Returns:
The in-platform path of the downloaded object, when sync mode is used
- Return type:
str
Examples
# Downloading from MinIO
path = sf.download_remote_object(
"https://my-minio.domain.com/", # remote host
"training-data-bucket", # bucket
"dev-split.csv", # object
"admin", # access key
"password") # secret key
# Downloading from GCS bucket. Please use HMAC key authentication
# https://cloud.google.com/storage/docs/authentication/hmackeys
gcs_path = sf.download_remote_object("https://storage.googleapis.com",
"gcs-bucket-name",
"gcs-object.csv",
"GCS_HMAC_KEY",
"GCS_HMAC_SECRET")
# Downloading from S3
s3_path = sf.download_remote_object(
    "https://s3.amazonaws.com/",
    "some-s3-bucket",
    "dev-split.csv",
    "AWS_ACCESS_KEY",
    "AWS_SECRET_KEY",
    "us-west-2") # region is required for S3
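The examples above pass credentials as string literals for brevity; in shared notebooks it is safer to read them from environment variables. A sketch of that pattern (the environment variable names are illustrative, and the commented-out call assumes the same signature documented above):

```python
import os

# Illustrative environment variable names; export these before running.
creds = {
    "access_key": os.environ.get("REMOTE_STORE_ACCESS_KEY", ""),
    "secret_key": os.environ.get("REMOTE_STORE_SECRET_KEY", ""),
}

# path = sf.download_remote_object(
#     "https://my-minio.domain.com/",
#     "training-data-bucket",
#     "dev-split.csv",
#     **creds)
```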