Working with MinIO
Notice of deprecation
MinIO is being deprecated in upcoming versions of the Snorkel AI Data Development Platform, with the transition running through early 2025. Its functionality is being replaced by Snorkel File Service (SFS).
- We will continue to communicate deprecation changes directly to customers and update the documentation when appropriate. If you have concerns about your current usage of MinIO and whether SFS will support it, contact your Snorkel Success Manager.
- The MinIO Console will be deprecated as of Q4 2024, but will still be accessible until the end of Q1 2025 to support backwards compatibility for older workflows. After Q1 2025, access to MinIO Console will be removed.
- Instead, use the Files feature to upload PDFs and images. Support for uploading arbitrary file types and basic file management utilities will be added to the Files feature by the end of Q1 2025 to reach parity with the core features of MinIO Console.
- For files that are not PDFs or images, you can continue to use the Snorkel AI Data Development Platform's SDK. These SDK workflows are detailed below.
- All SDK workflows documented here will be supported by SFS.
Overview
MinIO is an S3-compatible object store. A MinIO API ships with the Snorkel AI Data Development Platform for data management across the platform.
It lets users upload and download files that are accessible from both Notebooks and Operators within the Snorkel AI Data Development Platform.
Connecting to MinIO
Authentication to the Snorkel MinIO is handled automatically if you are working in Snorkel Notebooks. If you're accessing MinIO elsewhere, you will need to set the following environment variables: MINIO_URL, MINIO_ACCESS_KEY, and MINIO_SECRET_KEY.
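When running outside Snorkel Notebooks, it can help to confirm that the three variables named above are present before making any SDK call. The following is a minimal sketch: the helper name missing_minio_vars is hypothetical and not part of the Snorkel SDK, and it only checks that each variable is set and non-empty, not that the values are valid.

```python
import os

# Variable names are taken from this page; the helper itself is hypothetical.
REQUIRED_MINIO_VARS = ("MINIO_URL", "MINIO_ACCESS_KEY", "MINIO_SECRET_KEY")


def missing_minio_vars(env=None):
    """Return the required MinIO variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_MINIO_VARS if not env.get(name)]


# Report on the current process environment.
missing = missing_minio_vars()
if missing:
    print("Missing MinIO settings:", ", ".join(missing))
else:
    print("All MinIO settings are present.")
```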
Using MinIO with the Snorkel SDK
When working with the Snorkel SDK, you first need to set the appropriate context to ensure code is being applied to the right workspace and application. To use MinIO, you need to define the workspace:
import snorkelai.sdk.client as sai

# Set your workspace
workspace_name = 'YOUR_WORKSPACE_NAME'  # INPUT: replace with your workspace name

# Configure client context for Snorkel instance
ctx = sai.SnorkelSDKContext.from_endpoint_url(
    workspace_name=workspace_name,
)
The sai client imported above is used in the sections below.
File upload
The SDK provides two explicit methods for uploading: upload_file for single files and upload_dir for directories. Both are called on the sai client set up earlier in this section, and both accept absolute and relative paths. Examples are shown below:
# Upload a file from a local directory to MinIO
local_file_path = "/path/to/local/report.pdf"
remote_file_path = "minio://bucket/path/to/some/report.pdf"
uploaded_file_path = sai.upload_file(local_file_path, remote_file_path)
# Upload a directory from a local directory to MinIO
local_directory = "/path/to/local/directory"
remote_directory = "minio://bucket/upload/directory"
uploaded_dir_path = sai.upload_dir(local_directory, remote_directory)
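If you prefer to upload files one at a time while mirroring a local folder layout, you first need a mapping from each local file to its remote path. The sketch below builds that mapping with the standard library only. The function name plan_uploads is hypothetical, and the sketch makes no claim about how upload_dir itself lays out files remotely.

```python
from pathlib import Path


def plan_uploads(local_root, remote_root):
    """Map each file under local_root to a path under remote_root.

    Returns a list of (local_path, remote_path) string pairs. Nothing is
    uploaded here; this only computes the paths.
    """
    root = Path(local_root)
    prefix = remote_root.rstrip("/")
    plan = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Use forward slashes in the remote path on every platform.
            relative = path.relative_to(root).as_posix()
            plan.append((str(path), f"{prefix}/{relative}"))
    return plan
```

Each pair could then be passed to the upload call shown above, for example sai.upload_file(local, remote) inside a loop over the returned list.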
File download
The SDK has two explicit methods for downloading: download_file for single files and download_dir for directories. Both are called on the sai client set up earlier in this section, and both accept absolute and relative paths. Examples are shown below:
# Download a file from MinIO to a local directory
remote_file_path = "minio://bucket/path/to/some/report.pdf"
local_file_path = "/path/to/local/report.pdf"
sai.download_file(remote_file_path, local_file_path)
# Download a directory from MinIO to a local directory
remote_directory = "minio://bucket/upload/directory"
local_directory = "/path/to/local/directory"
sai.download_dir(remote_directory, local_directory)
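The remote paths in these examples follow the pattern minio://bucket/path. When scripting downloads, it is sometimes useful to split such a path into its bucket and object key, for instance to derive a local file name. The helper below is a hypothetical sketch based only on the path pattern shown on this page; it is not an SDK function.

```python
MINIO_PREFIX = "minio://"


def split_minio_path(remote_path):
    """Split 'minio://bucket/some/key' into ('bucket', 'some/key')."""
    if not remote_path.startswith(MINIO_PREFIX):
        raise ValueError(f"not a minio:// path: {remote_path!r}")
    bucket, _, key = remote_path[len(MINIO_PREFIX):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name: {remote_path!r}")
    return bucket, key


bucket, key = split_minio_path("minio://bucket/path/to/some/report.pdf")
print(bucket)  # bucket
print(key)     # path/to/some/report.pdf
```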
List directory
To list files in a remote directory, call the list_dir method on the sai client set up earlier in this section. An example is shown below:
remote_directory = "minio://bucket/upload/directory"
sai.list_dir(remote_directory)
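A listing is often filtered before further processing, for example to keep only PDFs. The sketch below assumes that list_dir returns an iterable of path strings; this page does not state the return type, so verify it in your environment. The helper name filter_by_suffix and the sample listing are made up for illustration.

```python
from pathlib import PurePosixPath


def filter_by_suffix(paths, suffixes):
    """Keep only paths whose file extension is in suffixes (case-insensitive)."""
    wanted = {s.lower() for s in suffixes}
    return [p for p in paths if PurePosixPath(p).suffix.lower() in wanted]


# Made-up sample standing in for the result of sai.list_dir(remote_directory).
sample_listing = [
    "minio://bucket/upload/directory/report.pdf",
    "minio://bucket/upload/directory/notes.txt",
    "minio://bucket/upload/directory/SCAN.PDF",
]
pdfs = filter_by_suffix(sample_listing, [".pdf"])
```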
Generic file operations
For file operations, use the SDK function snorkelflow.utils.file.open_file, which returns a file-like object. open_file works with both MinIO and local paths.
Reading a file from MinIO:
from snorkelflow.utils.file import open_file
with open_file("minio://bucket/path/to/some/file", mode="r") as f:
    data = f.read()
Writing a file to MinIO:
from snorkelflow.utils.file import open_file
with open_file("minio://bucket/path/to/some/file", mode="w") as f:
    f.write("Hello, World!")
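Because open_file returns a file-like object, it can be combined with standard-library serializers. The sketch below saves and loads JSON through an injectable opener: it defaults to Python's built-in open so it runs anywhere, and passing opener=open_file should work for minio:// paths, assuming open_file behaves as a text-mode context manager as in the examples above. The helper names save_json and load_json are hypothetical.

```python
import json


def save_json(path, obj, opener=open):
    """Serialize obj as JSON to path using the given opener."""
    with opener(path, mode="w") as f:
        json.dump(obj, f)


def load_json(path, opener=open):
    """Read and parse JSON from path using the given opener."""
    with opener(path, mode="r") as f:
        return json.load(f)
```

With the SDK available, a call might look like save_json("minio://bucket/path/to/some/file", {"status": "ok"}, opener=open_file).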