SDK quickstart
This guide shows you how to access or install Snorkel's SDK, and then use it to explore core data management functions.
Install the SDK locally
Requirements
- Python version 3.9-3.11
- A Snorkel account with admin or developer access
- Your Snorkel API key
- Your Snorkel instance host name or IP address. You can copy the host name from your Snorkel instance's URL.
Use a Conda environment for your local Snorkel installation. For example, you can run these commands to create and activate a Python 3.11 environment:
conda create -n snorkel-env python=3.11
conda activate snorkel-env
Install the SDK
Run the following command to install the basic Snorkel SDK in your Python environment:
pip install 'snorkelai-sdk[default]' \
--index-url https://"{SNORKEL_PLATFORM_API_KEY}"@{your-snorkel-hostname}/sdk/repo \
--extra-index-url https://pypi.org/simple
Known issues
The --extra-index-url https://pypi.org/simple flag must be included explicitly in the installation command.
Authentication
All API requests to Snorkel must be authenticated. You need an API key, and the Python SDK client must use an authenticated SnorkelSDKContext object to make API requests to Snorkel services.
Generate an API key before you connect. When you connect to Snorkel locally or from another external system, you must provide additional settings and authentication secrets. Use the following connection template:
import os
# Core Snorkel SDK imports
import snorkelai.sdk.client as sai
from snorkelai.sdk.develop import Dataset, Slice, Batch
print("✅ Snorkel AI SDK imported successfully")
# Snorkel SDK configuration
SAI_CONFIG = {
"endpoint": "https://<your-snorkel-hostname>",
"minio_endpoint": "https://<your-minio-endpoint>",
"api_key": "<your-api-key>",
"workspace_name": "<your-workspace-name>",
"debug": True # Optional: set to False to disable debug logging
}
# Initialize Snorkel context
ctx = sai.SnorkelSDKContext.from_endpoint_url(**SAI_CONFIG)
- endpoint: The host name of your Snorkel instance. You can copy it from your Snorkel instance's URL.
- api_key: Your Snorkel API key.
- minio_endpoint: Contact your Snorkel admin or support representative to obtain your minio_endpoint.
While this quickstart doesn't require MinIO access beyond the minio_endpoint configuration, you may need to set a minio_access_key and minio_secret_key for more advanced data operations. Contact your Snorkel representative for assistance.
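If your workflow does need those credentials, one option is to extend the same configuration dictionary shown above. The following is a minimal sketch that assumes from_endpoint_url accepts minio_access_key and minio_secret_key alongside minio_endpoint; confirm the exact parameter names with your Snorkel representative.
import snorkelai.sdk.client as sai
# Same connection template as above, extended with MinIO credentials.
# The minio_access_key and minio_secret_key entries are assumptions used
# only for advanced data operations; verify the names for your instance.
SAI_CONFIG_WITH_MINIO = {
    "endpoint": "https://<your-snorkel-hostname>",
    "minio_endpoint": "https://<your-minio-endpoint>",
    "minio_access_key": "<your-minio-access-key>",
    "minio_secret_key": "<your-minio-secret-key>",
    "api_key": "<your-api-key>",
    "workspace_name": "<your-workspace-name>",
}
# Initialize Snorkel context with the extended configuration
ctx = sai.SnorkelSDKContext.from_endpoint_url(**SAI_CONFIG_WITH_MINIO)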
Access the SDK from a Snorkel-hosted instance
Requirements
- A Snorkel account with admin or developer access
Access the SDK
You can call the Snorkel SDK from a Jupyter notebook hosted on your Snorkel instance.
- Select Notebook from the left navigation.
- Select + to create a new notebook from scratch, or upload one.
- Select Python 3 (ipykernel) from the Notebook section. A new Jupyter notebook opens.
Authentication
When using a Snorkel-hosted notebook, Snorkel automatically generates an API key for your user and assigns it as an environment variable. When you log out and log back in, Snorkel rotates this key to ensure security.
import snorkelai.sdk.client as sai
# Set your workspace
workspace_name = 'YOUR_WORKSPACE_NAME' #INPUT - Replace with your workspace name
# Configure client context for Snorkel instance
ctx = sai.SnorkelSDKContext.from_endpoint_url(
workspace_name=workspace_name,
)
Note that the workspace_name parameter is not required when using the default workspace.
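For example, in a Snorkel-hosted notebook that targets the default workspace, the connection reduces to a call with no arguments. This is a minimal sketch based on the note above; the auto-generated API key is picked up from the environment automatically.
import snorkelai.sdk.client as sai
# No workspace_name needed for the default workspace; the hosted notebook's
# auto-generated API key is read from the environment.
ctx = sai.SnorkelSDKContext.from_endpoint_url()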
Now you can use Snorkel's SDK.
Quickstart: SDK connection and dataset exploration
This quickstart walks you through connecting to Snorkel and exploring your workspace using the SDK. You'll authenticate, list and create datasets, and explore data organization with slices.
Getting started
Begin by importing the necessary packages and setting up your connection:
import snorkelai.sdk.client as sai
from snorkelai.sdk.develop import Dataset, Slice
# Connect to Snorkel using the authentication block from above
# ctx = sai.SnorkelSDKContext.from_endpoint_url( ... ) # Input external or Snorkel-hosted connection details
print("✅ SDK imported successfully")
For the connection, uncomment the ctx line and replace the placeholder with the full authentication block from the appropriate section above.
Explore your workspace
Explore datasets available in your workspace:
# Verify connection
print(f"Connected to workspace: {ctx.workspace_name}")
# List existing datasets in your workspace
# Note: For workspaces with many datasets, this may take a moment
print("\nRetrieving datasets from workspace...")
datasets = Dataset.list()
print(f"Found {len(datasets)} datasets in workspace")
# Show first few datasets only to avoid overwhelming output
max_display = 5
datasets_to_show = datasets[:max_display]
print(f"\nShowing first {len(datasets_to_show)} datasets:")
for dataset in datasets_to_show:
print(f"- {dataset.name}")
print(f" UID: {dataset.uid}")
print(f" Created: {getattr(dataset, 'created_at', 'Unknown')}")
print()
if len(datasets) > max_display:
print(f"... and {len(datasets) - max_display} more datasets")
Create a dataset
Create a new dataset to demonstrate SDK capabilities:
# Create a new dataset (metadata only - no data upload required)
import datetime
timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
dataset_name = f"quickstart-demo-{timestamp}"
new_dataset = Dataset.create(dataset_name)
print(f"✅ Created dataset: {new_dataset.name}")
print(f"Dataset UID: {new_dataset.uid}")
Display dataset metadata
Explore dataset information without accessing the underlying data:
# Refresh dataset list to include our new dataset
datasets = Dataset.list()
print(f"\nNow have {len(datasets)} datasets in workspace")
# Show our newly created dataset and a couple others
print(f"\nRecent datasets (showing up to 3):")
recent_datasets = datasets[:3] # Limit to first 3 for display
for dataset in recent_datasets:
marker = "← NEW" if dataset.name == new_dataset.name else ""
print(f"- {dataset.name} {marker}")
print(f" UID: {dataset.uid}")
if datasets:
# Work with our newly created dataset
dataset = new_dataset
print(f"\nExploring our new dataset: {dataset.name}")
print(f"Dataset UID: {dataset.uid}")
# Show available dataset properties
print(f"\nDataset properties:")
for attr in ['name', 'uid', 'created_at', 'description']:
if hasattr(dataset, attr):
value = getattr(dataset, attr)
print(f"- {attr}: {value}")
print("✅ Dataset metadata exploration successful")
else:
print("No datasets found.")
Explore dataset organization with slices
View how data is organized with slices:
if datasets:
# Use the first existing dataset (not our empty new one)
dataset = datasets[0] if datasets[0].name != new_dataset.name else (datasets[1] if len(datasets) > 1 else datasets[0])
# List existing slices for the dataset
existing_slices = Slice.list(dataset=dataset.uid)
print(f"\nSlices for dataset '{dataset.name}':")
print(f"Found {len(existing_slices)} slices:")
for slice_obj in existing_slices:
print(f"- {slice_obj.name}")
print(f" Description: {slice_obj.description}")
# Note: Some slice objects may not have uid attribute
if hasattr(slice_obj, 'uid'):
print(f" UID: {slice_obj.uid}")
print()
print("✅ Slice exploration complete")
SDK method verification
Finally, verify key SDK functions:
print("\n=== SDK Method Verification ===")
# Test context properties
print(f"Workspace access: {'✅ Available' if hasattr(ctx, 'workspace_name') else '❌ Missing'}")
print(f"Debug mode: {'✅ Available' if hasattr(ctx, 'set_debug') else '❌ Missing'}")
# Test Dataset class methods
print(f"Dataset listing: {'✅ Available' if hasattr(Dataset, 'list') else '❌ Missing'}")
print(f"Dataset creation: {'✅ Available' if hasattr(Dataset, 'create') else '❌ Missing'}")
# Test Slice class methods
print(f"Slice listing: {'✅ Available' if hasattr(Slice, 'list') else '❌ Missing'}")
print(f"Slice creation: {'✅ Available' if hasattr(Slice, 'create') else '❌ Missing'}")
print(f"\nWorkspace summary:")
print(f"- Workspace: {ctx.workspace_name}")
print(f"- Total datasets: {len(datasets) if 'datasets' in locals() else 0}")
print(f"- SDK connection: ✅ Working")
print("\n✅ SDK exploration complete!")
In this tutorial, you've successfully connected to the Snorkel AI Data Development Platform and explored your workspace using the SDK. You've seen how to authenticate, list datasets, explore data organization, and verify SDK functionality. This provides the foundation for more advanced SDK workflows.
Next steps
Explore the SDK reference documentation.