SDK quickstart
This guide shows you how to access or install Snorkel's SDK, and then use it to explore core data management functions.
Install the SDK locally
Requirements
- Python version 3.9-3.11
- A Snorkel account with admin or developer access
- Your Snorkel API key
- Your Snorkel instance host name or IP address. You can copy the host name from your Snorkel instance's URL.
Use a Conda environment for your local Snorkel installation. For example, you can run these commands to create and activate a Python 3.11 environment:
conda create -n snorkel-env python=3.11
conda activate snorkel-env
Install the SDK
Run the following command to install the basic Snorkel SDK in your Python environment:
pip install 'snorkelai-sdk[default]' \
--index-url https://"{SNORKEL_PLATFORM_API_KEY}"@{your-snorkel-hostname}/sdk/repo \
--extra-index-url https://pypi.org/simple
Known issues
The --extra-index-url https://pypi.org/simple flag must be included explicitly in the installation command.
Authentication
All API requests to Snorkel must be authenticated. You need an API key, and the Python SDK client must use an authenticated SnorkelSDKContext object to make API requests to Snorkel services.
Generate an API key before you connect. When you connect to Snorkel locally or from another external system, you must provide additional settings and authentication secrets. Use the following connection template:
import os
# Core Snorkel SDK imports
import snorkelai.sdk.client as sai
from snorkelai.sdk.develop import Dataset, Slice, Batch
print("✅ Snorkel AI SDK imported successfully")
# Snorkel SDK configuration
SAI_CONFIG = {
"endpoint": "https://<your-snorkel-hostname>",
"minio_endpoint": "https://<your-minio-endpoint>",
"api_key": "<your-api-key>",
"workspace_name": "<your-workspace-name>",
"debug": True # Optional: set to False to disable debug logging
}
# Initialize Snorkel context
ctx = sai.SnorkelSDKContext.from_endpoint_url(**SAI_CONFIG)
- endpoint: The host name of your Snorkel instance. You can copy it from your Snorkel instance's URL.
- api_key: Your Snorkel API key.
- minio_endpoint: Contact your Snorkel admin or support representative to obtain your minio_endpoint.
While this quickstart doesn't require MinIO access beyond the minio_endpoint configuration, you may need to set a minio_access_key and minio_secret_key for more advanced data operations. Contact your Snorkel representative for assistance.
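If your workflow does need those credentials, one option is to extend the same configuration dictionary shown above. The following is a minimal sketch that assumes from_endpoint_url accepts minio_access_key and minio_secret_key alongside minio_endpoint; confirm the exact parameter names with your Snorkel representative.
import snorkelai.sdk.client as sai
# Same connection template as above, extended with MinIO credentials.
# The minio_access_key and minio_secret_key entries are assumptions used
# only for advanced data operations; verify the names for your instance.
SAI_CONFIG_WITH_MINIO = {
    "endpoint": "https://<your-snorkel-hostname>",
    "minio_endpoint": "https://<your-minio-endpoint>",
    "minio_access_key": "<your-minio-access-key>",
    "minio_secret_key": "<your-minio-secret-key>",
    "api_key": "<your-api-key>",
    "workspace_name": "<your-workspace-name>",
}
# Initialize Snorkel context with the extended configuration
ctx = sai.SnorkelSDKContext.from_endpoint_url(**SAI_CONFIG_WITH_MINIO)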
Access the SDK from a Snorkel-hosted instance
Requirements
- A Snorkel account with admin or developer access
Access the SDK
You can call the Snorkel SDK from a Jupyter notebook hosted on your Snorkel instance.
- Select Notebook from the left navigation.
- Select + to create a new notebook from scratch, or upload one.
- Select Python 3 (ipykernel) from the Notebook section. A new Jupyter notebook opens.
Authentication
When using a Snorkel-hosted notebook, Snorkel automatically generates an API key for your user and assigns it as an environment variable. When you log out and log back in, Snorkel rotates this key to ensure security.
import snorkelai.sdk.client as sai
# Set your workspace
workspace_name = 'YOUR_WORKSPACE_NAME' #INPUT - Replace with your workspace name
# Configure client context for Snorkel instance
ctx = sai.SnorkelSDKContext.from_endpoint_url(
workspace_name=workspace_name,
)
Note that the workspace_name parameter is not required when using the default workspace.
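For example, in a Snorkel-hosted notebook that targets the default workspace, the connection reduces to a call with no arguments. This is a minimal sketch based on the note above; the auto-generated API key is picked up from the environment automatically.
import snorkelai.sdk.client as sai
# No workspace_name needed for the default workspace; the hosted notebook's
# auto-generated API key is read from the environment.
ctx = sai.SnorkelSDKContext.from_endpoint_url()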
Now you can use Snorkel's SDK.
Quickstart: SDK connection and dataset exploration
This quickstart walks you through connecting to Snorkel and exploring your workspace using the SDK. You'll authenticate, list and create datasets, and explore data organization with slices.
Getting started
Begin by importing the necessary packages and setting up your connection:
import snorkelai.sdk.client as sai
from snorkelai.sdk.develop import Dataset, Slice
# Connect to Snorkel using the authentication block from above
# ctx = sai.SnorkelSDKContext.from_endpoint_url( ... ) # Input external or Snorkel-hosted connection details
print("✅ SDK imported successfully")
For the connection, uncomment the ctx line and replace the placeholder with the full authentication block from the appropriate section above.
Explore your workspace
Explore datasets available in your workspace:
# Verify connection
print(f"Connected to workspace: {ctx.workspace_name}")
# List existing datasets in your workspace
# Note: For workspaces with many datasets, this may take a moment
print("\nRetrieving datasets from workspace...")
datasets = Dataset.list()
print(f"Found {len(datasets)} datasets in workspace")
# Show first few datasets only to avoid overwhelming output
max_display = 5
datasets_to_show = datasets[:max_display]
print(f"\nShowing first {len(datasets_to_show)} datasets:")
for dataset in datasets_to_show:
print(f"- {dataset.name}")
print(f" UID: {dataset.uid}")
print(f" Created: {getattr(dataset, 'created_at', 'Unknown')}")
print()
if len(datasets) > max_display:
print(f"... and {len(datasets) - max_display} more datasets")
Create a dataset
Create a new dataset to demonstrate SDK capabilities:
# Create a new dataset (metadata only - no data upload required)
import datetime
timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
dataset_name = f"quickstart-demo-{timestamp}"
new_dataset = Dataset.create(dataset_name)
print(f"✅ Created dataset: {new_dataset.name}")
print(f"Dataset UID: {new_dataset.uid}")
Display dataset metadata
Explore dataset information without accessing the underlying data:
# Refresh dataset list to include our new dataset
datasets = Dataset.list()
print(f"\nNow have {len(datasets)} datasets in workspace")
# Show our newly created dataset and a couple others
print(f"\nRecent datasets (showing up to 3):")
recent_datasets = datasets[:3] # Limit to first 3 for display
for dataset in recent_datasets:
marker = "← NEW" if dataset.name == new_dataset.name else ""
print(f"- {dataset.name} {marker}")
print(f" UID: {dataset.uid}")
if datasets:
# Work with our newly created dataset
dataset = new_dataset
print(f"\nExploring our new dataset: {dataset.name}")
print(f"Dataset UID: {dataset.uid}")
# Show available dataset properties
print(f"\nDataset properties:")
for attr in ['name', 'uid', 'created_at', 'description']:
if hasattr(dataset, attr):
value = getattr(dataset, attr)
print(f"- {attr}: {value}")
print("✅ Dataset metadata exploration successful")
else:
print("No datasets found.")
Explore dataset organization with slices
View how data is organized with slices:
if datasets:
# Use the first existing dataset (not our empty new one)
dataset = datasets[0] if datasets[0].name != new_dataset.name else (datasets[1] if len(datasets) > 1 else datasets[0])
# List existing slices for the dataset
existing_slices = Slice.list(dataset=dataset.uid)
print(f"\nSlices for dataset '{dataset.name}':")
print(f"Found {len(existing_slices)} slices:")
for slice_obj in existing_slices:
print(f"- {slice_obj.name}")
print(f" Description: {slice_obj.description}")
# Note: Some slice objects may not have uid attribute
if hasattr(slice_obj, 'uid'):
print(f" UID: {slice_obj.uid}")
print()
print("✅ Slice exploration complete")
SDK method verification
Finally, verify key SDK functions:
print("\n=== SDK Method Verification ===")
# Test context properties
print(f"Workspace access: {'✅ Available' if hasattr(ctx, 'workspace_name') else '❌ Missing'}")
print(f"Debug mode: {'✅ Available' if hasattr(ctx, 'set_debug') else '❌ Missing'}")
# Test Dataset class methods
print(f"Dataset listing: {'✅ Available' if hasattr(Dataset, 'list') else '❌ Missing'}")
print(f"Dataset creation: {'✅ Available' if hasattr(Dataset, 'create') else '❌ Missing'}")
# Test Slice class methods
print(f"Slice listing: {'✅ Available' if hasattr(Slice, 'list') else '❌ Missing'}")
print(f"Slice creation: {'✅ Available' if hasattr(Slice, 'create') else '❌ Missing'}")
print(f"\nWorkspace summary:")
print(f"- Workspace: {ctx.workspace_name}")
print(f"- Total datasets: {len(datasets) if 'datasets' in locals() else 0}")
print(f"- SDK connection: ✅ Working")
print("\n✅ SDK exploration complete!")
In this tutorial, you've successfully connected to the Snorkel AI Data Development Platform and explored your workspace using the SDK. You've seen how to authenticate, list datasets, explore data organization, and verify SDK functionality. This provides the foundation for more advanced SDK workflows.
Next steps
Explore the SDK reference documentation.