Snorkel AI Data Development Platform v25.5 (STS) release notes

Snorkel is launching its AI Data Development Platform with major improvements. This launch brings evaluation and prompt development experiences designed for production-ready and agentic AI workflows.

This release delivers a powerful evaluation system that combines human expertise with generative AI. Enterprise customers can set trustworthy performance metrics for their AI agents and applications, ensuring quality and reliability in production environments.

If you are running Snorkel AI Data Development Platform v25.4 or earlier, you can continue to use your current instance. Predictive ML use cases can't be transferred to v25.5 or later, and v25.5 requires a new installation.

Breaking changes

Major SDK refactor: action required

The Snorkel AI SDK has been significantly refactored in this release, and your existing code requires updates. You must update all imports from snorkelflow.* to snorkelai.sdk.*. See the SDK quickstart for example usage.
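
Because the rename is a package-prefix change, a first pass over a codebase can be a plain textual rewrite. The following Python sketch is an illustrative helper, not an official migration tool: it replaces the snorkelflow. prefix with snorkelai.sdk. in every .py file under a project root. It does not verify that each rewritten module path exists in the new SDK, so review the resulting diff and consult the SDK quickstart for the current module layout.

    from pathlib import Path

    # Old and new package prefixes, per the v25.5 breaking change.
    OLD_PREFIX = "snorkelflow."
    NEW_PREFIX = "snorkelai.sdk."

    def migrate(project_root: str) -> None:
        """Rewrite legacy imports in place via plain text replacement."""
        for path in Path(project_root).rglob("*.py"):
            source = path.read_text()
            if OLD_PREFIX in source:
                # Textual replacement also touches strings and comments
                # that mention the old package name, so review the diff.
                path.write_text(source.replace(OLD_PREFIX, NEW_PREFIX))
                print(f"updated {path}")

    if __name__ == "__main__":
        migrate(".")  # run from your project root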

Features and improvements

Prompt development

  • You can now filter data points based on field values in the prompt development workflow and in the LLM-as-a-judge (LLMAJ) evaluator builder in Benchmark.
  • A new comparison table view allows you to see all prompt versions and runs for a data point in one place.
  • Prompt executions are now organized hierarchically under prompt versions, enabling you to run the same prompt multiple times on the same version. You can also save prompts without running them.
  • You can filter data by evaluation result (including score and rationale) and agreement score when developing LLMAJ evaluators in Benchmark.
  • Models in the Prompting interface are now displayed in alphabetical order for easier navigation.

Evaluation

  • Benchmark execution runs now display detailed error messages when they fail, making the cause of failure clear.
  • You can now specify custom names for benchmark runs. These names must be unique within each benchmark.
  • Benchmark exports now include evaluator scores, evaluator information, and benchmark metadata, making it easier to test Gen AI systems across different environments.

Known issues

Application

  • Users may see inaccurate counts in trace batches.
  • Creating an LLMAJ evaluator with the exact name of a previously deleted one may encounter issues.
  • Focus behavior for free-text MSA inputs may be erratic, particularly when using the Tab key to switch between fields.

Enterprise

  • Users may notice discrepancies between actual dataset file sizes and the sizes displayed in the UI.

Evaluation

  • By default, evaluation runs on all criteria, including those that are not selected.

SDK

  • Some SDK functions are no longer supported in Snorkel AI Data Development Platform v25.5 and later, and they will be removed in a future release. Unsupported functions are noted in the SDK reference. For help finding alternative approaches, contact your Snorkel representative.