Version: 25.9

snorkelai.sdk.develop.Criteria

final class snorkelai.sdk.develop.Criteria(benchmark_uid, criteria_uid, name, metric_label_schema_uid, description=None, rationale_label_schema_uid=None, archived=False)

Bases: Base

A criteria represents a specific characteristic or feature being evaluated as part of a benchmark.

Criteria define which aspects of a model or AI application’s performance are being measured, such as accuracy, relevance, and safety. Each criteria is associated with a benchmark and has an evaluator that assesses whether a model’s output satisfies that criteria.

At the heart of each criteria is its associated label schema, which defines exactly what the criteria measures and maps each label option to an integer.

For example, a criteria that measures accuracy might have a label schema that defines the following labels:

  • INCORRECT: 0

  • CORRECT: 1

A criteria that measures readability might have a label schema that defines the following labels:

  • POOR: 0

  • ACCEPTABLE: 1

  • EXCELLENT: 2
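
For illustration, these schemas correspond to the label_map dictionaries passed to Criteria.create (described below). This is only a sketch; the benchmark_uid value is a placeholder:

from snorkelai.sdk.develop import Criteria

# Hypothetical label maps matching the schemas above
accuracy_labels = {"INCORRECT": 0, "CORRECT": 1}
readability_labels = {"POOR": 0, "ACCEPTABLE": 1, "EXCELLENT": 2}

# Each criteria is created with one label map on the same benchmark
accuracy = Criteria.create(benchmark_uid=100, name="Accuracy", label_map=accuracy_labels)
readability = Criteria.create(benchmark_uid=100, name="Readability", label_map=readability_labels)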

Read more in the Evaluation overview.

__init__

__init__(benchmark_uid, criteria_uid, name, metric_label_schema_uid, description=None, rationale_label_schema_uid=None, archived=False)

Parameters

benchmark_uid (int) – The unique identifier of the parent Benchmark. The benchmark_uid is visible in the URL of the benchmark page in the Snorkel GUI. For example, https://YOUR-SNORKEL-INSTANCE/benchmarks/100/ indicates a benchmark with benchmark_uid of 100.
criteria_uid (int) – The unique identifier for this criteria.
name (str) – The name of the criteria.
metric_label_schema_uid (int) – The ID of the schema defining the metric labels.
description (Optional[str], default None) – A detailed description of what the criteria measures.
rationale_label_schema_uid (Optional[int], default None) – The ID of the schema defining rationale labels (if applicable).
archived (bool, default False) – Whether the criteria is archived.

Examples

Using the Criteria class requires the following import:

from snorkelai.sdk.develop import Criteria

Create a new criteria:

# Create a new criteria
criteria = Criteria.create(
    benchmark_uid=100,
    name="Accuracy",
    description="Measures response accuracy",
    label_map={"Correct": 1, "Incorrect": 0},
    requires_rationale=True,
)

Get an existing criteria:

# Get existing criteria
criteria = Criteria.get(criteria_uid=100)

Methods

__init__(benchmark_uid, criteria_uid, name, ...)
archive() – Archives the criteria, hiding it from the UI and the Benchmark.list_criteria method.
create(benchmark_uid, name, label_map[, ...]) – Create a new criteria for a benchmark.
delete(criteria_uid) – Deletion of a criteria is not implemented.
get(criteria_uid) – Get an existing criteria by its UID.
get_evaluator() – Retrieves the evaluator associated with this criteria.
update([name, description, archived]) – Updates the criteria with the given parameters.

Attributes

archived – Return whether the criteria is archived.
benchmark_uid – Return the UID of the parent benchmark.
criteria_uid – Return the UID of the criteria.
description – Return the description of the criteria.
metric_label_schema_uid – Return the UID of the metric label schema.
name – Return the name of the criteria.
rationale_label_schema_uid – Return the UID of the rationale label schema.
uid – Return the UID of the criteria.

archive

archive()

Archives the criteria, hiding it from the UI and the Benchmark.list_criteria method.

Use snorkelai.sdk.develop.benchmarks.Benchmark.list_criteria() with include_archived=True to view archived criteria.

Return type

None
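
Example

A minimal sketch of archiving a criteria and then listing archived criteria. This assumes Benchmark.get mirrors Criteria.get and that list_criteria is an instance method accepting include_archived; check the Benchmark reference for the exact signature:

from snorkelai.sdk.develop import Benchmark, Criteria

criteria = Criteria.get(criteria_uid=100)
criteria.archive()  # hides the criteria from the UI and default listings

# Assumed API: Benchmark.get and an instance-level list_criteria
benchmark = Benchmark.get(benchmark_uid=criteria.benchmark_uid)
all_criteria = benchmark.list_criteria(include_archived=True)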

create

static create(benchmark_uid, name, label_map, description=None, requires_rationale=False)

Create a new criteria for a benchmark.

Your label_map must use consecutive integers starting from 0. For example, if you have three labels, you must use the values 0, 1, and 2.

Parameters

benchmark_uid (int) – The unique identifier of the parent Benchmark.
name (str) – The name of the criteria.
label_map (Dict[str, int]) – A dictionary mapping user-friendly labels to numeric values. The key "UNKNOWN" will always be added with value -1. Dictionary values must be consecutive integers starting from 0.
description (Optional[str], default None) – A detailed description of what the criteria measures.
requires_rationale (bool, default False) – Whether the criteria requires a rationale.

Returns

A new Criteria object representing the created criteria.

Return type

Criteria

Raises

ValueError – If label_map is empty or has invalid values.

Example

criteria = Criteria.create(
    benchmark_uid=200,
    name="Accuracy",
    description="Measures response accuracy",
    label_map={"Correct": 1, "Incorrect": 0},
    requires_rationale=True,
)
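
Because label_map values must be consecutive integers starting from 0, a map with a gap or a missing 0 raises ValueError. A sketch of guarding against that (the label names are illustrative):

invalid_map = {"Poor": 1, "Good": 3}  # values are not consecutive and do not start at 0

try:
    Criteria.create(benchmark_uid=200, name="Readability", label_map=invalid_map)
except ValueError as err:
    print(f"Invalid label_map: {err}")

# A valid map; the key "UNKNOWN" is added automatically with value -1
valid_map = {"Poor": 0, "Acceptable": 1, "Excellent": 2}
criteria = Criteria.create(benchmark_uid=200, name="Readability", label_map=valid_map)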

delete

classmethod delete(criteria_uid)

Deletion of a criteria is not implemented.

Parameters

criteria_uid (int) – The unique identifier of the criteria.

Return type

None

get

static get(criteria_uid)

Get an existing criteria by its UID.

Parameters

criteria_uid (int) – The unique identifier for the criteria.

Returns

A Criteria object representing the existing criteria.

Return type

Criteria

Raises

ValueError – If the criteria is not found.

Example

criteria = Criteria.get(criteria_uid=100)

get_evaluator

get_evaluator()

Retrieves the evaluator associated with this criteria.

An evaluator is a prompt or code snippet that assesses whether a model’s output satisfies the criteria. Each criteria has one evaluator that assesses each datapoint against the criteria’s label schema and chooses the most appropriate label, in the form of the associated integer.

The evaluator can be either a code evaluator (using custom Python functions) or a prompt evaluator (using LLM prompts).

Raises

IndexError – If no evaluator is found for this criteria.

Return type

Evaluator

Example

Get the evaluator for a criteria and check its type:

from snorkelai.sdk.develop import Criteria, PromptEvaluator, CodeEvaluator

criteria = Criteria.get(criteria_uid=100)
evaluator = criteria.get_evaluator()

if isinstance(evaluator, CodeEvaluator):
    print("This is a code evaluator")
elif isinstance(evaluator, PromptEvaluator):
    print("This is a prompt evaluator")

update

update(name=None, description=None, archived=None)

Updates the criteria with the given parameters. If a parameter is not provided or is None, the existing value is left unchanged.

Parameters

name (Optional[str], default None) – The name of the criteria.
description (Optional[str], default None) – A detailed description of what the criteria measures.
archived (Optional[bool], default None) – Whether the criteria is archived.

Return type

None

Example

criteria = Criteria.get(criteria_uid=100)
criteria.update(name="New Name", description="New description")

property archived: bool

Return whether the criteria is archived

property benchmark_uid: int

Return the UID of the parent benchmark

property criteria_uid: int

Return the UID of the criteria

property description: str | None

Return the description of the criteria

property metric_label_schema_uid: int

Return the UID of the metric label schema

property name: str

Return the name of the criteria

property rationale_label_schema_uid: int | None

Return the UID of the rationale label schema

property uid: int

Return the UID of the criteria
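
For reference, a short sketch reading these properties from a retrieved criteria (the UID is a placeholder):

from snorkelai.sdk.develop import Criteria

criteria = Criteria.get(criteria_uid=100)

print(criteria.uid)                         # UID of the criteria
print(criteria.name)                        # e.g. "Accuracy"
print(criteria.benchmark_uid)               # UID of the parent benchmark
print(criteria.metric_label_schema_uid)     # UID of the metric label schema
print(criteria.rationale_label_schema_uid)  # None if no rationale schema is set
print(criteria.archived)                    # False unless the criteria is archived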