Version: 25.7

snorkelai.sdk.develop.Criteria

class snorkelai.sdk.develop.Criteria(*args, **kwargs)

Bases: BaseModel

A criteria represents a specific characteristic or feature being evaluated as part of a benchmark.

Criteria define what aspects of a model or AI application's performance are being measured, such as accuracy, relevance, and safety. Each criteria is associated with a benchmark and has an evaluator that assesses whether a model's output satisfies that criteria.

The heart of each criteria is its associated label schema, which defines what, exactly, the criteria is measuring, and maps each option to an integer.

For example, a criteria that measures accuracy might have a label schema that defines the following labels:

  • INCORRECT: 0

  • CORRECT: 1

A criteria that measures readability might have a label schema that defines the following labels:

  • POOR: 0

  • ACCEPTABLE: 1

  • EXCELLENT: 2
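In Python terms, these label schemas are plain mappings from label names to integers. As a hypothetical illustration (the dictionaries below are ordinary dicts, not SDK objects), the two schemas above could be written as:

```python
# Hypothetical label maps mirroring the schemas above
accuracy_labels = {"INCORRECT": 0, "CORRECT": 1}
readability_labels = {"POOR": 0, "ACCEPTABLE": 1, "EXCELLENT": 2}

# An evaluator's integer output can be mapped back to its label name
name_by_value = {v: k for k, v in readability_labels.items()}
print(name_by_value[2])  # EXCELLENT
```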

Read more in the Evaluation overview.

Parameters

benchmark_uid (int)
  The unique identifier of the parent Benchmark. The benchmark_uid is visible in the URL of the benchmark page in the Snorkel GUI. For example, https://YOUR-SNORKEL-INSTANCE/benchmarks/100/ indicates a benchmark with a benchmark_uid of 100.
criteria_uid (int)
  The unique identifier for this criteria.
name (str)
  The name of the criteria.
metric_label_schema_uid (int)
  The ID of the schema defining the metric labels.
description (str, default="")
  A detailed description of what the criteria measures.
rationale_label_schema_uid (Optional[int], default=None)
  The ID of the schema defining rationale labels (if applicable).

Examples

Using the Criteria class requires the following import:

from snorkelai.sdk.develop import Criteria

Create a new criteria:

# Create a new criteria
criteria = Criteria.create(
    benchmark_uid=100,
    name="Accuracy",
    description="Measures response accuracy",
    label_map={"Correct": 1, "Incorrect": 0},
    requires_rationale=True,
)

Get an existing criteria:

# Get existing criteria
criteria = Criteria.get(criteria_uid=100)

__init__

__init__(*args, **kwargs)

Methods

__init__(*args, **kwargs)
archive()
  Archives the criteria, hiding it from the UI and from the Benchmark.list_criteria method.
create(benchmark_uid, name, label_map[, ...])
  Create a new criteria for a benchmark.
get(criteria_uid)
  Get an existing criteria by its UID.
get_evaluator()
  Retrieves the evaluator associated with this criteria.
update([name, description, archived])
  Updates the criteria with the given parameters.

Attributes

archived
description
rationale_label_schema_uid
benchmark_uid
criteria_uid
name
metric_label_schema_uid

archive

archive()

Archives the criteria, hiding it from the UI and from the Benchmark.list_criteria method.

Use snorkelai.sdk.develop.benchmarks.Benchmark.list_criteria() with include_archived=True to view archived criteria.
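The effect on listing can be sketched with plain dictionaries (hypothetical records, not SDK objects): archived criteria are filtered out by default and only returned when explicitly requested.

```python
# Hypothetical records illustrating archive visibility;
# include_archived=True corresponds to returning both entries.
all_criteria = [
    {"name": "Accuracy", "archived": False},
    {"name": "Tone", "archived": True},
]

visible = [c for c in all_criteria if not c["archived"]]  # default listing
print([c["name"] for c in visible])  # ['Accuracy']
print([c["name"] for c in all_criteria])  # ['Accuracy', 'Tone']
```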

Return type

None

create

static create(benchmark_uid, name, label_map, description=None, requires_rationale=False)

Create a new criteria for a benchmark.

Your label_map must use consecutive integers starting from 0. For example, if you have three labels, you must use the values 0, 1, and 2.
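This constraint can be checked locally before calling create. The helper below is a hypothetical sketch, not part of the SDK:

```python
def is_valid_label_map(label_map: dict) -> bool:
    """Check that values are consecutive integers starting from 0."""
    return bool(label_map) and sorted(label_map.values()) == list(range(len(label_map)))

print(is_valid_label_map({"Incorrect": 0, "Correct": 1}))  # True
print(is_valid_label_map({"Poor": 1, "Good": 2}))  # False: does not start at 0
```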

Parameters

benchmark_uid (int)
  The unique identifier of the parent Benchmark.
name (str)
  The name of the criteria.
label_map (Dict[str, int])
  A dictionary mapping user-friendly labels to numeric values. The key "UNKNOWN" will always be added with value -1. Dictionary values must be consecutive integers starting from 0.
description (str, default=None)
  A detailed description of what the criteria measures.
requires_rationale (bool, default=False)
  Whether the criteria requires a rationale.

Returns

A new Criteria object representing the created criteria.

Return type

Criteria

Raises

ValueError – If label_map is empty or has invalid values.

Example

criteria = Criteria.create(
    benchmark_uid=200,
    name="Accuracy",
    description="Measures response accuracy",
    label_map={"Correct": 1, "Incorrect": 0},
    requires_rationale=True,
)

get

static get(criteria_uid)

Get an existing criteria by its UID.

Parameters

criteria_uid (int)
  The unique identifier for the criteria.

Returns

A Criteria object representing the existing criteria.

Return type

Criteria

Raises

ValueError – If the criteria is not found.

Example

criteria = Criteria.get(criteria_uid=100)

get_evaluator

get_evaluator()

Retrieves the evaluator associated with this criteria.

An evaluator is a prompt or code snippet that assesses whether a model’s output satisfies the criteria. Each criteria has one evaluator that assesses each datapoint against the criteria’s label schema and chooses the most appropriate label, in the form of the associated integer.

The evaluator can be either a code evaluator (using custom Python functions) or a prompt evaluator (using LLM prompts).
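Since the evaluator's verdict is the integer from the criteria's label schema, decoding it back to a label name is a simple dictionary inversion. This is a hypothetical sketch using plain dicts, not SDK objects; the "UNKNOWN": -1 entry mirrors the key that create() always adds:

```python
# Hypothetical label map for an accuracy criteria
label_map = {"Incorrect": 0, "Correct": 1, "UNKNOWN": -1}
name_by_value = {v: k for k, v in label_map.items()}

verdict = 1  # integer chosen by the evaluator for one datapoint
print(name_by_value[verdict])  # Correct
```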

Raises

IndexError – If no evaluator is found for this criteria.

Return type

Evaluator

Example

Example 1

Get the evaluator for a criteria and check its type:

from snorkelai.sdk.develop import Criteria, PromptEvaluator, CodeEvaluator

criteria = Criteria.get(criteria_uid=100)
evaluator = criteria.get_evaluator()

if isinstance(evaluator, CodeEvaluator):
    print("This is a code evaluator")
elif isinstance(evaluator, PromptEvaluator):
    print("This is a prompt evaluator")

update

update(name=None, description=None, archived=None)

Updates the criteria with the given parameters. If a parameter is not provided or is None, the existing value will be left unchanged.
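The None-means-unchanged behavior can be sketched with plain dictionaries (a hypothetical illustration of the semantics, not the SDK implementation):

```python
def apply_update(current: dict, **changes) -> dict:
    # Fields passed as None are treated as "leave unchanged"
    return {k: (changes[k] if changes.get(k) is not None else v)
            for k, v in current.items()}

state = {"name": "Accuracy", "description": "Old", "archived": False}
print(apply_update(state, description="New description", name=None))
# {'name': 'Accuracy', 'description': 'New description', 'archived': False}
```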

Parameters

name (str, default=None)
  The name of the criteria.
description (str, default=None)
  A detailed description of what the criteria measures.
archived (bool, default=None)
  Whether the criteria is archived.

Returns

The updated Criteria object.

Return type

Criteria

Example

criteria = Criteria.get(criteria_uid=100)
criteria.update(name="New Name", description="New description")

archived: bool = False
benchmark_uid: int
criteria_uid: int
description: Optional[str] = None
metric_label_schema_uid: int
name: str
rationale_label_schema_uid: Optional[int] = None