Version: 25.9

snorkelai.sdk.develop.Evaluator

class snorkelai.sdk.develop.Evaluator(benchmark_uid, criteria_uid, evaluator_uid)

Bases: ABC, Base

Base class for all evaluators.

An evaluator assesses a datapoint containing an AI application’s response against a specific criteria. Evaluators can be of two types:

  • CodeEvaluator: Code-based (using custom Python functions)

  • PromptEvaluator: Prompt-based (using LLM prompts)

The goal of an evaluator is to categorize the datapoint into one of the criteria’s labels, ultimately assigning the integer associated with the label as that datapoint’s score. An evaluator can also assign a rationale for its score, which is used to explain the score.

Read more in the Evaluation overview.

Using the Evaluator class requires the following import:

from snorkelai.sdk.develop import Evaluator
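
A minimal sketch of the typical workflow, assuming an existing evaluator with UID 300. The split keyword passed to execute is an illustrative assumption, since execute takes subclass-specific parameters:

from snorkelai.sdk.develop import Evaluator

# Fetch an existing evaluator by its UID (300 is illustrative)
evaluator = Evaluator.get(evaluator_uid=300)

# Run it against a dataset split; keyword arguments are
# subclass-specific ("split" is an assumed parameter)
execution_uid = evaluator.execute(split="test")

# Wait for the job to finish, then collect per-datapoint scores
status, results = evaluator.poll_execution_result(execution_uid, sync=True)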

__init__

__init__(benchmark_uid, criteria_uid, evaluator_uid)

Parameters

| Name | Type | Default | Info |
| --- | --- | --- | --- |
| benchmark_uid | int | | The unique identifier of the benchmark that contains the criteria. The benchmark_uid is visible in the URL of the benchmark page in the Snorkel GUI. For example, https://YOUR-SNORKEL-INSTANCE/benchmarks/100/ indicates a benchmark with a benchmark_uid of 100. |
| criteria_uid | int | | The unique identifier of the criteria that this evaluator assesses. |
| evaluator_uid | int | | The unique identifier for this evaluator. |

Methods

| Method | Description |
| --- | --- |
| __init__(benchmark_uid, criteria_uid, evaluator_uid) | Initializes the evaluator. |
| create(*args, **kwargs) | Creates a new evaluator for a criteria. |
| delete(evaluator_uid) | Deletion of an evaluator is not implemented. |
| execute(*args, **kwargs) | Runs the evaluator against all datapoints in the specified dataset split. |
| get(evaluator_uid) | Retrieves the evaluator for a given UID. |
| get_execution_result(execution_uid) | Retrieves the evaluation results and scores for a specific execution. |
| get_executions() | Retrieves all executions for this evaluator. |
| get_versions() | Retrieves all version names for this evaluator. |
| poll_execution_result(execution_uid[, sync]) | Polls the evaluation job status and retrieves partial results. |
| update(*args, **kwargs) | Updates the evaluator with a new version. |

Attributes

| Attribute | Description |
| --- | --- |
| benchmark_uid | Returns the UID of the parent benchmark. |
| criteria_uid | Returns the UID of the parent criteria. |
| evaluator_uid | Returns the UID of the evaluator. |
| uid | Returns the UID of the evaluator. |

create

abstract classmethod create(*args, **kwargs)

Creates a new evaluator for a criteria.

Parameters

| Name | Type | Default | Info |
| --- | --- | --- | --- |
| args | Any | | Parameters specific to the evaluator type. |
| kwargs | Any | | Parameters specific to the evaluator type. |

Return type

Evaluator
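
Example

Because create is abstract, it is called on a concrete subclass such as PromptEvaluator or CodeEvaluator. A hypothetical sketch; the import path for PromptEvaluator and the prompt keyword are assumptions, not the documented subclass signature:

from snorkelai.sdk.develop import PromptEvaluator  # assumed import path

evaluator = PromptEvaluator.create(
    benchmark_uid=100,
    criteria_uid=200,
    prompt="Score the response against the criteria's rubric ...",  # assumed kwarg
)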

delete

classmethod delete(evaluator_uid)

Deletion of an evaluator is not implemented.

Parameters

| Name | Type | Default | Info |
| --- | --- | --- | --- |
| evaluator_uid | int | | The unique identifier of the evaluator. |

Return type

None

execute

abstract execute(*args, **kwargs)

Runs the evaluator against all datapoints in the specified dataset split.

Parameters

| Name | Type | Default | Info |
| --- | --- | --- | --- |
| args | Any | | Parameters specific to the evaluator type. |
| kwargs | Any | | Parameters specific to the evaluator type. |

Return type

int
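
Example

Given an evaluator obtained via Evaluator.get, a hedged sketch; the split keyword is an assumption, and the returned integer is presumably the execution UID accepted by get_execution_result and poll_execution_result:

execution_uid = evaluator.execute(split="test")  # "split" is an assumed kwarg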

get

classmethod get(evaluator_uid)

Retrieves the evaluator for a given UID.

Parameters

| Name | Type | Default | Info |
| --- | --- | --- | --- |
| evaluator_uid | int | | The unique identifier for the evaluator. |

Returns

The requested evaluator object.

Return type

Evaluator

Example

evaluator = Evaluator.get(evaluator_uid=300)

get_execution_result

abstract get_execution_result(execution_uid)

Retrieves the evaluation results and scores for a specific execution.

Parameters

| Name | Type | Default | Info |
| --- | --- | --- | --- |
| execution_uid | int | | The unique identifier of the execution to retrieve results for. |

Return type

Dict[str, Dict[str, Union[str, int, float, bool]]]
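
Example

A sketch of reading the results for one execution. The outer keys are presumably datapoint identifiers and the inner dictionaries hold score fields; the "score" and "rationale" keys shown here are assumptions:

results = evaluator.get_execution_result(execution_uid=42)
for datapoint_uid, fields in results.items():
    print(datapoint_uid, fields.get("score"), fields.get("rationale"))  # assumed field names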

get_executions

abstract get_executions()

Retrieves all executions for this evaluator.

Return type

List[Dict[str, Any]]
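
Example

Each execution is returned as a dictionary of metadata; the exact keys are not documented here, so this sketch prints each one as-is:

for execution in evaluator.get_executions():
    print(execution)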

get_versions

abstract get_versions()

Retrieves all version names for this evaluator.

Return type

List[str]
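
Example

A minimal call, given an evaluator obtained via Evaluator.get:

versions = evaluator.get_versions()
print(versions)  # a list of version-name strings, e.g. grown by calls to update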

poll_execution_result

abstract poll_execution_result(execution_uid, sync=False)

Polls the evaluation job status and retrieves partial results.

Parameters

| Name | Type | Default | Info |
| --- | --- | --- | --- |
| execution_uid | int | | The unique identifier of the execution to poll. |
| sync | bool | False | Whether to wait for the job to complete. |

Return type

Tuple[str, Dict[str, Dict[str, Union[str, int, float, bool]]]]
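
Example

The returned tuple pairs a job-status string with the (possibly partial) results mapping. With sync=True the call blocks until completion; without it, you can poll in a loop. The "running" status value in this sketch is an assumption, since the status vocabulary is not documented here:

import time

status, results = evaluator.poll_execution_result(execution_uid=42)
while status == "running":  # assumed status value
    time.sleep(5)
    status, results = evaluator.poll_execution_result(execution_uid=42)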

update

abstract update(*args, **kwargs)

Updates the evaluator with a new version.

Parameters

| Name | Type | Default | Info |
| --- | --- | --- | --- |
| args | Any | | Parameters specific to the evaluator type. |
| kwargs | Any | | Parameters specific to the evaluator type. |

Return type

None
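
Example

Like create, update takes subclass-specific parameters and registers a new version of the evaluator. A hypothetical sketch for a prompt-based evaluator; the prompt keyword is an assumption:

evaluator.update(prompt="Revised scoring instructions ...")  # assumed kwarg
print(evaluator.get_versions())  # the new version should now be listed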

property benchmark_uid: int

Returns the UID of the parent benchmark.

property criteria_uid: int

Returns the UID of the parent criteria.

property evaluator_uid: int

Returns the UID of the evaluator.

property uid: int

Returns the UID of the evaluator.
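
Example

The UID properties are read-only accessors; uid returns the same value as evaluator_uid:

evaluator = Evaluator.get(evaluator_uid=300)
assert evaluator.uid == evaluator.evaluator_uid
print(evaluator.benchmark_uid, evaluator.criteria_uid)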