Version: 25.9

snorkelai.sdk.develop.Evaluator

class snorkelai.sdk.develop.Evaluator(benchmark_uid, criteria_uid, evaluator_uid)

Bases: ABC, Base

Base class for all evaluators.

An evaluator assesses a datapoint containing an AI application’s response against a specific criteria. Evaluators can be of two types:

  • CodeEvaluator: Code-based (using custom Python functions)

  • PromptEvaluator: Prompt-based (using LLM prompts)

The goal of an evaluator is to categorize the datapoint into one of the criteria’s labels, ultimately assigning the integer associated with the label as that datapoint’s score. An evaluator can also assign a rationale for its score, which is used to explain the score.

Read more in the Evaluation overview.

Using the Evaluator class requires the following import:

from snorkelai.sdk.develop import Evaluator
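
A minimal sketch of the typical workflow, assuming an existing evaluator with UID 300. The split keyword passed to execute is an illustrative assumption, since execute takes subclass-specific parameters:

from snorkelai.sdk.develop import Evaluator

# Fetch an existing evaluator by its UID (300 is illustrative)
evaluator = Evaluator.get(evaluator_uid=300)

# Run it against a dataset split; keyword arguments are
# subclass-specific ("split" is an assumed parameter)
execution_uid = evaluator.execute(split="test")

# Wait for the job to finish, then collect per-datapoint scores
status, results = evaluator.poll_execution_result(execution_uid, sync=True)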

__init__

__init__(benchmark_uid, criteria_uid, evaluator_uid)

Parameters

| Name | Type | Default | Info |
| --- | --- | --- | --- |
| benchmark_uid | int | | The unique identifier of the benchmark that contains the criteria. The benchmark_uid is visible in the URL of the benchmark page in the Snorkel GUI. For example, https://YOUR-SNORKEL-INSTANCE/benchmarks/100/ indicates a benchmark with a benchmark_uid of 100. |
| criteria_uid | int | | The unique identifier of the criteria that this evaluator assesses. |
| evaluator_uid | int | | The unique identifier for this evaluator. |

Methods

| Method | Description |
| --- | --- |
| __init__(benchmark_uid, criteria_uid, evaluator_uid) | Initializes the evaluator. |
| create(*args, **kwargs) | Creates a new evaluator for a criteria. |
| delete(evaluator_uid) | Deletion of an evaluator is not implemented. |
| execute(*args, **kwargs) | Runs the evaluator against all datapoints in the specified dataset split. |
| get(evaluator_uid) | Retrieves the evaluator for a given UID. |
| get_execution_result(execution_uid) | Retrieves the evaluation results and scores for a specific execution. |
| get_executions() | Retrieves all executions for this evaluator. |
| get_versions() | Retrieves all version names for this evaluator. |
| poll_execution_result(execution_uid[, sync]) | Polls the evaluation job status and retrieves partial results. |
| update(*args, **kwargs) | Updates the evaluator with a new version. |

Attributes

| Attribute | Description |
| --- | --- |
| benchmark_uid | Returns the UID of the parent benchmark. |
| criteria_uid | Returns the UID of the parent criteria. |
| evaluator_uid | Returns the UID of the evaluator. |
| uid | Returns the UID of the evaluator. |

create

abstract classmethod create(*args, **kwargs)

Creates a new evaluator for a criteria.

Parameters

| Name | Type | Default | Info |
| --- | --- | --- | --- |
| args | Any | | Parameters specific to the evaluator type. |
| kwargs | Any | | Parameters specific to the evaluator type. |

Return type

Evaluator
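
Example

Because create is abstract, it is called on a concrete subclass such as PromptEvaluator or CodeEvaluator. A hypothetical sketch; the import path for PromptEvaluator and the prompt keyword are assumptions, not the documented subclass signature:

from snorkelai.sdk.develop import PromptEvaluator  # assumed import path

evaluator = PromptEvaluator.create(
    benchmark_uid=100,
    criteria_uid=200,
    prompt="Score the response against the criteria's rubric ...",  # assumed kwarg
)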

delete

classmethod delete(evaluator_uid)

Deletion of an evaluator is not implemented.

Parameters

| Name | Type | Default | Info |
| --- | --- | --- | --- |
| evaluator_uid | int | | The unique identifier of the evaluator. |

Return type

None

execute

abstract execute(*args, **kwargs)

Runs the evaluator against all datapoints in the specified dataset split.

Parameters

| Name | Type | Default | Info |
| --- | --- | --- | --- |
| args | Any | | Parameters specific to the evaluator type. |
| kwargs | Any | | Parameters specific to the evaluator type. |

Return type

int
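
Example

Given an evaluator obtained via Evaluator.get, a hedged sketch; the split keyword is an assumption, and the returned integer is presumably the execution UID accepted by get_execution_result and poll_execution_result:

execution_uid = evaluator.execute(split="test")  # "split" is an assumed kwarg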

get

classmethod get(evaluator_uid)

Retrieves the evaluator for a given UID.

Parameters

| Name | Type | Default | Info |
| --- | --- | --- | --- |
| evaluator_uid | int | | The unique identifier for the evaluator. |

Returns

The requested evaluator object.

Return type

Evaluator

Example

evaluator = Evaluator.get(evaluator_uid=300)

get_execution_result

abstract get_execution_result(execution_uid)

Retrieves the evaluation results and scores for a specific execution.

Parameters

| Name | Type | Default | Info |
| --- | --- | --- | --- |
| execution_uid | int | | The unique identifier of the execution to retrieve results for. |

Return type

Dict[str, Dict[str, Union[str, int, float, bool]]]
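
Example

A sketch of reading the results for one execution. The outer keys are presumably datapoint identifiers and the inner dictionaries hold score fields; the "score" and "rationale" keys shown here are assumptions:

results = evaluator.get_execution_result(execution_uid=42)
for datapoint_uid, fields in results.items():
    print(datapoint_uid, fields.get("score"), fields.get("rationale"))  # assumed field names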

get_executions

abstract get_executions()

Retrieves all executions for this evaluator.

Return type

List[Dict[str, Any]]
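
Example

Each execution is returned as a dictionary of metadata; the exact keys are not documented here, so this sketch prints each one as-is:

for execution in evaluator.get_executions():
    print(execution)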

get_versions

abstract get_versions()

Retrieves all version names for this evaluator.

Return type

List[str]
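
Example

A minimal call, given an evaluator obtained via Evaluator.get:

versions = evaluator.get_versions()
print(versions)  # a list of version-name strings, e.g. grown by calls to update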

poll_execution_result

abstract poll_execution_result(execution_uid, sync=False)

Polls the evaluation job status and retrieves partial results.

Parameters

| Name | Type | Default | Info |
| --- | --- | --- | --- |
| execution_uid | int | | The unique identifier of the execution to poll. |
| sync | bool | False | Whether to wait for the job to complete. |

Return type

Tuple[str, Dict[str, Dict[str, Union[str, int, float, bool]]]]
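
Example

The returned tuple pairs a job-status string with the (possibly partial) results mapping. With sync=True the call blocks until completion; without it, you can poll in a loop. The "running" status value in this sketch is an assumption, since the status vocabulary is not documented here:

import time

status, results = evaluator.poll_execution_result(execution_uid=42)
while status == "running":  # assumed status value
    time.sleep(5)
    status, results = evaluator.poll_execution_result(execution_uid=42)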

update

abstract update(*args, **kwargs)

Updates the evaluator with a new version.

Parameters

| Name | Type | Default | Info |
| --- | --- | --- | --- |
| args | Any | | Parameters specific to the evaluator type. |
| kwargs | Any | | Parameters specific to the evaluator type. |

Return type

None
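
Example

Like create, update takes subclass-specific parameters and registers a new version of the evaluator. A hypothetical sketch for a prompt-based evaluator; the prompt keyword is an assumption:

evaluator.update(prompt="Revised scoring instructions ...")  # assumed kwarg
print(evaluator.get_versions())  # the new version should now be listed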

property benchmark_uid: int

Returns the UID of the parent benchmark.

property criteria_uid: int

Returns the UID of the parent criteria.

property evaluator_uid: int

Returns the UID of the evaluator.

property uid: int

Returns the UID of the evaluator.
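
Example

The UID properties are read-only accessors; uid returns the same value as evaluator_uid:

evaluator = Evaluator.get(evaluator_uid=300)
assert evaluator.uid == evaluator.evaluator_uid
print(evaluator.benchmark_uid, evaluator.criteria_uid)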