Version: 25.6

snorkelai.sdk.develop.Evaluator

class snorkelai.sdk.develop.Evaluator(*args, **kwargs)

Bases: BaseModel, ABC

Base class for all evaluators.

An evaluator assesses a datapoint containing an AI application’s response against a specific criteria. Evaluators can be of two types:

CodeEvaluator: Code-based (using custom Python functions)
PromptEvaluator: Prompt-based (using LLM prompts)

The goal of an evaluator is to categorize the datapoint into one of the criteria’s labels and assign the integer associated with that label as the datapoint’s score. An evaluator can also provide a rationale that explains the score.
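The label-to-score idea described above can be sketched in plain Python. This is an illustration only, not Snorkel code: the `LABELS` mapping, the `Evaluation` tuple, and `evaluate_exact_match` are hypothetical names invented for the example.

```python
from typing import Dict, NamedTuple


class Evaluation(NamedTuple):
    """Hypothetical container for one evaluated datapoint."""
    label: str
    score: int
    rationale: str


# Hypothetical label schema for one criteria: label name -> integer score.
LABELS: Dict[str, int] = {"incorrect": 0, "correct": 1}


def evaluate_exact_match(response: str, reference: str) -> Evaluation:
    """Toy code-based check: pick a label, then look up its integer score."""
    matched = response.strip().lower() == reference.strip().lower()
    label = "correct" if matched else "incorrect"
    rationale = (
        "Response matches the reference answer."
        if matched
        else "Response differs from the reference answer."
    )
    return Evaluation(label=label, score=LABELS[label], rationale=rationale)


result = evaluate_exact_match(" Paris ", "paris")
print(result.label, result.score, result.rationale)
```

The score is never computed directly: the check first chooses a label, and the integer follows from the label schema, which mirrors the description above.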

Read more in the Evaluation overview.

Parameters

Name	Type	Info
benchmark_uid	`int`	The unique identifier of the benchmark that contains the criteria. The `benchmark_uid` is visible in the URL of the benchmark page in the Snorkel GUI. For example, `https://YOUR-SNORKEL-INSTANCE/benchmarks/100/` indicates a benchmark with `benchmark_uid` of `100`.
criteria_uid	`int`	The unique identifier of the criteria that this evaluator assesses.
evaluator_uid	`int`	The unique identifier for this evaluator.
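Because the `benchmark_uid` appears in the benchmark page URL, it can be read out programmatically. The helper below is a small sketch using only the Python standard library; `benchmark_uid_from_url` is a hypothetical name, and it assumes the URL follows the `/benchmarks/<uid>/` pattern shown in the table.

```python
from urllib.parse import urlparse


def benchmark_uid_from_url(url: str) -> int:
    """Extract the integer that follows the 'benchmarks' path segment."""
    segments = [part for part in urlparse(url).path.split("/") if part]
    # Raises ValueError if the URL has no 'benchmarks' segment.
    position = segments.index("benchmarks")
    # Raises IndexError or ValueError if no integer follows the segment.
    return int(segments[position + 1])


uid = benchmark_uid_from_url("https://YOUR-SNORKEL-INSTANCE/benchmarks/100/")
print(uid)
```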

__init__(*args, **kwargs)

Methods

`__init__`(*args, **kwargs)
`create`(criteria_uid, **kwargs)	Creates a new evaluator for a criteria.
`execute`(split[, num_rows])	Runs the evaluator against all datapoints in the specified dataset split.
`get`(evaluator_uid)	Retrieves the evaluator for a given uid.
`get_execution_result`(execution_uid)	Retrieves the evaluation results and scores for a specific execution.
`get_executions`()	Retrieves all executions for this evaluator.
`get_versions`()	Retrieves all version names for this evaluator.
`poll_execution_result`(execution_uid[, sync])	Polls the evaluation job status and retrieves partial results.
`update`(version_name, **kwargs)	Updates the evaluator with a new version.

Attributes

`benchmark_uid`
`criteria_uid`
`evaluator_uid`

create

abstract classmethod create(criteria_uid, **kwargs)

Creates a new evaluator for a criteria.

Parameters

Name	Type	Default	Info
criteria_uid	`int`		The unique identifier of the criteria that this evaluator assesses.
**kwargs	`Any`		Additional parameters specific to the evaluator type.

Return type: Evaluator

execute

abstract execute(split, num_rows=None, **kwargs)

Runs the evaluator against all datapoints in the specified dataset split.

Parameters

Name	Type	Default	Info
split	`str`		The dataset split you want to evaluate.
num_rows	`Optional[int]`	`None`	The number of rows to evaluate. If `None`, evaluates all rows.
**kwargs	`Any`		Additional parameters specific to the evaluator type.

Return type: int
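To make the `num_rows` behavior concrete, here is a toy in-memory stand-in that mirrors the documented `execute(split, num_rows=None)` signature. It is not the Snorkel implementation. Two things are assumptions made for the sketch: that `num_rows` selects the first N rows, and that the returned `int` identifies the execution. `ToyEvaluator` and its attributes are hypothetical.

```python
from itertools import count
from typing import Dict, List, Optional


class ToyEvaluator:
    """Hypothetical in-memory stand-in; NOT the Snorkel implementation."""

    def __init__(self, splits: Dict[str, List[str]]) -> None:
        self._splits = splits
        self._next_uid = count(1)
        self.evaluated: Dict[int, List[str]] = {}

    def execute(self, split: str, num_rows: Optional[int] = None) -> int:
        rows = self._splits[split]
        # Assumption: num_rows takes the first N rows; None means all rows.
        selected = rows if num_rows is None else rows[:num_rows]
        # Assumption: the returned int identifies this execution.
        execution_uid = next(self._next_uid)
        self.evaluated[execution_uid] = selected
        return execution_uid


toy = ToyEvaluator({"train": ["row-a", "row-b", "row-c"]})
full_run = toy.execute("train")
partial_run = toy.execute("train", num_rows=2)
print(len(toy.evaluated[full_run]), len(toy.evaluated[partial_run]))
```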

get

abstract classmethod get(evaluator_uid)

Retrieves the evaluator for a given uid.

Parameters

Name	Type	Default	Info
evaluator_uid	`int`		The unique identifier for the evaluator.

Return type: Evaluator

get_execution_result

abstract get_execution_result(execution_uid)

Retrieves the evaluation results and scores for a specific execution.

Parameters

Name	Type	Default	Info
execution_uid	`int`		The unique identifier of the execution you want to get results for.

Return type: Dict[str, Dict[str, Union[str, int, float, bool]]]
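The return type is a nested dictionary: string keys mapping to dictionaries of scalar values. The sketch below shows one way to walk such a structure and average a numeric field. The outer keys (`datapoint-1`) and inner field names (`score`, `rationale`) are hypothetical; this page does not specify them, so inspect a real result before relying on any key.

```python
from statistics import mean
from typing import Dict, Union

Value = Union[str, int, float, bool]
ExecutionResult = Dict[str, Dict[str, Value]]

# Hypothetical payload: keys and field names are illustrative only.
sample_result: ExecutionResult = {
    "datapoint-1": {"score": 1, "rationale": "Matches the reference."},
    "datapoint-2": {"score": 0, "rationale": "Contradicts the reference."},
}


def mean_score(result: ExecutionResult, field: str = "score") -> float:
    """Average a numeric field across all entries that carry it."""
    scores = []
    for entry in result.values():
        value = entry.get(field)
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            scores.append(float(value))
    if not scores:
        raise ValueError(f"no numeric {field!r} values found")
    return mean(scores)


average = mean_score(sample_result)
print(average)
```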

get_executions

abstract get_executions()

Retrieves all executions for this evaluator.

Return type: List[Dict[str, Any]]

get_versions

abstract get_versions()

Retrieves all version names for this evaluator.

Return type: List[str]

poll_execution_result

abstract poll_execution_result(execution_uid, sync=False)

Polls the evaluation job status and retrieves partial results.

Parameters

Name	Type	Default	Info
execution_uid	`int`		The unique identifier of the execution you want to poll for.
sync	`bool`	`False`	Whether to wait for the job to complete.

Return type: Tuple[str, Dict[str, Dict[str, Union[str, int, float, bool]]]]
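With `sync=False`, a caller would typically poll repeatedly until the job finishes. The loop below is a generic sketch of that pattern against any callable that returns a `(status, partial_results)` tuple, as the documented return type suggests. The status strings (`"running"`, `"completed"`, `"failed"`), the helper `wait_for_result`, and the stub `make_fake_poll` are all assumptions made for the example; this page does not list the actual status values.

```python
import time
from typing import Callable, Dict, Sequence, Tuple, Union

Value = Union[str, int, float, bool]
ExecutionResult = Dict[str, Dict[str, Value]]
PollFn = Callable[[int], Tuple[str, ExecutionResult]]


def wait_for_result(
    poll: PollFn,
    execution_uid: int,
    done_statuses: Sequence[str] = ("completed", "failed"),  # assumed values
    interval_s: float = 0.0,
    max_polls: int = 10,
) -> Tuple[str, ExecutionResult]:
    """Call poll() until it reports a terminal status or max_polls is hit."""
    for _ in range(max_polls):
        status, partial = poll(execution_uid)
        if status in done_statuses:
            return status, partial
        time.sleep(interval_s)
    raise TimeoutError(f"execution {execution_uid} did not finish in time")


def make_fake_poll() -> PollFn:
    """Stub standing in for a real evaluator's poll_execution_result."""
    responses = iter([
        ("running", {}),
        ("running", {"datapoint-1": {"score": 1}}),
        ("completed", {"datapoint-1": {"score": 1}, "datapoint-2": {"score": 0}}),
    ])

    def poll(execution_uid: int) -> Tuple[str, ExecutionResult]:
        return next(responses)

    return poll


status, results = wait_for_result(make_fake_poll(), execution_uid=1)
print(status, len(results))
```

With a real evaluator, something like `lambda uid: evaluator.poll_execution_result(uid)` could be passed as `poll`. Since `sync=True` is documented to wait for the job to complete, a manual loop like this mainly matters for the default non-blocking call.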

update

abstract update(version_name, **kwargs)

Updates the evaluator with a new version.

Parameters

Name	Type	Default	Info
version_name	`str`		The name for the new version.
**kwargs	`Any`		Additional parameters specific to the evaluator type.

Return type: str

benchmark_uid: int

criteria_uid: int

evaluator_uid: int
