Version: 25.6

snorkelai.sdk.develop.Evaluator

class snorkelai.sdk.develop.Evaluator(*args, **kwargs)

Bases: BaseModel, ABC

Base class for all evaluators.

An evaluator assesses a datapoint containing an AI application’s response against one of a benchmark’s criteria. Evaluators come in two types:

  • CodeEvaluator: Code-based (using custom Python functions)

  • PromptEvaluator: Prompt-based (using LLM prompts)

An evaluator categorizes the datapoint into one of the criteria’s labels and assigns the integer associated with that label as the datapoint’s score. It can also provide a rationale that explains the score.
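The label-to-score step described above can be illustrated in plain Python. This is a minimal sketch, not SDK code: the label names, their integers, and the `assign_score` and `EvaluationOutcome` names are all hypothetical. In Snorkel, the real labels and integers come from the criteria defined in your benchmark.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical criteria labels mapped to the integer score each one carries.
LABELS: Dict[str, int] = {"incorrect": 0, "partially_correct": 1, "correct": 2}


@dataclass
class EvaluationOutcome:
    label: str
    score: int
    rationale: Optional[str] = None  # optional explanation of the score


def assign_score(label: str, rationale: Optional[str] = None) -> EvaluationOutcome:
    """Turn the label chosen for a datapoint into its integer score."""
    if label not in LABELS:
        raise ValueError(f"unknown label: {label!r}")
    return EvaluationOutcome(label=label, score=LABELS[label], rationale=rationale)


outcome = assign_score("correct", rationale="The response matches the reference answer.")
print(outcome.score)  # 2
```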

Read more in the Evaluation overview.

Parameters

  • benchmark_uid (int): The unique identifier of the benchmark that contains the criteria. The benchmark_uid is visible in the URL of the benchmark page in the Snorkel GUI. For example, https://YOUR-SNORKEL-INSTANCE/benchmarks/100/ indicates a benchmark with a benchmark_uid of 100.

  • criteria_uid (int): The unique identifier of the criteria that this evaluator assesses.

  • evaluator_uid (int): The unique identifier for this evaluator.
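The URL convention for benchmark_uid can be turned into a small helper. This is a minimal sketch that assumes only the /benchmarks/<uid>/ path pattern shown in the example URL; `benchmark_uid_from_url` is a hypothetical name and is not part of the SDK.

```python
import re
from urllib.parse import urlparse


def benchmark_uid_from_url(url: str) -> int:
    """Extract the integer benchmark_uid from a benchmark page URL."""
    path = urlparse(url).path  # e.g. "/benchmarks/100/"
    match = re.search(r"/benchmarks/(\d+)(?:/|$)", path)
    if match is None:
        raise ValueError(f"no benchmark id found in URL: {url!r}")
    return int(match.group(1))


print(benchmark_uid_from_url("https://YOUR-SNORKEL-INSTANCE/benchmarks/100/"))  # 100
```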

__init__

__init__(*args, **kwargs)

Methods

  • __init__(*args, **kwargs)

  • create(criteria_uid, **kwargs): Creates a new evaluator for the given criteria.

  • execute(split[, num_rows]): Runs the evaluator against the datapoints in the specified dataset split.

  • get(evaluator_uid): Retrieves the evaluator with the given evaluator_uid.

  • get_execution_result(execution_uid): Retrieves the evaluation results and scores for a specific execution.

  • get_executions(): Retrieves all executions for this evaluator.

  • get_versions(): Retrieves all version names for this evaluator.

  • poll_execution_result(execution_uid[, sync]): Polls the evaluation job status and retrieves partial results.

  • update(version_name, **kwargs): Updates the evaluator with a new version.
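To show how these methods fit together, here is a hypothetical in-memory stand-in that follows the same method names and return types: create, then execute, then poll, then fetch the result. It is a sketch under stated assumptions, not the Snorkel implementation. It runs synchronously, and the "completed" status string, the row keys, the "score" field, the constructor arguments, and evaluating the first num_rows rows are all assumptions made for illustration.

```python
import itertools
from typing import Any, Callable, Dict, List, Optional, Tuple, Union

Scalar = Union[str, int, float, bool]
Results = Dict[str, Dict[str, Scalar]]


class InMemoryEvaluator:
    """Hypothetical stand-in that mimics the documented Evaluator call sequence."""

    _next_uid = itertools.count(1)  # shared counter for evaluator_uid values

    def __init__(self, criteria_uid: int, datasets: Dict[str, List[str]],
                 score_fn: Callable[[str], int]) -> None:
        self.criteria_uid = criteria_uid
        self.evaluator_uid = next(self._next_uid)
        self._datasets = datasets
        self._score_fn = score_fn
        self._versions: List[str] = ["v1"]
        self._executions: Dict[int, Results] = {}

    @classmethod
    def create(cls, criteria_uid: int, **kwargs: Any) -> "InMemoryEvaluator":
        # The keyword arguments accepted here are specific to this fake.
        return cls(criteria_uid, kwargs["datasets"], kwargs["score_fn"])

    def execute(self, split: str, num_rows: Optional[int] = None, **kwargs: Any) -> int:
        rows = self._datasets[split]
        if num_rows is not None:
            rows = rows[:num_rows]  # assumption: the first num_rows rows are evaluated
        results: Results = {
            f"{split}:{i}": {"score": self._score_fn(text)} for i, text in enumerate(rows)
        }
        execution_uid = len(self._executions) + 1
        self._executions[execution_uid] = results
        return execution_uid

    def poll_execution_result(self, execution_uid: int, sync: bool = False) -> Tuple[str, Results]:
        # Everything runs synchronously here, so each execution is already finished.
        return "completed", self._executions[execution_uid]

    def get_execution_result(self, execution_uid: int) -> Results:
        return self._executions[execution_uid]

    def get_executions(self) -> List[Dict[str, Any]]:
        return [{"execution_uid": uid, "num_rows": len(res)}
                for uid, res in self._executions.items()]

    def get_versions(self) -> List[str]:
        return list(self._versions)

    def update(self, version_name: str, **kwargs: Any) -> str:
        # Record a new version, optionally swapping in a new scoring function.
        if "score_fn" in kwargs:
            self._score_fn = kwargs["score_fn"]
        self._versions.append(version_name)
        return version_name


data = {"valid": ["Paris is the capital of France.", "", "42"]}
evaluator = InMemoryEvaluator.create(
    criteria_uid=7, datasets=data, score_fn=lambda text: int(bool(text.strip()))
)
execution_uid = evaluator.execute("valid", num_rows=2)
status, _ = evaluator.poll_execution_result(execution_uid)
print(status, evaluator.get_execution_result(execution_uid))
# completed {'valid:0': {'score': 1}, 'valid:1': {'score': 0}}
```

The fake keeps every execution in a dictionary keyed by the integer that execute() returns. This mirrors how the documented methods accept an execution_uid, but whether the real execute() return value is that identifier is an assumption here.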

Attributes

benchmark_uid
criteria_uid
evaluator_uid

create

abstract classmethod create(criteria_uid, **kwargs)

Creates a new evaluator for the given criteria.

Parameters

  • criteria_uid (int): The unique identifier of the criteria that this evaluator assesses.

  • **kwargs (Any): Additional parameters specific to the evaluator type.

Return type

Evaluator
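Because create is an abstract classmethod, every concrete evaluator type must supply its own implementation. The following plain-Python sketch shows what that means using only the standard abc module. The class names are hypothetical, and the sketch omits the BaseModel base that the real class also has.

```python
from abc import ABC, abstractmethod
from typing import Any


class SketchEvaluatorBase(ABC):
    """Hypothetical base class with an abstract classmethod, mirroring create()."""

    @classmethod  # classmethod must be the outermost decorator
    @abstractmethod
    def create(cls, criteria_uid: int, **kwargs: Any) -> "SketchEvaluatorBase":
        """Build and return a new evaluator for criteria_uid."""


class IncompleteEvaluator(SketchEvaluatorBase):
    pass  # does not override create(), so the class stays abstract


class CompleteEvaluator(SketchEvaluatorBase):
    def __init__(self, criteria_uid: int) -> None:
        self.criteria_uid = criteria_uid

    @classmethod
    def create(cls, criteria_uid: int, **kwargs: Any) -> "CompleteEvaluator":
        return cls(criteria_uid)


try:
    IncompleteEvaluator()  # abstract methods remain, so instantiation fails
except TypeError as exc:
    print("TypeError:", exc)

print(CompleteEvaluator.create(criteria_uid=3).criteria_uid)  # 3
```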

execute

abstract execute(split, num_rows=None, **kwargs)

Runs the evaluator against the datapoints in the specified dataset split (all datapoints unless num_rows is set).

Parameters

  • split (str): The dataset split you want to evaluate.

  • num_rows (Optional[int], default None): The number of rows to evaluate. If None, evaluates all rows.

  • **kwargs (Any): Additional parameters specific to the evaluator type.

Return type

int

get

abstract classmethod get(evaluator_uid)

Retrieves the evaluator with the given evaluator_uid.

Parameters

  • evaluator_uid (int): The unique identifier for the evaluator.

Return type

Evaluator

get_execution_result

abstract get_execution_result(execution_uid)

Retrieves the evaluation results and scores for a specific execution.

Parameters

  • execution_uid (int): The unique identifier of the execution you want to get results for.

Return type

Dict[str, Dict[str, Union[str, int, float, bool]]]
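The nested return type holds a mapping of scalar fields for each entry. This minimal sketch averages the scores in such a mapping. It assumes, hypothetically, that outer keys identify datapoints and that each inner mapping has a "score" field. The real key names may differ, so inspect an actual result before relying on them. `mean_score` is an illustrative helper, not an SDK function.

```python
from typing import Dict, Union

Scalar = Union[str, int, float, bool]


def mean_score(results: Dict[str, Dict[str, Scalar]], score_key: str = "score") -> float:
    """Average the numeric score field across all entries in a result mapping."""
    scores = []
    for fields in results.values():
        value = fields.get(score_key)
        # Skip booleans (bool is a subclass of int) and non-numeric fields.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            scores.append(value)
    if not scores:
        raise ValueError(f"no numeric {score_key!r} values found")
    return sum(scores) / len(scores)


example = {
    "dp_1": {"score": 2, "rationale": "Fully correct."},
    "dp_2": {"score": 0, "rationale": "Wrong capital city."},
    "dp_3": {"score": 1, "rationale": "Partially correct."},
}
print(mean_score(example))  # 1.0
```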

get_executions

abstract get_executions()

Retrieves all executions for this evaluator.

Return type

List[Dict[str, Any]]

get_versions

abstract get_versions()

Retrieves all version names for this evaluator.

Return type

List[str]

poll_execution_result

abstract poll_execution_result(execution_uid, sync=False)

Polls the evaluation job status and retrieves partial results.

Parameters

  • execution_uid (int): The unique identifier of the execution you want to poll.

  • sync (bool, default False): Whether to wait for the job to complete.

Return type

Tuple[str, Dict[str, Dict[str, Union[str, int, float, bool]]]]
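With sync=False, a caller can poll repeatedly and inspect partial results between calls. This sketch shows a client-side polling loop with a timeout. It assumes the first tuple element is a status string, and the status values ("running", "completed", "failed") are hypothetical. The loop takes any zero-argument poll function, such as a lambda wrapping an evaluator's poll_execution_result call. The usage below substitutes a scripted fake so the code runs without Snorkel.

```python
import time
from typing import Callable, Dict, FrozenSet, Tuple, Union

Scalar = Union[str, int, float, bool]
Results = Dict[str, Dict[str, Scalar]]


def wait_for_results(
    poll: Callable[[], Tuple[str, Results]],
    done_statuses: FrozenSet[str] = frozenset({"completed", "failed"}),
    interval_s: float = 1.0,
    timeout_s: float = 60.0,
) -> Tuple[str, Results]:
    """Call poll() until it reports a terminal status or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        status, partial = poll()
        if status in done_statuses:
            return status, partial
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still not finished; last status was {status!r}")
        time.sleep(interval_s)  # back off before asking again


# A scripted fake poll function that reaches a terminal status on its third call.
responses = iter([
    ("running", {}),
    ("running", {"dp_1": {"score": 1}}),
    ("completed", {"dp_1": {"score": 1}, "dp_2": {"score": 0}}),
])
status, results = wait_for_results(lambda: next(responses), interval_s=0.01)
print(status, len(results))  # completed 2
```

Passing sync=True appears to handle this waiting for you. A manual loop like this one mainly helps when you want to report progress or act on partial results before the job finishes.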

update

abstract update(version_name, **kwargs)

Updates the evaluator with a new version.

Parameters

  • version_name (str): The name for the new version.

  • **kwargs (Any): Additional parameters specific to the evaluator type.

Return type

str

benchmark_uid: int
criteria_uid: int
evaluator_uid: int