Version: 25.6

snorkelai.sdk.develop.PromptEvaluator

class snorkelai.sdk.develop.PromptEvaluator(*args, **kwargs)

Bases: Evaluator

An evaluator that uses LLM prompts to assess model outputs.

This evaluator type is known as an LLM-as-a-judge (LLMAJ). A prompt evaluator uses LLM prompts to evaluate datapoints containing AI application responses, categorizing each datapoint into one of a criteria's labels by assigning the corresponding integer score and an optional rationale.

Prompt evaluator execution via the SDK is not yet supported. Please use the GUI to run prompt evaluators.

Read more about LLM-as-a-judge prompts.

__init__

__init__(*args, **kwargs)

Methods

__init__(*args, **kwargs)
create(criteria_uid, **kwargs): Creates a new evaluator for a criteria.
execute(split[, num_rows]): Runs the evaluator against all datapoints in the specified dataset split.
get(evaluator_uid): Retrieves the evaluator for a given uid.
get_execution_result(execution_uid): Retrieves the evaluation results and scores for a specific execution.
get_executions(): Retrieves all executions for this evaluator.
get_versions(): Retrieves all version names for this evaluator.
poll_execution_result(execution_uid[, sync]): Polls the evaluation job status and retrieves partial results.
update(version_name, **kwargs): Updates the evaluator with a new version.

Attributes

benchmark_uid
criteria_uid
evaluator_uid

create

classmethod create(criteria_uid, **kwargs)

Creates a new evaluator for a criteria.

Parameters

criteria_uid (int): The unique identifier of the criteria that this evaluator assesses.
**kwargs (Any): Additional parameters specific to the evaluator type.

Return type

PromptEvaluator
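
A minimal sketch of creating a prompt evaluator for an existing criteria. The criteria UID is a placeholder, and any evaluator-specific keyword arguments (for example, a prompt configuration) are omitted because they are not documented on this page:

    from snorkelai.sdk.develop import PromptEvaluator

    # 123 is a placeholder criteria UID from your benchmark.
    # Additional evaluator-specific **kwargs may be required in practice.
    evaluator = PromptEvaluator.create(criteria_uid=123)
    print(evaluator.evaluator_uid)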

execute

execute(split, num_rows=None, **kwargs)

Runs the evaluator against all datapoints in the specified dataset split.

Parameters

split (str): The dataset split you want to evaluate.
num_rows (Optional[int], default None): The number of rows to evaluate. If None, evaluates all rows.
**kwargs (Dict[str, Any]): Additional parameters specific to the evaluator type.

Return type

int
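
A sketch of the call signature only, continuing from the create example above. As noted earlier, prompt evaluator execution via the SDK is not yet supported, so this currently needs to be run from the GUI instead; the split name and row count are placeholders:

    # Hypothetical call; SDK execution of prompt evaluators is not yet supported.
    execution_uid = evaluator.execute(split="test", num_rows=100)
    # The int return value appears to identify the execution
    # (see poll_execution_result); treat that as an assumption.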

get

classmethod get(evaluator_uid)

Retrieves the evaluator for a given uid.

Parameters

evaluator_uid (int): The unique identifier for the evaluator.

Return type

PromptEvaluator
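
A minimal sketch of retrieving an existing evaluator by UID and reading its attributes; 456 is a placeholder evaluator UID:

    from snorkelai.sdk.develop import PromptEvaluator

    evaluator = PromptEvaluator.get(evaluator_uid=456)
    print(evaluator.evaluator_uid, evaluator.criteria_uid, evaluator.benchmark_uid)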

get_execution_result

get_execution_result(execution_uid)

Retrieves the evaluation results and scores for a specific execution.

Parameters

execution_uid (int): The unique identifier of the execution you want to get results for.

Return type

Dict[str, Dict[str, Union[str, int, float, bool]]]
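
A sketch of fetching results for a finished execution, assuming the evaluator from the examples above. The execution UID is a placeholder, and the exact keys of the returned dictionary are not documented on this page, so the iteration below only assumes the documented return type:

    results = evaluator.get_execution_result(execution_uid=789)
    # Returned mapping is Dict[str, Dict[str, Union[str, int, float, bool]]];
    # the meaning of the outer and inner keys is not documented here.
    for key, scores in results.items():
        print(key, scores)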

get_executions

get_executions()

Retrieves all executions for this evaluator.

Return type

List[Dict[str, Any]]
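
A sketch of listing past executions for the evaluator; each entry is a Dict[str, Any] whose specific keys are not documented on this page:

    executions = evaluator.get_executions()
    print(f"{len(executions)} executions found")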

get_versions

get_versions()

Retrieves all version names for this evaluator.

Return type

List[str]
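
A sketch of listing the version names recorded for the evaluator:

    for version_name in evaluator.get_versions():
        print(version_name)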

poll_execution_result

poll_execution_result(execution_uid, sync=False)

Polls the evaluation job status and retrieves partial results.

Parameters

execution_uid (int): The unique identifier of the execution you want to poll for.
sync (bool, default False): Whether to wait for the job to complete.

Return type

Tuple[str, Dict[str, Dict[str, Union[str, int, float, bool]]]]
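
A sketch of polling an execution, assuming the evaluator and a placeholder execution UID from the examples above. With sync=True the call waits for the job to complete; the first element of the returned tuple is a status string and the second holds the (possibly partial) results, though the specific status values are not documented here:

    status, partial_results = evaluator.poll_execution_result(execution_uid=789, sync=False)
    print(status, len(partial_results))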

update

update(version_name, **kwargs)

Updates the evaluator with a new version.

Parameters

version_name (str): The name for the new version.
**kwargs (Dict[str, Any]): Additional parameters specific to the evaluator type.

Return type

str
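
A sketch of creating a new version of the evaluator. The version name is a placeholder and, as with create, the accepted evaluator-specific **kwargs are not documented on this page:

    result = evaluator.update(version_name="v2")
    # Returns a str; whether this is the new version's name is an assumption.
    print(result)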