snorkelai.sdk.develop.PromptEvaluator
- class snorkelai.sdk.develop.PromptEvaluator(*args, **kwargs)
Bases: Evaluator
An evaluator that uses LLM prompts to assess model outputs.
This evaluator type is known as an LLM-as-a-judge (LLMAJ). A prompt evaluator uses LLM prompts to evaluate datapoints containing AI application responses, categorizing them into one of a criteria’s labels by assigning the corresponding integer score and optional rationale.
Prompt evaluator execution via the SDK is not yet supported. Please use the GUI to run prompt evaluators.
Read more about LLM-as-a-judge prompts.
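To make the label-to-score idea concrete, here is a minimal sketch in plain Python. It does not use the SDK: `LABEL_SCORES`, `Judgment`, and `to_judgment` are hypothetical names chosen for illustration, and the label set shown is an assumption, not the actual criteria schema.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical criteria labels and their integer scores (illustrative only).
LABEL_SCORES: Dict[str, int] = {"incorrect": 0, "correct": 1}


@dataclass
class Judgment:
    """One judge verdict: a label, its integer score, and an optional rationale."""

    label: str
    score: int
    rationale: Optional[str] = None


def to_judgment(label: str, rationale: Optional[str] = None) -> Judgment:
    """Map a label chosen by the judge LLM to its integer score."""
    if label not in LABEL_SCORES:
        raise ValueError(f"unknown label: {label!r}")
    return Judgment(label=label, score=LABEL_SCORES[label], rationale=rationale)
```

The rationale stays optional in this sketch, mirroring the description above, where a score is always assigned but a rationale may be absent.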
- __init__(*args, **kwargs)
Methods
| Method | Description |
| --- | --- |
| `__init__(*args, **kwargs)` | |
| `create(criteria_uid, **kwargs)` | Creates a new evaluator for a criteria. |
| `execute(split[, num_rows])` | Runs the evaluator against all datapoints in the specified dataset split. |
| `get(evaluator_uid)` | Retrieves the evaluator for a given uid. |
| `get_execution_result(execution_uid)` | Retrieves the evaluation results and scores for a specific execution. |
| `get_executions()` | Retrieves all executions for this evaluator. |
| `get_versions()` | Retrieves all version names for this evaluator. |
| `poll_execution_result(execution_uid[, sync])` | Polls the evaluation job status and retrieves partial results. |
| `update(version_name, **kwargs)` | Updates the evaluator with a new version. |

Attributes
benchmark_uid
criteria_uid
evaluator_uid
- classmethod create(criteria_uid, **kwargs)
Creates a new evaluator for a criteria.
- execute(split, num_rows=None, **kwargs)
Runs the evaluator against all datapoints in the specified dataset split.
- classmethod get(evaluator_uid)
Retrieves the evaluator for a given uid.
- get_execution_result(execution_uid)
Retrieves the evaluation results and scores for a specific execution.
- get_executions()
Retrieves all executions for this evaluator.
Return type: List[Dict[str, Any]]
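Since the return value is a list of dictionaries whose keys are not documented in this section, a caller will likely need to inspect them before relying on any field. The sketch below picks the most recent execution by a caller-supplied key. `latest_execution` is a hypothetical helper, and the keys in `sample_executions` (`execution_uid`, `created_at`) are assumptions used as a stand-in for real output.

```python
from typing import Any, Dict, List, Optional


def latest_execution(
    executions: List[Dict[str, Any]], sort_key: str
) -> Optional[Dict[str, Any]]:
    """Return the execution with the largest value under sort_key, or None."""
    # Skip entries that lack the key instead of raising KeyError.
    candidates = [e for e in executions if sort_key in e]
    if not candidates:
        return None
    return max(candidates, key=lambda e: e[sort_key])


# Stand-in for the result of evaluator.get_executions(); the keys are assumed.
sample_executions = [
    {"execution_uid": 11, "created_at": "2025-01-03T10:00:00"},
    {"execution_uid": 12, "created_at": "2025-02-14T09:30:00"},
]
latest = latest_execution(sample_executions, sort_key="created_at")
```

Passing the key as an argument keeps the helper usable whatever field names the real dictionaries turn out to contain.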
- get_versions()
Retrieves all version names for this evaluator.
Return type: List[str]
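One possible use of the returned names is to avoid reusing a version name when calling `update(version_name, **kwargs)`. Whether `update` actually requires unique names is not stated here, so treat the following as a sketch under that assumption. `unique_version_name` and its `-N` suffix scheme are hypothetical, not an SDK convention.

```python
from typing import List


def unique_version_name(base: str, existing: List[str]) -> str:
    """Return base if unused, otherwise base with the first free numeric suffix."""
    taken = set(existing)
    if base not in taken:
        return base
    suffix = 2
    while f"{base}-{suffix}" in taken:
        suffix += 1
    return f"{base}-{suffix}"
```

With an evaluator instance in hand, this might be called as `unique_version_name("baseline", evaluator.get_versions())`.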
- poll_execution_result(execution_uid, sync=False)
Polls the evaluation job status and retrieves partial results.
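A common pattern around a polling method is a loop with an interval and a timeout. The sketch below is generic and does not call the SDK. `wait_for_result` is a hypothetical helper, and because the shape of the value returned by `poll_execution_result` is not documented in this section, the completion check is passed in as an `is_done` callable rather than assumed.

```python
import time
from typing import Any, Callable


def wait_for_result(
    poll: Callable[[], Any],
    is_done: Callable[[Any], bool],
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
) -> Any:
    """Call poll() until is_done(result) is true or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        result = poll()
        if is_done(result):
            return result
        # Check the deadline after polling so poll() runs at least once.
        if time.monotonic() >= deadline:
            raise TimeoutError("execution did not finish before the timeout")
        time.sleep(interval_s)
```

With a real evaluator, `poll` could be `lambda: evaluator.poll_execution_result(execution_uid)`, with `is_done` written against whatever status field the returned value actually exposes. The `sync` parameter may already offer blocking behavior, but its semantics are not described here.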