snorkelai.sdk.develop.Evaluator
- class snorkelai.sdk.develop.Evaluator(benchmark_uid, criteria_uid, evaluator_uid)
Bases: ABC, Base
Base class for all evaluators.
An evaluator assesses a datapoint containing an AI application’s response against a specific criteria. Evaluators can be of two types:
CodeEvaluator: Code-based (using custom Python functions)
PromptEvaluator: Prompt-based (using LLM prompts)
The goal of an evaluator is to categorize the datapoint into one of the criteria’s labels, ultimately assigning the integer associated with that label as the datapoint’s score. An evaluator can also provide a rationale that explains its score.
Read more in the Evaluation overview.
Using the Evaluator class requires the following import:
from snorkelai.sdk.develop import Evaluator
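As a quick orientation, here is a minimal sketch that retrieves an existing evaluator and inspects the benchmark and criteria it belongs to; the uid value is illustrative, and the object returned is an instance of a concrete subclass such as CodeEvaluator or PromptEvaluator:
from snorkelai.sdk.develop import Evaluator

# Retrieve an existing evaluator by its uid (300 is illustrative)
evaluator = Evaluator.get(evaluator_uid=300)

# The evaluator knows which benchmark and criteria it belongs to
print(evaluator.benchmark_uid, evaluator.criteria_uid)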
- __init__(benchmark_uid, criteria_uid, evaluator_uid)
Parameters
benchmark_uid (int): The unique identifier of the benchmark that contains the criteria. The benchmark_uid is visible in the URL of the benchmark page in the Snorkel GUI. For example, https://YOUR-SNORKEL-INSTANCE/benchmarks/100/ indicates a benchmark with benchmark_uid of 100.
criteria_uid (int): The unique identifier of the criteria that this evaluator assesses.
evaluator_uid (int): The unique identifier for this evaluator.
Methods
__init__(benchmark_uid, criteria_uid, ...)
create(*args, **kwargs): Creates a new evaluator for a criteria.
delete(evaluator_uid): Deletion of an evaluator is not implemented.
execute(*args, **kwargs): Runs the evaluator against all datapoints in the specified dataset split.
get(evaluator_uid): Retrieves the evaluator for a given uid.
get_execution_result(execution_uid): Retrieves the evaluation results and scores for a specific execution.
get_executions(): Retrieves all executions for this evaluator.
get_versions(): Retrieves all version names for this evaluator.
poll_execution_result(execution_uid[, sync]): Polls the evaluation job status and retrieves partial results.
update(*args, **kwargs): Updates the evaluator with a new version.
Attributes
benchmark_uid: Return the UID of the parent benchmark.
criteria_uid: Return the UID of the parent criteria.
evaluator_uid: Return the UID of the evaluator.
uid: Return the UID of the evaluator.
- abstract classmethod create(*args, **kwargs)
Creates a new evaluator for a criteria.
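Because create is abstract, it is called on a concrete subclass (CodeEvaluator or PromptEvaluator) rather than on Evaluator itself. The sketch below is illustrative only: it assumes PromptEvaluator is importable from the same module, and the prompt keyword is a hypothetical placeholder rather than the documented signature; see the subclass documentation for the actual parameters.
from snorkelai.sdk.develop import PromptEvaluator  # assumed import path

evaluator = PromptEvaluator.create(
    benchmark_uid=100,  # benchmark that contains the criteria
    criteria_uid=200,   # criteria this evaluator assesses (illustrative uid)
    prompt="Rate the response against the criteria.",  # hypothetical parameter
)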
- classmethod delete(evaluator_uid)
Deletion of an evaluator is not implemented.
- abstract execute(*args, **kwargs)
Runs the evaluator against all datapoints in the specified dataset split.
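Since execute is abstract, the accepted arguments are defined by the concrete subclass; the split keyword below is a hypothetical placeholder, and the sketch assumes the call returns an execution identifier that can later be polled.
# 'split' is a hypothetical keyword argument; check the concrete subclass for the real ones
execution_uid = evaluator.execute(split="test")

# Assumed: the returned identifier can be passed to poll_execution_result
results = evaluator.poll_execution_result(execution_uid, sync=True)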
- classmethod get(evaluator_uid)
Retrieves the evaluator for a given uid.
Parameters
evaluator_uid (int): The unique identifier for the evaluator.
Returns
The requested evaluator object.
Return type
Evaluator
Example
evaluator = Evaluator.get(evaluator_uid=300)
- abstract get_execution_result(execution_uid)
Retrieves the evaluation results and scores for a specific execution.
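A minimal sketch, assuming evaluator was retrieved as in the earlier example and 42 is an execution_uid from a previous run; the structure of the returned results is determined by the concrete evaluator.
result = evaluator.get_execution_result(execution_uid=42)  # 42 is illustrative
print(result)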
- abstract get_executions()
Retrieves all executions for this evaluator.
Return type
List[Dict[str, Any]]
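Assuming evaluator was retrieved as shown earlier, past runs can be listed and inspected; each entry is a dictionary of execution metadata whose exact keys are not documented here.
executions = evaluator.get_executions()  # List[Dict[str, Any]]
for execution in executions:
    print(execution)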
- abstract get_versions()
Retrieves all version names for this evaluator.
Return type
List[str]
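For instance, assuming evaluator was retrieved as shown earlier:
versions = evaluator.get_versions()  # List[str] of version names
print(versions)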
- abstract poll_execution_result(execution_uid, sync=False)
Polls the evaluation job status and retrieves partial results.
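A small sketch, again assuming an illustrative execution_uid of 42; sync=False (the default) is described as returning partial results while the job runs, and sync=True is assumed to wait for completion.
# Check on a running evaluation job; partial scores may be returned
partial = evaluator.poll_execution_result(execution_uid=42)

# Assumed to block until the execution finishes before returning
final = evaluator.poll_execution_result(execution_uid=42, sync=True)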
- abstract update(*args, **kwargs)
Updates the evaluator with a new version.
- property benchmark_uid: int
Return the UID of the parent benchmark
- property criteria_uid: int
Return the UID of the parent criteria
- property evaluator_uid: int
Return the UID of the evaluator
- property uid: int
Return the UID of the evaluator