snorkelai.sdk.develop.Evaluator
- class snorkelai.sdk.develop.Evaluator(*args, **kwargs)
Bases: BaseModel, ABC
Base class for all evaluators.
An evaluator assesses a datapoint containing an AI application’s response against a specific criteria. Evaluators come in two types:
- CodeEvaluator: code-based, using custom Python functions
- PromptEvaluator: prompt-based, using LLM prompts
The goal of an evaluator is to categorize the datapoint into one of the criteria’s labels, ultimately assigning the integer associated with the label as that datapoint’s score. An evaluator can also assign a rationale for its score, which is used to explain the score.
Read more in the Evaluation overview.
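The typical lifecycle is to create an evaluator, execute it against a dataset split, then poll for results. A minimal sketch, assuming the PromptEvaluator subclass is importable from this module; all uids, the split name, and the execution dict's key are illustrative assumptions:

```python
from snorkelai.sdk.develop import PromptEvaluator

# criteria_uid 200 is illustrative; create() also takes
# subclass-specific kwargs, omitted here.
evaluator = PromptEvaluator.create(criteria_uid=200)
evaluator.execute("test", num_rows=50)

# Executions are returned as dicts; the "execution_uid" key is an
# assumption about their shape.
latest = evaluator.get_executions()[-1]
result = evaluator.poll_execution_result(latest["execution_uid"], sync=True)
```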
Parameters
| Name | Type | Default | Info |
|------|------|---------|------|
| benchmark_uid | int | | The unique identifier of the benchmark that contains the criteria. The benchmark_uid is visible in the URL of the benchmark page in the Snorkel GUI. For example, https://YOUR-SNORKEL-INSTANCE/benchmarks/100/ indicates a benchmark with a benchmark_uid of 100. |
| criteria_uid | int | | The unique identifier of the criteria that this evaluator assesses. |
| evaluator_uid | int | | The unique identifier for this evaluator. |

- __init__(*args, **kwargs)
Methods
| Method | Description |
|--------|-------------|
| __init__(*args, **kwargs) | |
| create(criteria_uid, **kwargs) | Creates a new evaluator for a criteria. |
| execute(split[, num_rows]) | Runs the evaluator against all datapoints in the specified dataset split. |
| get(evaluator_uid) | Retrieves the evaluator for a given uid. |
| get_execution_result(execution_uid) | Retrieves the evaluation results and scores for a specific execution. |
| get_executions() | Retrieves all executions for this evaluator. |
| get_versions() | Retrieves all version names for this evaluator. |
| poll_execution_result(execution_uid[, sync]) | Polls the evaluation job status and retrieves partial results. |
| update(version_name, **kwargs) | Updates the evaluator with a new version. |

Attributes
- benchmark_uid
- criteria_uid
- evaluator_uid
- abstract classmethod create(criteria_uid, **kwargs)
Creates a new evaluator for a criteria.
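Because Evaluator is abstract, create() is called on a concrete subclass. A sketch, assuming PromptEvaluator and an illustrative criteria_uid; any kwargs beyond criteria_uid are subclass-specific and omitted here:

```python
from snorkelai.sdk.develop import PromptEvaluator

# 200 is an illustrative criteria_uid; prompt-based evaluators take
# additional subclass-specific kwargs not documented on this page.
evaluator = PromptEvaluator.create(criteria_uid=200)
```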
- abstract execute(split, num_rows=None, **kwargs)
Runs the evaluator against all datapoints in the specified dataset split.
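A sketch of running an evaluator over a split, assuming a concrete PromptEvaluator, an illustrative evaluator_uid, and that a split named "test" exists in the benchmark's dataset:

```python
from snorkelai.sdk.develop import PromptEvaluator

evaluator = PromptEvaluator.get(300)  # illustrative evaluator_uid

# Score only the first 50 datapoints of the "test" split; omit
# num_rows to evaluate the whole split.
evaluator.execute("test", num_rows=50)
```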
- abstract classmethod get(evaluator_uid)
Retrieves the evaluator for a given uid.
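A sketch of retrieving an existing evaluator; the uid is illustrative, and get() is called on a concrete subclass because Evaluator itself is abstract:

```python
from snorkelai.sdk.develop import PromptEvaluator

# 300 is an illustrative evaluator_uid.
evaluator = PromptEvaluator.get(300)
```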
- abstract get_execution_result(execution_uid)
Retrieves the evaluation results and scores for a specific execution.
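A sketch of fetching results for a past execution, assuming a concrete PromptEvaluator and an illustrative uid; the "execution_uid" key is an assumption about the dicts returned by get_executions():

```python
from snorkelai.sdk.develop import PromptEvaluator

evaluator = PromptEvaluator.get(300)  # illustrative uid

# Look up an execution_uid from past executions; the key name is an
# assumption about the returned dict shape.
execution_uid = evaluator.get_executions()[-1]["execution_uid"]
result = evaluator.get_execution_result(execution_uid)
```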
- abstract get_executions()
Retrieves all executions for this evaluator.
Return type: List[Dict[str, Any]]
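A sketch of listing executions, assuming a concrete PromptEvaluator and an illustrative uid:

```python
from snorkelai.sdk.develop import PromptEvaluator

evaluator = PromptEvaluator.get(300)  # illustrative uid

# Each execution is a Dict[str, Any]; the exact keys are not
# documented here, so this just prints whatever metadata is returned.
for execution in evaluator.get_executions():
    print(execution)
```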
- abstract get_versions()
Retrieves all version names for this evaluator.
Return type: List[str]
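A sketch of listing version names, assuming a concrete PromptEvaluator and an illustrative uid:

```python
from snorkelai.sdk.develop import PromptEvaluator

evaluator = PromptEvaluator.get(300)  # illustrative uid

# Version names are plain strings, e.g. names passed to update().
for version_name in evaluator.get_versions():
    print(version_name)
```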
- abstract poll_execution_result(execution_uid, sync=False)
Polls the evaluation job status and retrieves partial results.
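A sketch of polling a running job, assuming a concrete PromptEvaluator, illustrative uids, and an assumed "execution_uid" key in the execution dicts:

```python
from snorkelai.sdk.develop import PromptEvaluator

evaluator = PromptEvaluator.get(300)  # illustrative uid
execution_uid = evaluator.get_executions()[-1]["execution_uid"]  # assumed key

# With the default sync=False the call is assumed to return immediately
# with partial results; sync=True to wait for the job to finish.
result = evaluator.poll_execution_result(execution_uid, sync=True)
```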
- abstract update(version_name, **kwargs)
Updates the evaluator with a new version.
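A sketch of publishing a new version, assuming a concrete PromptEvaluator and an illustrative uid; the version name and any configuration kwargs are illustrative assumptions:

```python
from snorkelai.sdk.develop import PromptEvaluator

evaluator = PromptEvaluator.get(300)  # illustrative uid

# "v2" is an illustrative version name; kwargs carrying the updated
# configuration are subclass-specific and omitted here.
evaluator.update("v2")
```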
- benchmark_uid: int
- criteria_uid: int
- evaluator_uid: int