snorkelai.sdk.develop.BenchmarkExecution
- final class snorkelai.sdk.develop.BenchmarkExecution(benchmark_uid, benchmark_execution_uid, name, created_at, created_by, archived)
Bases: Base
Represents a single execution run of a benchmark for a dataset.
A benchmark execution exports comprehensive evaluation data, including per-datapoint scores (evaluator outputs, rationales, and agreement with ground truth), slice membership, and benchmark and execution metadata such as timing information and execution context.
- __init__(benchmark_uid, benchmark_execution_uid, name, created_at, created_by, archived)
Parameters

| Name | Type | Default | Info |
| --- | --- | --- | --- |
| benchmark_uid | int | | The unique identifier of the parent Benchmark. The benchmark_uid is visible in the URL of the benchmark page in the Snorkel GUI. For example, https://YOUR-SNORKEL-INSTANCE/benchmarks/100/ indicates a benchmark with benchmark_uid of 100. |
| benchmark_execution_uid | int | | The unique identifier for this execution. |
| name | str | | The name of the execution. |
| created_at | datetime | | Timestamp of when this execution was run. |
| created_by | str | | Username of the user who ran this execution. |
| archived | bool | | Whether this execution is archived. |

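Because the benchmark_uid appears in the benchmark page URL, it can be extracted programmatically. The following is a minimal sketch, assuming the URL path follows the /benchmarks/<uid>/ pattern shown above. The helper name benchmark_uid_from_url is hypothetical and is not part of the SDK.

```python
import re
from urllib.parse import urlparse


def benchmark_uid_from_url(url: str) -> int:
    """Extract the benchmark UID from a benchmark page URL.

    Hypothetical helper: assumes the path contains /benchmarks/<digits>.
    """
    path = urlparse(url).path
    match = re.search(r"/benchmarks/(\d+)(?:/|$)", path)
    if match is None:
        raise ValueError(f"No benchmark UID found in URL: {url!r}")
    return int(match.group(1))


# The example URL from the parameter description yields 100.
uid = benchmark_uid_from_url("https://YOUR-SNORKEL-INSTANCE/benchmarks/100/")
```

The resulting integer can then be passed wherever a benchmark_uid argument is expected.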
Methods

| Method | Description |
| --- | --- |
| __init__(benchmark_uid, ...) | |
| create(benchmark_uid[, name, criteria_uids, ...]) | Create a benchmark execution. |
| delete(benchmark_uid, benchmark_execution_uid) | Delete (archive) a benchmark execution. |
| export(filepath[, config, connector_config_uid]) | Export information associated with this benchmark execution. |
| get(benchmark_uid, benchmark_execution_uid) | Get a benchmark execution by its unique identifier. |
| list(benchmark_uid[, include_archived]) | List all benchmark executions for a given benchmark. |
| update(archived) | Update the state of the benchmark execution. |

Attributes

| Attribute | Description |
| --- | --- |
| archived | Return whether the benchmark execution is archived. |
| benchmark_execution_uid | Return the UID of the benchmark execution. |
| benchmark_uid | Return the UID of the parent benchmark. |
| created_at | Return the timestamp when the benchmark execution was created. |
| created_by | Return the username of the user who created the benchmark execution. |
| name | Return the name of the benchmark execution. |
| uid | Return the UID of the benchmark execution. |

- classmethod create(benchmark_uid, name=None, criteria_uids=None, datasource_uids=None, splits=None)
Create a benchmark execution.
Parameters

| Name | Type | Default | Info |
| --- | --- | --- | --- |
| benchmark_uid | int | | The unique identifier of the benchmark to create an execution for. |
| name | Optional[str] | None | Optional name for the benchmark execution. |
| criteria_uids | Optional[List[int]] | None | List of criteria UIDs to include in the execution. |
| datasource_uids | Optional[List[int]] | None | List of datasource UIDs to include in the execution. |
| splits | Optional[List[str]] | None | List of splits to include in the execution. |

Returns
The created benchmark execution.
Return type
BenchmarkExecution
Example
from snorkelai.sdk.develop import BenchmarkExecution
BenchmarkExecution.create(benchmark_uid=123, name="Test Execution", datasource_uids=[1, 2, 3], splits=["train", "test"])
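Since splits accepts a list, one possible pattern is to run a separate execution for each split so that results can be compared side by side. The sketch below is illustrative only: create_per_split is a hypothetical wrapper, the naming scheme is an assumption, and the create function is passed in as an argument so the example runs without the SDK installed.

```python
from typing import Any, Callable, Dict, List, Optional


def create_per_split(
    create_fn: Callable[..., Any],
    benchmark_uid: int,
    splits: List[str],
    datasource_uids: Optional[List[int]] = None,
) -> Dict[str, Any]:
    """Create one execution per split and return them keyed by split name.

    create_fn is expected to accept the keyword arguments documented for
    BenchmarkExecution.create. This wrapper itself is hypothetical.
    """
    executions: Dict[str, Any] = {}
    for split in splits:
        executions[split] = create_fn(
            benchmark_uid=benchmark_uid,
            name=f"{split} execution",  # naming scheme is an assumption
            datasource_uids=datasource_uids,
            splits=[split],
        )
    return executions
```

With the SDK available, this could be called as create_per_split(BenchmarkExecution.create, 123, ["train", "test"]).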
- classmethod delete(benchmark_uid, benchmark_execution_uid)
Delete (archive) a benchmark execution.
This performs a soft delete by archiving the benchmark execution. Hard deletion is not supported.
Parameters

| Name | Type | Default | Info |
| --- | --- | --- | --- |
| benchmark_uid | int | | The unique identifier of the benchmark. |
| benchmark_execution_uid | int | | The unique identifier of the benchmark execution to delete. |

Return type
None
Raises
ValueError – If the benchmark execution is not found.
Example
from snorkelai.sdk.develop import BenchmarkExecution
BenchmarkExecution.delete(benchmark_uid=123, benchmark_execution_uid=456)
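When archiving several executions at once, the documented ValueError can be used to separate the UIDs that were archived from those that were not found. This is a minimal sketch under that assumption: archive_many is a hypothetical helper, and the delete function is injected so the example does not depend on the SDK being installed.

```python
from typing import Callable, Iterable, List, Tuple


def archive_many(
    delete_fn: Callable[..., None],
    benchmark_uid: int,
    execution_uids: Iterable[int],
) -> Tuple[List[int], List[int]]:
    """Archive each execution and report which UIDs were missing.

    delete_fn is expected to behave like BenchmarkExecution.delete and
    raise ValueError when an execution is not found.
    """
    archived: List[int] = []
    missing: List[int] = []
    for uid in execution_uids:
        try:
            delete_fn(benchmark_uid=benchmark_uid, benchmark_execution_uid=uid)
        except ValueError:
            # Documented failure mode: the execution does not exist.
            missing.append(uid)
        else:
            archived.append(uid)
    return archived, missing
```

With the SDK available, a call might look like archive_many(BenchmarkExecution.delete, 123, [456, 457]).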
- export(filepath, config=None, connector_config_uid=None)
Export information associated with this benchmark execution. The exported data includes:

- Benchmark metadata for the associated benchmark
- Execution metadata for this execution
- The evaluation score for each datapoint, which includes:
  - The evaluator outputs
  - Rationale
  - Agreement with ground truth
- The slice membership(s) of each datapoint
- (CSV exports only) Uploaded user columns and ground truth

The export includes all datapoints without filtering or sampling. Some datapoints may have missing evaluation scores if the benchmark was not executed against them (for example, datapoints in the test split).
Parameters

| Name | Type | Default | Info |
| --- | --- | --- | --- |
| filepath | str | | The filepath where you want to write the exported data. |
| config | Union[JsonExportConfig, CsvExportConfig, None] | None | A JsonExportConfig or CsvExportConfig object. Defaults to JSON. No additional configuration is required for JSON exports. For CSV exports, the following parameters are supported: sep, the separator between columns (default is ,); quotechar, the character used to quote fields (default is "); escapechar, the character used to escape special characters (default is \). |
| connector_config_uid | Optional[int] | None | Optional UID of the connector config to use for the export. Required only if the export destination is a remote, private bucket (a private S3 or GCS bucket that requires credentials). Ignored if the export destination is a public bucket (a public S3 or GCS bucket that does not require credentials) or if the export destination is a local file. |

Return type
None
Examples
Example 1
Export a benchmark execution to a local file:
from snorkelai.sdk.develop import Benchmark
benchmark = Benchmark.get(100)
execution = benchmark.list_executions()[0]
execution.export("benchmark_execution.json")
Example 2
Export a benchmark execution to an S3 bucket using a connector config:
from snorkelai.sdk.develop import Benchmark
benchmark = Benchmark.get(100)
execution = benchmark.list_executions()[0]
execution.export("s3://MY-BUCKET/MY-PATH/benchmark_execution.json", connector_config_uid=1)
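After a CSV export to a local file, the sep, quotechar, and escapechar defaults documented for CSV exports map directly onto the options of Python's standard csv module. The sketch below reads such a file back into row dictionaries. The helper name read_exported_csv is hypothetical, and the UTF-8 encoding and the presence of a header row are assumptions, since the export's column layout is not specified here.

```python
import csv
from typing import Dict, List


def read_exported_csv(
    path: str,
    sep: str = ",",
    quotechar: str = '"',
    escapechar: str = "\\",
) -> List[Dict[str, str]]:
    """Read a CSV export into a list of row dictionaries.

    The defaults mirror the documented CSV export defaults. The encoding
    and the header row are assumptions, not documented behavior.
    """
    # newline="" lets the csv module handle line endings inside quoted fields.
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(
            handle,
            delimiter=sep,
            quotechar=quotechar,
            escapechar=escapechar,
        )
        return list(reader)
```

If the export was written with non-default settings, pass the same sep, quotechar, and escapechar values when reading.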
- classmethod get(benchmark_uid, benchmark_execution_uid)
Get a benchmark execution by its unique identifier.
Parameters

| Name | Type | Default | Info |
| --- | --- | --- | --- |
| benchmark_uid | int | | The unique identifier of the benchmark. |
| benchmark_execution_uid | int | | The unique identifier of the benchmark execution. |

Returns
The requested benchmark execution.
Return type
BenchmarkExecution
Raises
ValueError – If the benchmark execution is not found.
Example
from snorkelai.sdk.develop import BenchmarkExecution
BenchmarkExecution.get(benchmark_uid=123, benchmark_execution_uid=456)
- static list(benchmark_uid, include_archived=False)
List all benchmark executions for a given benchmark.
Parameters

| Name | Type | Default | Info |
| --- | --- | --- | --- |
| benchmark_uid | int | | The unique identifier of the parent Benchmark. The benchmark_uid is visible in the URL of the benchmark page in the Snorkel GUI. For example, https://YOUR-SNORKEL-INSTANCE/benchmarks/100/ indicates a benchmark with benchmark_uid of 100. |
| include_archived | bool | False | Whether to include archived executions. Defaults to False. |

Return type
List[BenchmarkExecution]

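A common follow-up to listing executions is picking the most recent one that has not been archived. The sketch below does this using only the documented archived and created_at properties. The helper name latest_active_execution is hypothetical, and it accepts any objects exposing those two attributes, so it can be exercised without the SDK.

```python
from typing import Any, Iterable, Optional


def latest_active_execution(executions: Iterable[Any]) -> Optional[Any]:
    """Return the non-archived execution with the newest created_at.

    Returns None when there are no non-archived executions.
    """
    active = [e for e in executions if not e.archived]
    return max(active, key=lambda e: e.created_at, default=None)
```

With the SDK available, this could be called as latest_active_execution(BenchmarkExecution.list(benchmark_uid=100)). The archived filter only matters when the list was fetched with include_archived=True.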
- update(archived)
Update the state of the benchmark execution.
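The only documented argument to update is archived. The sketch below assumes, without confirmation from this reference, that calling update(archived=False) restores a previously archived execution. The helper restore_archived is hypothetical and works on any objects exposing archived, benchmark_execution_uid, and update.

```python
from typing import Any, Iterable, List


def restore_archived(executions: Iterable[Any]) -> List[int]:
    """Unarchive every archived execution and return the affected UIDs.

    Assumption: update(archived=False) clears the archived state.
    """
    restored: List[int] = []
    for execution in executions:
        if execution.archived:
            execution.update(archived=False)
            restored.append(execution.benchmark_execution_uid)
    return restored
```

With the SDK available, the input might come from BenchmarkExecution.list(benchmark_uid=100, include_archived=True).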
- property archived: bool
Return whether the benchmark execution is archived.
- property benchmark_execution_uid: int
Return the UID of the benchmark execution.
- property benchmark_uid: int
Return the UID of the parent benchmark.
- property created_at: datetime
Return the timestamp when the benchmark execution was created.
- property created_by: str
Return the username of the user who created the benchmark execution.
- property name: str
Return the name of the benchmark execution.
- property uid: int
Return the UID of the benchmark execution.
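The properties above can be gathered into a plain dictionary for logging or reporting. This is a minimal sketch: execution_summary is a hypothetical helper, and it relies only on the documented property names and on created_at being a datetime.

```python
from typing import Any, Dict


def execution_summary(execution: Any) -> Dict[str, Any]:
    """Collect the documented properties into a JSON-friendly dictionary."""
    return {
        "benchmark_uid": execution.benchmark_uid,
        "benchmark_execution_uid": execution.benchmark_execution_uid,
        "name": execution.name,
        # datetime is not JSON serializable, so store an ISO 8601 string.
        "created_at": execution.created_at.isoformat(),
        "created_by": execution.created_by,
        "archived": execution.archived,
    }
```

The result can be passed to json.dumps or appended to a list when summarizing several executions.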