Version: 25.9

snorkelai.sdk.develop.BenchmarkExecution

final class snorkelai.sdk.develop.BenchmarkExecution(benchmark_uid, benchmark_execution_uid, name, created_at, created_by, archived)

Bases: Base

Represents a single execution run of a benchmark for a dataset.

A benchmark execution exports comprehensive evaluation data: per-datapoint scores (evaluator outputs, rationales, and ground truth agreement), slice membership, and benchmark and execution metadata, including timing information and execution context.

__init__

__init__(benchmark_uid, benchmark_execution_uid, name, created_at, created_by, archived)

Parameters

Name | Type | Default | Info
benchmark_uid | int | | The unique identifier of the parent Benchmark. The benchmark_uid is visible in the URL of the benchmark page in the Snorkel GUI. For example, https://YOUR-SNORKEL-INSTANCE/benchmarks/100/ indicates a benchmark with benchmark_uid of 100.
benchmark_execution_uid | int | | The unique identifier for this execution.
name | str | | The name of the execution.
created_at | datetime | | Timestamp of when this execution was run.
created_by | str | | Username of the user who ran this execution.
archived | bool | | Whether this execution is archived.

Methods

__init__(benchmark_uid, ...)
create(benchmark_uid[, name, criteria_uids, ...]) | Create a benchmark execution.
delete(benchmark_uid, benchmark_execution_uid) | Delete (archive) a benchmark execution.
export(filepath[, config, connector_config_uid]) | Export information associated with this benchmark execution.
get(benchmark_uid, benchmark_execution_uid) | Get a benchmark execution by its unique identifier.
list(benchmark_uid[, include_archived]) | List all benchmark executions for a given benchmark.
update(archived) | Update the state of the benchmark execution.

Attributes

archived | Return whether the benchmark execution is archived
benchmark_execution_uid | Return the UID of the benchmark execution
benchmark_uid | Return the UID of the parent benchmark
created_at | Return the timestamp when the benchmark execution was created
created_by | Return the username of the user who created the benchmark execution
name | Return the name of the benchmark execution
uid | Return the UID of the benchmark execution

create

classmethod create(benchmark_uid, name=None, criteria_uids=None, datasource_uids=None, splits=None)

Create a benchmark execution.

Parameters

Name | Type | Default | Info
benchmark_uid | int | | The unique identifier of the benchmark to create an execution for.
name | Optional[str] | None | Optional name for the benchmark execution.
criteria_uids | Optional[List[int]] | None | List of criteria UIDs to include in the execution.
datasource_uids | Optional[List[int]] | None | List of datasource UIDs to include in the execution.
splits | Optional[List[str]] | None | List of splits to include in the execution.

Returns

The created benchmark execution.

Return type

BenchmarkExecution

Example

from snorkelai.sdk.develop import BenchmarkExecution
BenchmarkExecution.create(benchmark_uid=123, name="Test Execution", datasource_uids=[1, 2, 3], splits=["train", "test"])

delete

classmethod delete(benchmark_uid, benchmark_execution_uid)

Delete (archive) a benchmark execution.

This performs a soft delete by archiving the benchmark execution. Hard deletion is not supported.

Parameters

Name | Type | Default | Info
benchmark_uid | int | | The unique identifier of the benchmark.
benchmark_execution_uid | int | | The unique identifier of the benchmark execution to delete.

Raises

ValueError – If the benchmark execution is not found.

Return type

None

Example

from snorkelai.sdk.develop import BenchmarkExecution
BenchmarkExecution.delete(benchmark_uid=123, benchmark_execution_uid=456)
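
Because delete only archives, it can be combined with list to tidy up old runs. The following is a minimal sketch, not part of the SDK: the helper name archive_executions_before and its execution_cls parameter are hypothetical, and it relies only on the list, delete, created_at, and benchmark_execution_uid members documented on this page.

```python
from datetime import datetime
from typing import List


def archive_executions_before(execution_cls, benchmark_uid: int, cutoff: datetime) -> List[int]:
    """Soft-delete every active execution created before `cutoff`.

    `execution_cls` is expected to be BenchmarkExecution (or a test double
    exposing the same list() and delete() calls). Returns the archived UIDs.
    """
    archived_uids = []
    # list() skips archived executions by default (include_archived=False).
    for execution in execution_cls.list(benchmark_uid):
        # Assumes created_at and cutoff are both naive or both timezone-aware.
        if execution.created_at < cutoff:
            execution_cls.delete(
                benchmark_uid=benchmark_uid,
                benchmark_execution_uid=execution.benchmark_execution_uid,
            )
            archived_uids.append(execution.benchmark_execution_uid)
    return archived_uids


# Hypothetical usage against a real instance:
# from snorkelai.sdk.develop import BenchmarkExecution
# archive_executions_before(BenchmarkExecution, 123, datetime(2025, 1, 1))
```

Passing the class in as an argument keeps the helper testable without a live Snorkel instance.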

export

export(filepath, config=None, connector_config_uid=None)

Export information associated with this benchmark execution. The exported data includes:

  • Benchmark metadata for the associated benchmark

  • Execution metadata for this execution

  • Each datapoint lists its evaluation score, which includes:
    • The evaluator outputs

    • Rationale

    • Agreement with ground truth

  • Each datapoint lists its slice membership(s)

  • (CSV exports only) Uploaded user columns and ground truth

The export includes all datapoints without filtering or sampling. Some datapoints may have missing evaluation scores if the benchmark was not executed against them (for example, datapoints in the test split).

Parameters

Name | Type | Default | Info
filepath | str | | The filepath where you want to write the exported data.
config | Union[JsonExportConfig, CsvExportConfig, None] | None | A JsonExportConfig or CsvExportConfig object. Defaults to JSON. No additional configuration is required for JSON exports. For CSV exports, the following parameters are supported:

  • sep: The separator between columns. Default is ,.

  • quotechar: The character used to quote fields. Default is ".

  • escapechar: The character used to escape special characters. Default is \.

connector_config_uid | Optional[int] | None | Optional UID of the connector config to use for the export. Required only if the export destination is a remote, private bucket (a private S3 or GCS bucket that requires credentials). Ignored if the export destination is a public bucket (a public S3 or GCS bucket that does not require credentials) or if the export destination is a local file.
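
The three CSV parameters correspond to options that Python's standard csv module also understands, so a file exported with the defaults can likely be read back as sketched below. The helper name read_exported_csv and the sample columns are invented for illustration; the actual column names and quoting behavior of a real export are not specified on this page.

```python
import csv
import io


def read_exported_csv(text, sep=",", quotechar='"', escapechar="\\"):
    """Parse CSV text using the defaults documented for CSV exports."""
    reader = csv.DictReader(
        io.StringIO(text),
        delimiter=sep,
        quotechar=quotechar,
        escapechar=escapechar,
    )
    return list(reader)


# Invented sample data; real exports have their own column names.
sample = 'datapoint_id,rationale\n1,"Clear, concise answer"\n'
rows = read_exported_csv(sample)
print(rows[0]["rationale"])  # Clear, concise answer
```

If the export was written with a non-default sep, quotechar, or escapechar, pass the same values when reading.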

Return type

None

Examples

Example 1

Export a benchmark execution to a local file:

from snorkelai.sdk.develop import Benchmark

benchmark = Benchmark.get(100)
execution = benchmark.list_executions()[0]
execution.export("benchmark_execution.json")

Example 2

Export a benchmark execution to an S3 bucket using a connector config:

from snorkelai.sdk.develop import Benchmark

benchmark = Benchmark.get(100)
execution = benchmark.list_executions()[0]
execution.export("s3://MY-BUCKET/MY-PATH/benchmark_execution.json", connector_config_uid=1)

get

classmethod get(benchmark_uid, benchmark_execution_uid)

Get a benchmark execution by its unique identifier.

Parameters

Name | Type | Default | Info
benchmark_uid | int | | The unique identifier of the benchmark.
benchmark_execution_uid | int | | The unique identifier of the benchmark execution.

Returns

The requested benchmark execution.

Return type

BenchmarkExecution

Raises

ValueError – If the benchmark execution is not found.

Example

from snorkelai.sdk.develop import BenchmarkExecution

BenchmarkExecution.get(benchmark_uid=123, benchmark_execution_uid=456)
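
Since get raises ValueError when the execution is not found, callers that treat a missing execution as a normal case may want to wrap the call. A minimal sketch follows; get_execution_or_none is a hypothetical helper, not an SDK function, and it takes the class as an argument so it can be exercised without a live instance.

```python
def get_execution_or_none(execution_cls, benchmark_uid, benchmark_execution_uid):
    """Return the execution, or None if get() reports it as not found."""
    try:
        return execution_cls.get(
            benchmark_uid=benchmark_uid,
            benchmark_execution_uid=benchmark_execution_uid,
        )
    except ValueError:
        # Documented behavior: ValueError means the execution was not found.
        return None


# Hypothetical usage against a real instance:
# from snorkelai.sdk.develop import BenchmarkExecution
# execution = get_execution_or_none(BenchmarkExecution, 123, 456)
```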

list

static list(benchmark_uid, include_archived=False)

List all benchmark executions for a given benchmark.

Parameters

Name | Type | Default | Info
benchmark_uid | int | | The unique identifier of the parent Benchmark. The benchmark_uid is visible in the URL of the benchmark page in the Snorkel GUI. For example, https://YOUR-SNORKEL-INSTANCE/benchmarks/100/ indicates a benchmark with benchmark_uid of 100.
include_archived | bool | False | Whether to include archived executions. Defaults to False.

Return type

List[BenchmarkExecution]
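
The ordering of the returned list is not stated here, so code that needs the newest run may prefer to sort explicitly. The sketch below uses only the documented archived and created_at attributes; latest_active_execution is a hypothetical helper name.

```python
def latest_active_execution(executions):
    """Return the most recently created non-archived execution, or None."""
    active = [e for e in executions if not e.archived]
    # default=None covers the case where every execution is archived
    # or the list is empty.
    return max(active, key=lambda e: e.created_at, default=None)


# Hypothetical usage against a real instance:
# from snorkelai.sdk.develop import BenchmarkExecution
# executions = BenchmarkExecution.list(benchmark_uid=100, include_archived=True)
# newest = latest_active_execution(executions)
```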

update

update(archived)

Update the state of the benchmark execution.

Parameters

Name | Type | Default | Info
archived | bool | | Whether the benchmark execution should be archived.

Return type

None
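
No example is given for update, so here is a minimal sketch of archiving or restoring an execution only when its state actually differs. The helper ensure_archived_state is hypothetical, and whether update refreshes the local archived property afterwards is an assumption this page does not confirm.

```python
def ensure_archived_state(execution, archived):
    """Call update() only if the execution is not already in the wanted state.

    Returns True if update() was called, False if no change was needed.
    """
    if execution.archived == archived:
        return False
    execution.update(archived)
    return True


# Hypothetical usage against a real instance:
# from snorkelai.sdk.develop import BenchmarkExecution
# execution = BenchmarkExecution.get(benchmark_uid=123, benchmark_execution_uid=456)
# ensure_archived_state(execution, archived=False)  # restore if archived
```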

property archived: bool

Return whether the benchmark execution is archived

property benchmark_execution_uid: int

Return the UID of the benchmark execution

property benchmark_uid: int

Return the UID of the parent benchmark

property created_at: datetime

Return the timestamp when the benchmark execution was created

property created_by: str

Return the username of the user who created the benchmark execution

property name: str

Return the name of the benchmark execution

property uid: int

Return the UID of the benchmark execution