Version: 0.95

snorkelflow.sdk.FineTuningApp

class snorkelflow.sdk.FineTuningApp(app_uid, model_node_uid, dataset_uid, label_schema_uid, workspace_uid, fine_tuning_app_config)

Bases: object

__init__(app_uid, model_node_uid, dataset_uid, label_schema_uid, workspace_uid, fine_tuning_app_config)

Methods

__init__(app_uid, model_node_uid, ...)

create(app_name, fine_tuning_app_config)

Create a new fine tuning application with the given name and configuration. The dataset, application, and label schema will be set up for you.

create_evaluation_report([split, ...])

Create an evaluation report for the quality dataset.

delete()

Delete the fine tuning application.

get(application)

Initialize a FineTuningApp object from an existing fine tuning application previously created with the SDK.

get_annotation_batches()

Get the annotation batches associated with the entire fine tuning dataset and label schema.

get_dataframe([split, source_uids, x_uids, ...])

Get data from the dataset associated with the fine tuning application, with the given filters applied.

get_evaluation_report(evaluation_report_uid)

Get the evaluation report associated with the given evaluation report uid.

get_ft_dataset()

Get the fine tuning dataset associated with the fine tuning application.

get_quality_dataset(model_uid)

Create a QualityDataset object from a trained model's predictions.

get_sources()

Get the sources within the current workspace.

import_data(data, split, source_uid[, name, ...])

Import data into the fine tuning application.

import_ground_truth(gt_df, gt_column, ...[, ...])

Import ground truth labels into the fine tuning dataset.

list_evaluation_reports()

List the evaluation reports associated with the fine tuning application.

list_quality_models()

List the quality models associated with the fine tuning application.

register_custom_metric(metric_name, metric_func)

Register a user-defined metric with the FineTuningApp.

register_model_source(model_name[, metadata])

Register a model source with the given model name and metadata.

register_source(source_name, source_type, ...)

Register a source in the platform.

setup_studio()

Set up the studio for the fine tuning application.

Attributes

datasource_metadata

Get metadata about each datasource, including details about the source.

classmethod create(app_name, fine_tuning_app_config)

Create a new fine tuning application with the given name and configuration. The dataset, application, and label schema will be set up for you.

Parameters:
  • app_name (str) – The name of the fine tuning application

  • fine_tuning_app_config (FineTuningAppConfig) – The configuration of the fine tuning application

Returns:

The fine tuning application object

Return type:

FineTuningApp

create_evaluation_report(split=None, quality_models=None, finetuned_model_sources=None, slices=None)

Create an evaluation report for the quality dataset.

Parameters:
  • split (Optional[str], default: None) – The split of the data to evaluate (if not provided, metrics will be computed for all splits)

  • quality_models (Union[List[str], List[int], None], default: None) – The quality models to evaluate (if not provided, the committed quality model or the most recently trained model will be used, in that order)

  • finetuned_model_sources (Union[List[str], List[int], None], default: None) – The finetuned model sources to evaluate (if not provided, all finetuned models associated with the datasources will be used)

  • slices (Union[List[str], List[int], None], default: None) – The slices to evaluate (if not provided, all slices in the given dataset will be evaluated)

Returns:

A dictionary containing the evaluation results

Return type:

Dict[str, Any]
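
Because every parameter falls back to a documented default, a caller only needs to pass the filters it cares about. The following is a minimal sketch of that pattern, not an official recipe. It assumes `app` is a FineTuningApp instance; the `evaluate` helper, the `_StubApp` class, and the split name "valid" are illustrative assumptions that let the sketch run without a Snorkel Flow instance. The stub simply echoes its arguments, and the keys of a real report are not documented here.

```python
from typing import Any, Dict, List, Optional, Union


def evaluate(
    app: Any,
    split: Optional[str] = None,
    slices: Optional[Union[List[str], List[int]]] = None,
) -> Dict[str, Any]:
    """Request an evaluation report, forwarding only the filters that were set.

    Arguments left as None are omitted, so the method's own defaults apply.
    """
    kwargs: Dict[str, Any] = {}
    if split is not None:
        kwargs["split"] = split
    if slices is not None:
        kwargs["slices"] = slices
    return app.create_evaluation_report(**kwargs)


class _StubApp:
    """Stand-in for FineTuningApp (assumption, for illustration only)."""

    def create_evaluation_report(self, split=None, quality_models=None,
                                 finetuned_model_sources=None, slices=None):
        # Echo the received arguments instead of computing real metrics.
        return {"split": split, "slices": slices}


# With a real application this would be: evaluate(FineTuningApp.get("my-app"), split="valid")
report = evaluate(_StubApp(), split="valid")
all_splits_report = evaluate(_StubApp())
```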

delete()

Delete the fine tuning application. The dataset must be deleted separately.

Return type:

None

classmethod get(application)

Initialize a FineTuningApp object from an existing fine tuning application previously created with the SDK.

Parameters:

application (Union[str, int]) – The name or uid of the fine tuning application

Returns:

The fine tuning application object

Return type:

FineTuningApp

get_annotation_batches()

Get the annotation batches associated with the entire fine tuning dataset and label schema.

Return type:

List[Batch]

get_dataframe(split=None, source_uids=None, x_uids=None, datasource_uids=None)

Get data from the dataset associated with the fine tuning application, with the given filters applied.

Parameters:
  • split (Optional[str], default: None) – The split of the data to get

  • source_uids (Optional[List[int]], default: None) – The source uids to filter by

  • x_uids (Optional[List[str]], default: None) – The x uids to filter by

  • datasource_uids (Optional[List[str]], default: None) – The datasource uids to filter by

Return type:

DataFrame

get_evaluation_report(evaluation_report_uid)

Get the evaluation report associated with the given evaluation report uid.

Parameters:

evaluation_report_uid (int) – The unique identifier for the evaluation report to retrieve.

Returns:

A dictionary containing the details of the evaluation report.

Return type:

Dict[str, Any]

get_ft_dataset()

Get the fine tuning dataset associated with the fine tuning application.

Returns:

The fine tuning dataset object

Return type:

FTDataset

get_quality_dataset(model_uid)

Create a QualityDataset object from a trained model’s predictions.

Parameters:

model_uid (int) – The unique identifier of the trained model.

Returns:

The QualityDataset object.

Return type:

QualityDataset

get_sources()

Get the sources within the current workspace.

Returns:

A list of dictionaries containing the details of the sources.

Return type:

List[Dict[str, Any]]

import_data(data, split, source_uid, name=None, sync=True, refresh_datasources=True, prompt_template=None)

Import data into the fine tuning application.

Parameters:
  • data (Union[str, DataFrame]) – A file path or a pandas DataFrame of the data to import into the dataset

  • split (str) – The split of the data

  • source_uid (int) – The source to associate the data with for data lineage

  • name (Optional[str], default: None) – The name of the data source

  • sync (bool, default: True) – Whether to wait for the ingestion job to complete before returning

  • refresh_datasources (bool, default: True) – Whether to refresh datasources for the downstream model node after ingestion. Can only be set if sync is True.

  • prompt_template (Optional[str], default: None) – The prompt template used when the data was generated.

Returns:

The job_id of the ingestion job

Return type:

str

Notes

If sync is set to False, the method returns immediately after submitting the ingestion job, and neither the datasource refresh (refresh_datasources) nor the prediction backfill is performed. To ensure all post-ingestion tasks complete, keep sync as True (the default).
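
The note above suggests importing with sync left at True when later steps depend on the ingested data. The following is a minimal sketch under stated assumptions: `import_splits` is a hypothetical helper, the file paths, split names, and `source_uid=1` are placeholders, and `_StubApp` stands in for a real FineTuningApp so the sketch runs without a Snorkel Flow instance.

```python
from typing import Any, Dict


def import_splits(app: Any, paths_by_split: Dict[str, str], source_uid: int) -> Dict[str, str]:
    """Import one file per split and return the ingestion job id for each split."""
    job_ids: Dict[str, str] = {}
    for split, path in paths_by_split.items():
        # sync=True (the default) waits for the ingestion job to finish,
        # so post-ingestion tasks complete before the call returns.
        job_ids[split] = app.import_data(path, split, source_uid, sync=True)
    return job_ids


class _StubApp:
    """Stand-in for FineTuningApp (assumption, for illustration only)."""

    def __init__(self) -> None:
        self.calls = []

    def import_data(self, data, split, source_uid, name=None, sync=True,
                    refresh_datasources=True, prompt_template=None):
        # Record the call and return a fake job id instead of ingesting data.
        self.calls.append((data, split, source_uid, sync))
        return f"job-{len(self.calls)}"


stub = _StubApp()
job_ids = import_splits(stub, {"train": "train.csv", "valid": "valid.csv"}, source_uid=1)
```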

import_ground_truth(gt_df, gt_column, join_column, source_uid=None, user_format=True)

Import ground truth labels into the fine tuning dataset.

Parameters:
  • gt_df (DataFrame) – The ground truth labels DataFrame

  • gt_column (str) – The column in the ground truth DataFrame that contains the labels

  • join_column (str) – The column to join the gt_df and the fine tuning dataset on to associate the ground truth labels with the fine tuning dataset

  • source_uid (Optional[int], default: None) – The source uid to associate the annotations with. Defaults to the requesting user’s source uid if not set.

  • user_format (bool, default: True) – Whether the labels are in the user format (the label map string value) rather than the int value. If True, the label map will be used to convert the labels to their integer values.

Return type:

None
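
To make the user_format distinction concrete, the sketch below shows the kind of string-to-integer conversion that a label map implies. It is an illustration of the concept only, not the SDK's internal implementation: `LABEL_MAP` and `to_int_labels` are hypothetical, and the real label map comes from the application's label schema.

```python
from typing import Dict, List

# Hypothetical label map; the real one is defined by the app's label schema.
LABEL_MAP: Dict[str, int] = {"REJECT": 0, "ACCEPT": 1}


def to_int_labels(labels: List[str], label_map: Dict[str, int]) -> List[int]:
    """Convert user-format (string) labels to their integer values."""
    unknown = sorted(set(labels) - set(label_map))
    if unknown:
        # Fail early rather than silently dropping unrecognized labels.
        raise ValueError(f"Labels not in label map: {unknown}")
    return [label_map[label] for label in labels]


int_labels = to_int_labels(["ACCEPT", "REJECT", "ACCEPT"], LABEL_MAP)
```

With user_format=True, the string labels would be passed in gt_column as they are. Labels that have already been converted to integers, as above, would be passed with user_format=False.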

list_evaluation_reports()

List the evaluation reports associated with the fine tuning application.

Return type:

List[Dict[str, Any]]

list_quality_models()

List the quality models associated with the fine tuning application.

Return type:

DataFrame

register_custom_metric(metric_name, metric_func, overwrite=False)

Register a user-defined metric with the FineTuningApp.

Parameters:
  • metric_name (str) – The display name of this metric

  • metric_func (Callable) – A python function to compute this metric

  • overwrite (Optional[bool], default: False) – Overwrite a metric of the same name if one already exists

Returns:

The id of the registered metric.

Return type:

int
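
A custom metric is an ordinary Python callable. The sketch below defines one and registers it. Note the assumptions: this page does not document which arguments the platform passes to metric_func, so the `(predictions, references)` signature of `exact_match_rate` is a guess made for illustration, and `_StubApp` is a stand-in for FineTuningApp so the sketch runs without a Snorkel Flow instance.

```python
from typing import Sequence


def exact_match_rate(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of predictions equal to their reference, ignoring outer whitespace.

    Assumption: the argument names and types are illustrative, not the
    signature the SDK requires.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    matches = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return matches / len(predictions)


class _StubApp:
    """Stand-in for FineTuningApp (assumption, for illustration only)."""

    def __init__(self) -> None:
        self.metrics = {}

    def register_custom_metric(self, metric_name, metric_func, overwrite=False):
        # Store the callable and return a fake integer id.
        self.metrics[metric_name] = metric_func
        return len(self.metrics)


stub = _StubApp()
metric_id = stub.register_custom_metric("exact_match_rate", exact_match_rate)
score = exact_match_rate(["a", "b ", "c"], ["a", "b", "x"])
```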

register_model_source(model_name, metadata=None)

Register a model source with the given model name and metadata.

Parameters:
  • model_name (str) – The name of the model.

  • metadata (Optional[ModelSourceMetadata], default: None) – The metadata associated with the model source. If not provided, the provided model name will be used as the model name in the metadata.

Returns:

The registered model source.

Return type:

Dict[str, Any]

classmethod register_source(source_name, source_type, user_uid, metadata=None)

Register a source in the platform.

Parameters:
  • source_name (str) – The name of the source

  • source_type (SvcSourceType) – The type of the source

  • user_uid (Optional[int]) – The user uid to associate with the source

  • metadata (Optional[Dict[str, Any]], default: None) – The metadata to associate with the source

Returns:

The created source

Return type:

Dict[str, Any]

setup_studio()

Set up the studio for the fine tuning application. This will refresh any stale datasources associated with the fine tuning application.

Return type:

None

property datasource_metadata: Dict[int, Any]

Get metadata about each datasource, including details about the source.