snorkelflow.sdk.FineTuningApp
- class snorkelflow.sdk.FineTuningApp(app_uid, model_node_uid, dataset_uid, label_schema_uid, workspace_uid, fine_tuning_app_config)
Bases: object
- __init__(app_uid, model_node_uid, dataset_uid, label_schema_uid, workspace_uid, fine_tuning_app_config)
Methods
__init__(app_uid, model_node_uid, ...)
create(app_name, fine_tuning_app_config) – Create a new fine tuning application with the given name and configuration. The dataset, application, and label schema will be set up for you.
create_evaluation_report([split, ...]) – Create an evaluation report for the quality dataset.
delete() – Delete the fine tuning application.
get(application) – Initialize a FineTuningApp object from an existing fine tuning application previously created with the SDK.
get_annotation_batches() – Get the annotation batches associated with the entire fine tuning dataset and label schema.
get_dataframe([split, source_uids, x_uids, ...]) – Get data from the dataset associated with the fine tuning application with the given filters applied.
get_evaluation_report(evaluation_report_uid) – Get the evaluation report associated with the given evaluation report uid.
get_ft_dataset() – Get the fine tuning dataset associated with the fine tuning application.
get_quality_dataset(model_uid) – Create a QualityDataset object from a trained model's predictions.
get_sources() – Get the sources within the current workspace.
import_data(data, split, source_uid[, name, ...]) – Import data into the fine tuning application.
import_ground_truth(gt_df, gt_column, ...[, ...]) – Import ground truth labels into the fine tuning dataset.
list_evaluation_reports() – List the evaluation reports associated with the fine tuning application.
list_quality_models() – List the quality models associated with the fine tuning application.
register_custom_metric(metric_name, metric_func[, overwrite]) – Register a user-defined metric with the FineTuningApp.
register_model_source(model_name[, metadata]) – Register a model source with the given model name and metadata.
register_source(source_name, source_type, ...) – Register a source in the platform.
setup_studio() – Set up the studio for the fine tuning application.
Attributes
datasource_metadata – Get metadata about each datasource, including details about the source.
- classmethod create(app_name, fine_tuning_app_config)
Create a new fine tuning application with the given name and configuration. The dataset, application, and label schema will be set up for you.
- Parameters:
app_name (str) – The name of the fine tuning application
fine_tuning_app_config (FineTuningAppConfig) – The configuration of the fine tuning application
- Returns:
The fine tuning application object
- Return type:
FineTuningApp
- create_evaluation_report(split=None, quality_models=None, finetuned_model_sources=None, slices=None)
Create an evaluation report for the quality dataset.
- Parameters:
split (Optional[str], default: None) – The split of the data to evaluate (if not provided, metrics will be computed for all splits)
quality_models (Union[List[str], List[int], None], default: None) – The quality models to evaluate (if not provided, the committed quality model or the most recently trained model will be used, in that order)
finetuned_model_sources (Union[List[str], List[int], None], default: None) – The finetuned model sources to evaluate (if not provided, all finetuned models associated with the datasources will be used)
slices (Union[List[str], List[int], None], default: None) – The slices to evaluate (if not provided, all slices in the given dataset will be evaluated)
- Returns:
A dictionary containing the evaluation results
- Return type:
Dict[str, Any]
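The fallback order described for quality_models can be illustrated with a small standalone sketch. This is not the SDK's implementation: the function name resolve_quality_models and its inputs (a committed model uid and a mapping from model uid to training timestamp) are hypothetical and only mirror the documented order, which is an explicit list first, then the committed model, then the most recently trained model.

```python
from typing import Dict, List, Optional


def resolve_quality_models(
    requested: Optional[List[int]],
    committed_uid: Optional[int],
    trained_at_by_uid: Dict[int, float],
) -> List[int]:
    """Pick the models to evaluate, mirroring the documented fallback order.

    All names here are hypothetical; the real SDK resolves this internally.
    """
    # 1. An explicit list from the caller always wins.
    if requested is not None:
        return list(requested)
    # 2. Otherwise prefer the committed quality model, if one exists.
    if committed_uid is not None:
        return [committed_uid]
    # 3. Otherwise fall back to the most recently trained model.
    if trained_at_by_uid:
        return [max(trained_at_by_uid, key=trained_at_by_uid.get)]
    # Nothing to evaluate.
    return []
```

Under these assumptions, passing quality_models=None with no committed model selects whichever model has the latest training timestamp.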
- delete()
Delete the fine tuning application. The associated dataset is not deleted and must be deleted separately.
- Return type:
None
- classmethod get(application)
Initialize a FineTuningApp object from an existing fine tuning application previously created with the SDK.
- Parameters:
application (Union[str, int]) – The name or uid of the fine tuning application
- Returns:
The fine tuning application object
- Return type:
FineTuningApp
- get_annotation_batches()
Get the annotation batches associated with the entire fine tuning dataset and label schema.
- Return type:
List[Batch]
- get_dataframe(split=None, source_uids=None, x_uids=None, datasource_uids=None)
Get data from the dataset associated with the fine tuning application with the given filters applied.
- Parameters:
split (Optional[str], default: None) – The split of the data to get
source_uids (Optional[List[int]], default: None) – The source uids to filter by
x_uids (Optional[List[str]], default: None) – The x uids to filter by
datasource_uids (Optional[List[str]], default: None) – The datasource uids to filter by
- Return type:
DataFrame
- get_evaluation_report(evaluation_report_uid)
Get the evaluation report associated with the given evaluation report uid.
- Parameters:
evaluation_report_uid (int) – The unique identifier for the evaluation report to retrieve.
- Returns:
A dictionary containing the details of the evaluation report.
- Return type:
Dict[str, Any]
- get_ft_dataset()
Get the fine tuning dataset associated with the fine tuning application.
- Returns:
The fine tuning dataset object
- Return type:
- get_quality_dataset(model_uid)
Create a QualityDataset object from a trained model’s predictions.
- Parameters:
model_uid (int) – The unique identifier of the trained model.
- Returns:
The QualityDataset object.
- Return type:
QualityDataset
- get_sources()
Get the sources within the current workspace.
- Returns:
A list of dictionaries containing the details of the sources.
- Return type:
List[Dict[str, Any]]
- import_data(data, split, source_uid, name=None, sync=True, refresh_datasources=True, prompt_template=None)
Import data into the fine tuning application.
- Parameters:
data (Union[str, DataFrame]) – A file path or a pandas DataFrame of the data to import into the dataset
split (str) – The split of the data
source_uid (int) – The source to associate the data with for data lineage
name (Optional[str], default: None) – The name of the data source
sync (bool, default: True) – Whether to wait for the ingestion job to complete before returning
refresh_datasources (bool, default: True) – Whether to refresh datasources for the downstream model node after ingestion. Can only be set if sync is True.
prompt_template (Optional[str], default: None) – The prompt template used when the data was generated.
- Returns:
The job_id of the ingestion job
- Return type:
str
Notes
If sync is set to False, the method returns immediately after submitting the ingestion job, and neither the datasource refresh (refresh_datasources) nor the prediction backfill is performed. To ensure that all post-ingestion tasks complete, keep sync set to True (the default).
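One way to keep sync and refresh_datasources consistent is a thin wrapper that ties the two together. The sketch below is illustrative only: import_split and RecordingApp are hypothetical names, RecordingApp is a stand-in that copies the import_data signature shown above so the example runs without a platform connection, and the file name "train.csv" and split name "train" are placeholder values.

```python
from typing import Any, Dict, List, Optional


def import_split(
    app: Any,
    data: Any,
    split: str,
    source_uid: int,
    wait: bool = True,
    name: Optional[str] = None,
) -> str:
    """Forward to app.import_data, refreshing datasources only when waiting."""
    return app.import_data(
        data,
        split,
        source_uid,
        name=name,
        sync=wait,
        # The refresh only runs for synchronous imports, so tie it to `wait`.
        refresh_datasources=wait,
    )


class RecordingApp:
    """Hypothetical stand-in for FineTuningApp so the sketch runs offline."""

    def __init__(self) -> None:
        self.calls: List[Dict[str, Any]] = []

    def import_data(self, data, split, source_uid, name=None, sync=True,
                    refresh_datasources=True, prompt_template=None) -> str:
        # Record the flags instead of submitting a real ingestion job.
        self.calls.append({"split": split, "sync": sync,
                           "refresh_datasources": refresh_datasources})
        return f"job-{len(self.calls)}"


app = RecordingApp()
job_id = import_split(app, "train.csv", "train", source_uid=1, wait=False)
```

With a real FineTuningApp instance in place of the stand-in, the returned value would be the job_id of the ingestion job, which a caller using wait=False would need to track on its own.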
- import_ground_truth(gt_df, gt_column, join_column, source_uid=None, user_format=True)
Import ground truth labels into the fine tuning dataset.
- Parameters:
gt_df (DataFrame) – The ground truth labels DataFrame
gt_column (str) – The column in the ground truth DataFrame that contains the labels
join_column (str) – The column to join the gt_df and the fine tuning dataset on to associate the ground truth labels with the fine tuning dataset
source_uid (Optional[int], default: None) – The source uid to associate the annotations with. Defaults to the requesting user’s source uid if not set.
user_format (bool, default: True) – Whether the labels are in the user format or not (the label map string value vs the int value). If true, the label map will be used to convert the labels to their integer values.
- Return type:
None
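The effect of user_format can be pictured as a lookup through the label map. The following is a minimal sketch of that conversion, not the SDK's code: to_int_labels and the example label map are hypothetical, and the real label map comes from the application's label schema.

```python
from typing import Dict, List, Sequence, Union


def to_int_labels(
    labels: Sequence[Union[str, int]],
    label_map: Dict[str, int],
    user_format: bool = True,
) -> List[int]:
    """Convert labels to integer values, as user_format=True is described to do."""
    if not user_format:
        # Labels are already integer values; only normalize the type.
        return [int(label) for label in labels]
    # Fail loudly on labels that the label map does not know about.
    unknown = sorted({str(label) for label in labels if label not in label_map})
    if unknown:
        raise ValueError(f"Labels missing from label map: {unknown}")
    return [label_map[label] for label in labels]


# Hypothetical label map for illustration only.
example_map = {"ACCEPT": 1, "REJECT": 0}
converted = to_int_labels(["ACCEPT", "REJECT", "ACCEPT"], example_map)
```

Checking for unknown labels before converting is a design choice of this sketch; how the platform treats labels outside the label map is not stated here.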
- list_evaluation_reports()
List the evaluation reports associated with the fine tuning application.
- Return type:
List[Dict[str, Any]]
- list_quality_models()
List the quality models associated with the fine tuning application.
- Return type:
DataFrame
- register_custom_metric(metric_name, metric_func, overwrite=False)
Register a user-defined metric with the FineTuningApp.
- Parameters:
metric_name (str) – The display name of this metric
metric_func (Callable) – A Python function to compute this metric
overwrite (Optional[bool], default: False) – Overwrite a metric of the same name if one already exists
- Returns:
The id of the registered metric.
- Return type:
int
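A metric function might look like the sketch below. The arguments that the platform passes to metric_func are not specified in this reference, so the (predictions, references) signature and the name exact_match_rate are assumptions made for illustration.

```python
from typing import Sequence


def exact_match_rate(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of predictions equal to their reference, ignoring outer whitespace.

    The argument names and types are assumptions; check the platform's
    expectations for metric functions before registering.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        # Define the metric as 0.0 on empty input to avoid dividing by zero.
        return 0.0
    matches = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return matches / len(predictions)
```

Given a FineTuningApp instance named app, registration would then follow the signature above, for example app.register_custom_metric("exact_match_rate", exact_match_rate, overwrite=True), which returns the integer id of the registered metric.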
- register_model_source(model_name, metadata=None)
Register a model source with the given model name and metadata.
- Parameters:
model_name (str) – The name of the model.
metadata (Optional[ModelSourceMetadata], default: None) – The metadata associated with the model source. If not provided, the provided model name will be used as the model name in the metadata.
- Returns:
The registered model source.
- Return type:
Dict[str, Any]
- classmethod register_source(source_name, source_type, user_uid, metadata=None)
Register a source in the platform.
- Parameters:
source_name (str) – The name of the source
source_type (SvcSourceType) – The type of the source
user_uid (Optional[int]) – The user uid to associate with the source
metadata (Optional[Dict[str, Any]], default: None) – The metadata to associate with the source
- Returns:
The created source
- Return type:
Dict[str, Any]
- setup_studio()
Set up the studio for the fine tuning application. This will refresh any stale datasources associated with the fine tuning application.
- Return type:
None
- property datasource_metadata: Dict[int, Any]
Get metadata about each datasource, including details about the source.