Version: 0.95

Model training progress stages

This guide explains the training progress stages you will encounter when you initiate model training in a Snorkel Flow Application. The training progress bar helps you estimate the time remaining and see how long each stage takes, so you can plan other work, such as refining labeling functions, while the model trains.

After you have configured your model and initiated training, the model progress bar displays various stages based on whether you are training with a new dataset or using AutoML. Once training is complete, you can view and analyze the model results.

Standard Training Progress Stages

These are the typical stages after you start training a single model in a Snorkel Flow Application.

note

If you start training a model from an existing training set, only Stages 3-6 will be applied.

Initiated model training job, waiting for resources

This stage indicates that your modeling job is in the queue and awaiting available computing resources.

Apply labeling functions to dataset

Snorkel Flow processes all your labeling functions across your training, validation, and/or development datasets. The duration of this stage depends on the number of labeling functions and the size of your datasets.

De-noise labels

The weak supervision algorithms aggregate the outputs of all labeling functions and denoise them, generating a final programmatically labeled training dataset. If you have enabled the option to Train noise-aware model using probabilistic labels, this stage might take longer than usual.

Generate model features

During this phase, Snorkel Flow tokenizes and vectorizes your data to prepare it for the training phase.

Train model

The chosen model architecture is trained on the vectorized data. Updates such as Step X of Y indicate the completion of mini-batches and training updates. Complex models, like BERT, take longer to train than simpler models, like logistic regression or XGBoost.
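To get a rough sense of how large Y might be in a Step X of Y update, the following minimal sketch estimates a step count. It assumes one step per mini-batch and a fixed number of epochs. This guide does not specify how Snorkel Flow counts steps, so the function name and parameters here are hypothetical and only illustrate the arithmetic.

```python
import math


def total_training_steps(num_examples: int, batch_size: int, num_epochs: int) -> int:
    """Estimate the number of mini-batch steps in a training run.

    Assumption (not from the Snorkel Flow docs): one step per mini-batch,
    with the last, possibly smaller, batch of each epoch counted as a step.
    """
    if num_examples <= 0 or batch_size <= 0 or num_epochs <= 0:
        raise ValueError("all arguments must be positive")
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch * num_epochs
```

Under these assumptions, halving the batch size roughly doubles the number of steps reported, even though the amount of data processed stays the same.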

Perform model inference

Once training is complete, Snorkel Flow runs the model on your dataset to generate predictions for evaluation. To accelerate this process, you can set Predict on dev only to True. With this setting, predictions are generated only for the development set, which is treated as a holdout dataset for quicker error analysis.

Set Include dev split when training to False to exclude the development data from the training set, which prevents training data leakage.

Register model + predictions

The final stage involves saving your model and predictions and calculating evaluation metrics. Although the progress bar might show 99%, completing this stage can take longer than expected, especially with large datasets or complex prediction tasks.

AutoML training process stages

When you use AutoML for hyperparameter tuning, the progress bar displays many more stages, because AutoML iterates over multiple models. Labeling function application and denoising are conducted once at the beginning of the process. Each model iteration then involves three main stages (featurization, training, and inference), repeated across all models, followed by the final step of model registration. Here is an example of what you might observe during this process:

Progress Percentage Calculation

The progress percentage during AutoML training is calculated using this formula:

progress % = ((current stage + within stage progress) / total number of stages) * 100

For example, suppose you are in the model training stage, which is the fourth of six total stages. At the start of the stage, the progress is calculated as 4/6, or about 67%. When training reaches step 50 of 100 within this stage, the within-stage progress is 0.5, so the calculation becomes 4.5/6 = 75% as training approaches the next stage.
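The worked example above can be reproduced with a short Python sketch that adds the within-stage fraction to the current stage number and divides by the total number of stages. The function names are hypothetical and are not part of any Snorkel Flow API. The AutoML stage count assumes two one-time stages (labeling function application and denoising), three stages per model, and one final registration stage, with the resource-waiting stage not counted.

```python
def progress_percent(current_stage: int, total_stages: int, within_stage: float = 0.0) -> float:
    """Return overall progress as a percentage.

    current_stage uses the numbering from the guide's example (4 for the
    fourth stage); within_stage is the completed fraction of that stage.
    """
    if total_stages <= 0:
        raise ValueError("total_stages must be positive")
    if not 0.0 <= within_stage <= 1.0:
        raise ValueError("within_stage must be between 0.0 and 1.0")
    percent = (current_stage + within_stage) / total_stages * 100
    # Cap at 100 so the last stage never reports more than complete.
    return min(percent, 100.0)


def automl_stage_count(num_models: int) -> int:
    """Estimate the number of AutoML stages (assumed breakdown, see lead-in)."""
    if num_models <= 0:
        raise ValueError("num_models must be positive")
    one_time_stages = 2      # labeling function application, denoising
    stages_per_model = 3     # featurization, training, inference
    registration_stages = 1  # final model registration
    return one_time_stages + stages_per_model * num_models + registration_stages


if __name__ == "__main__":
    print(round(progress_percent(4, 6), 1))        # start of the fourth stage
    print(progress_percent(4, 6, within_stage=50 / 100))  # step 50 of 100
    print(automl_stage_count(3))
```

With three models, this breakdown gives 12 stages, which illustrates why each stage moves the AutoML progress bar by a smaller increment than in a single-model run.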