Model training progress stages
Once you have configured your model and started training, the model progress bar displays the stages of the model training process. The stages vary depending on whether you are creating a new dataset while training a single model or running AutoML.
Once your model finishes training, you can view and analyze the model results.
Standard training progress stages
If you are creating a new dataset while training a single custom or fast model, you should see the following stages:
Waiting for next available worker (position in queue: 1) and [Stage 0] Initiated model training job, waiting for resources
Your modeling job is in the queue, and is waiting for computing resources to get started. You'll see this if you have another job that is currently running.
[Stage 1/6] Apply labeling functions to dataset
Snorkel Flow is applying all your labeling functions across your training, valid, and/or dev datasets. The more labeling functions you have, and the larger your datasets, the longer this stage will take.
[Stage 2/6] De-noise labels
Snorkel Flow is running our best-in-class weak supervision algorithms on your labeled data, and generating a final training dataset for modeling. If you set Train noise-aware model using probabilistic labels to True, then this stage may take longer.
[Stage 3/6] Generate model features
Snorkel Flow is analyzing, tokenizing, and vectorizing your data to prepare it for model training.
[Stage 4/6] Train model
Snorkel Flow is running your chosen model architecture on the vectorized data. For some algorithms, you will see updates like Step X of Y. This refers to how many mini-batches and training updates have completed. Deep learning models like BERT will take longer than shallower models like logistic regression and XGBoost.
[Stage 5/6] Perform model inference
The model has successfully trained, and Snorkel Flow is running the model over your data to generate model predictions for evaluation. If you want to make this go faster, set Predict on dev only to True. In this case, to treat dev as a holdout dataset, also set Include dev split when training to False.
[Stage 6/6] Register model + predictions
Snorkel Flow is saving your model and predictions, and calculating evaluation screens like the Clarity matrix. Though progress will be at 99%, this can occasionally take longer than expected on large datasets with complex predictions.
AutoML training process stages
When tuning hyperparameters with AutoML, the progress messages will display a much larger number of stages, along with a sub-message that shows which model iteration is running (e.g., Model 3 of 80). The number of stages is based on the number of model iterations, with three stages (featurization, training, and inference) per iteration, plus model registration. Here is an example of what you may see:
**[Stage 8/241] Train Model**
Model 3 of 80
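The stage arithmetic described above can be sketched in a few lines of Python. This is an illustrative sketch only, not Snorkel Flow code: the function names are hypothetical, and it assumes that stages run iteration by iteration in the order featurization, training, inference, which matches the example shown but is not confirmed for every configuration.

```python
from typing import Optional, Tuple

# Stage kinds per AutoML iteration, in the order the text lists them.
PER_ITERATION_STAGES = ("featurization", "training", "inference")


def total_automl_stages(num_iterations: int) -> int:
    """Three stages per model iteration, plus one final registration stage."""
    if num_iterations < 1:
        raise ValueError("num_iterations must be at least 1")
    return len(PER_ITERATION_STAGES) * num_iterations + 1


def locate_stage(stage: int, num_iterations: int) -> Tuple[Optional[int], str]:
    """Map a 1-based stage number to (model iteration, stage kind).

    Assumption: stages are ordered iteration by iteration, with model
    registration as the single last stage.
    """
    total = total_automl_stages(num_iterations)
    if not 1 <= stage <= total:
        raise ValueError(f"stage must be between 1 and {total}")
    if stage == total:
        return None, "registration"
    iteration, offset = divmod(stage - 1, len(PER_ITERATION_STAGES))
    return iteration + 1, PER_ITERATION_STAGES[offset]


print(total_automl_stages(80))  # 241
print(locate_stage(8, 80))      # (3, 'training')
```

Under these assumptions, 80 model iterations give 241 stages, and stage 8 falls in the training step of model 3, consistent with the example message.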
Progress percentage
The progress percentage is roughly calculated as:
progress % = ((current stage + within-stage progress) / total number of stages) * 100
So if you are in the model training stage (i.e., stage 4 out of 6), the job starts that stage at 4 / 6 ≈ 67% progress. When model training reaches step 50 of 100, the job is halfway from stage 4 to stage 5, so progress rises to (4 + 0.5) / 6 = 75%.
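The worked example (stage 4 of 6, step 50 of 100, giving 75%) corresponds to adding the within-stage fraction to the current stage number before dividing by the total number of stages. The following is a minimal sketch of that rough calculation; the function name and parameters are hypothetical, and the clamping behavior is an assumption rather than documented Snorkel Flow behavior.

```python
def progress_percent(current_stage: int, total_stages: int,
                     step: int = 0, total_steps: int = 0) -> float:
    """Approximate job progress as a percentage.

    The fraction of the current stage completed (step / total_steps) is
    added to the stage number before dividing by the number of stages.
    """
    if total_stages < 1 or not 0 <= current_stage <= total_stages:
        raise ValueError("current_stage must lie between 0 and total_stages")
    # Stages that report no step counts contribute no within-stage progress.
    within_stage = step / total_steps if total_steps > 0 else 0.0
    # Assumption: clamp so neither a stage nor the job exceeds 100%.
    within_stage = min(max(within_stage, 0.0), 1.0)
    return min((current_stage + within_stage) / total_stages, 1.0) * 100


print(round(progress_percent(4, 6), 1))           # 66.7
print(round(progress_percent(4, 6, 50, 100), 1))  # 75.0
```

Because the calculation is only approximate, the percentage shown in the progress bar may be rounded or truncated differently from these values.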