Model validation best practices
This document reviews best practices and key considerations for developing a validation process for your deployed Snorkel Flow model. It is intended for individuals and teams who have already productionized a Snorkel Flow model and actively run inference with it.
Model validation importance
Model validation is an integral part of your model development cycle. As more data flows through an existing model over time, the model often “drifts” away from its previously attained success metrics (e.g., accuracy, F1, precision), mainly due to changes in the underlying data. Even small perturbations or shifts in the data fed into your model can cause significant changes to its overall success metrics. For this reason, reviewing overall model performance and assessing existing data on a consistent basis are essential to ensuring the long-term success of your model.
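One common way teams quantify input drift without first relabeling data is the population stability index (PSI), which compares the distribution of a feature (or of model output scores) on recent traffic against a snapshot from deployment time. The sketch below is a minimal, generic illustration using numpy; it is not a Snorkel Flow API, and the rule-of-thumb thresholds in the comments are industry conventions rather than Snorkel recommendations.

```python
# Population stability index (PSI): a model-agnostic way to quantify how far
# a distribution has shifted from a baseline. Generic sketch, not a Snorkel
# Flow API; column/variable names are illustrative.
import numpy as np

def population_stability_index(baseline, current, bins=10):
    """Compare two 1-D numeric samples; a larger PSI means more drift."""
    # Derive bin edges from the baseline so both samples are discretized
    # identically.
    edges = np.histogram_bin_edges(baseline, bins=bins)
    base_counts, _ = np.histogram(baseline, bins=edges)
    curr_counts, _ = np.histogram(current, bins=edges)

    # Convert counts to proportions; a small epsilon avoids log(0) for
    # empty bins.
    eps = 1e-6
    base_pct = base_counts / max(base_counts.sum(), 1) + eps
    curr_pct = curr_counts / max(curr_counts.sum(), 1) + eps

    return float(np.sum((curr_pct - base_pct) * np.log(curr_pct / base_pct)))

# Rule of thumb: PSI < 0.1 is stable, 0.1-0.25 is moderate drift, and
# > 0.25 usually warrants revalidation.
rng = np.random.default_rng(0)
baseline_scores = rng.normal(0.0, 1.0, 10_000)  # scores at deployment time
current_scores = rng.normal(0.3, 1.1, 10_000)   # scores on recent traffic
print(f"PSI: {population_stability_index(baseline_scores, current_scores):.3f}")
```

A rising PSI on key features or output scores is a useful early signal that the full validation process described below is due.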
For most organizations, model validation occurs on a quarterly, semi-annual, or annual basis, depending heavily on the criticality and importance of your predicted outputs. In general, models supporting downstream tasks of high business value should be validated more frequently than models that provide less business value. We dive into key considerations for formulating the right validation plan under Developing the right model validation cadence, but for now want to highlight its importance to your organization.
Running model validation
Model validation requires several steps to properly assess the risks and dependencies in your application. We recommend following the basic model validation steps below; however, adjust the approach to include any additional tests your application requires:
1. Assess baseline model metrics: Capture and review the existing baseline model and its success metrics. Assess the existing biases and shortcomings of the application, with plans to compare against your new model after validation.
2. Identify an out-of-time data sample: Meticulously curate a dataset to use for model validation (a sampling sketch follows this list). This data should have most, if not all, known biases removed so that validation results generalize well. The biases below are not exhaustive, but are a good starting point for most datasets:
   - Time and location: Data should be incorporated across various timestamps and locations.
   - Labels: Data should strive to incorporate all or almost all labels from your existing schema.
   - Randomness: Data should be selected randomly across all data sources and data types.
3. Label the new out-of-time dataset in the platform: Following data curation, hand-label the previously identified dataset in Snorkel Flow. Skip this step if the validation dataset is already labeled.
4. Apply LFs and train a new model: Test your existing labeling functions (LFs) in Snorkel Flow and train a new ML model after adding the validation dataset to the platform.
5. Evaluate the results and update the existing model: Review the results from the new ML model and compare them to your previous success metrics (see the comparison sketch after the note below). If necessary, tweak the existing LFs and ML model to improve overall performance.
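As a concrete illustration of step 2, the following sketch draws a label-stratified random sample from records newer than the model's training cutoff, touching the time, label, and randomness considerations above. The column names (timestamp, label) and the helper itself are illustrative assumptions about a generic pandas DataFrame, not part of the Snorkel Flow SDK.

```python
# A minimal sketch of curating an out-of-time validation sample with pandas.
# Column names are illustrative assumptions about your own dataset.
import pandas as pd

def sample_out_of_time(df: pd.DataFrame, cutoff: str, n: int = 1000,
                       seed: int = 42) -> pd.DataFrame:
    """Draw a label-stratified random sample from data after a time cutoff."""
    # Time: keep only records newer than anything the model was trained on.
    recent = df[df["timestamp"] > pd.Timestamp(cutoff)]

    # Labels: sample within each label so the validation set covers the
    # full schema rather than just the majority classes.
    per_label = max(n // max(recent["label"].nunique(), 1), 1)
    sample = (
        recent.groupby("label", group_keys=False)
              .apply(lambda g: g.sample(min(per_label, len(g)),
                                        random_state=seed))
    )

    # Randomness: shuffle so row order carries no residual source/time bias.
    return sample.sample(frac=1, random_state=seed).reset_index(drop=True)
```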
Note: Snorkel Flow supports steps 2-5 in the above framework; step 1 is completed outside of Snorkel Flow.
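For steps 1 and 5, recording baseline metrics in a structured form makes the post-validation comparison mechanical rather than ad hoc. The sketch below uses scikit-learn metrics; the 2% regression tolerance and macro averaging are illustrative assumptions, not Snorkel defaults.

```python
# A minimal sketch of capturing baseline metrics (step 1) and comparing a
# revalidated model against them (step 5). Thresholds are illustrative.
from sklearn.metrics import accuracy_score, f1_score, precision_score

def score(y_true, y_pred) -> dict:
    """Compute the success metrics tracked for this model."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred, average="macro"),
        "precision": precision_score(y_true, y_pred, average="macro",
                                     zero_division=0),
    }

def compare_to_baseline(baseline: dict, current: dict,
                        tolerance: float = 0.02) -> None:
    """Flag any metric that regressed by more than the tolerance."""
    regressions = {
        name: (baseline[name], current[name])
        for name in baseline
        if baseline[name] - current[name] > tolerance
    }
    if regressions:
        print("Regressed metrics (baseline -> current):")
        for name, (old, new) in regressions.items():
            print(f"  {name}: {old:.3f} -> {new:.3f}")
    else:
        print("All metrics within tolerance of baseline.")

# Example usage with illustrative labels:
baseline_metrics = score([0, 1, 1, 0, 1], [0, 1, 0, 0, 1])  # at deployment
current_metrics = score([0, 1, 1, 0, 1], [0, 0, 0, 0, 1])   # at revalidation
compare_to_baseline(baseline_metrics, current_metrics)
```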
Developing the right model validation cadence
The right validation cadence largely depends on both the criticality of the data and predictions involved and your team’s capacity to complete periodic model validations. We generally recommend validating models every 6 months, and at minimum every 12 months, to ensure the long-term success of your existing model.
Choosing the right periodic validation cadence depends on various factors related to the overall value and importance of your model. Use the criteria below to supplement your model validation strategy (a small heuristic sketch follows the list):
- Defined success metrics: Models with stricter success criteria (e.g., F1, accuracy, precision) should be validated on a shorter cadence than others.
- Expected changes to underlying data: Models whose underlying data updates or changes significantly on a regular basis should be validated on a shorter cadence than others.
- Data volume: Models running inference on a significant amount of data (~100k+ datapoints) should be validated on a shorter cadence than others.
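To make these criteria concrete, the sketch below encodes them in a simple heuristic that maps risk factors to a suggested validation interval. The thresholds are illustrative assumptions only, not a Snorkel Flow feature or an official recommendation.

```python
# An illustrative heuristic (not a Snorkel Flow feature) that encodes the
# three cadence criteria above into a suggested validation interval.
def suggested_cadence_months(strict_metrics: bool,
                             volatile_data: bool,
                             high_volume: bool) -> int:
    """Return a suggested number of months between validations."""
    risk_factors = sum([strict_metrics, volatile_data, high_volume])
    if risk_factors >= 2:
        return 3   # multiple risk factors: validate quarterly
    if risk_factors == 1:
        return 6   # one risk factor: the recommended 6-month cadence
    return 12      # low risk: validate at least annually

# Example: strict F1 target plus ~100k+ datapoints of inference volume.
print(suggested_cadence_months(strict_metrics=True,
                               volatile_data=False,
                               high_volume=True))  # -> 3
```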
We recommend contacting your assigned Machine Learning Success Manager to align on your model validation strategy and engage Snorkel’s long-term support for your validation processes.