Version: 0.94

Model zoo

The Snorkel Flow Model Zoo supports models from three major modeling libraries: Scikit-Learn, XGBoost, and HuggingFace. For the list of available models, see model zoo.

note

The Snorkel Flow Model Zoo is only available from the in-platform notebook server.

If you want to use non-sklearn models, make sure you have installed the relevant frameworks, such as torch, transformers, and xgboost.

The model zoo contains configurations for several commonly used models, which provide a good starting point for model training. To create a model from a model configuration, first make a deep copy of the default config for that class so that you don't modify the shared constant. Then modify the config's options as needed, for example to specify the number of classes and the fields to use as input.

If you’re not familiar with the default config, you can simply print it out to see what options are available.

import copy

# Import SKLearn logistic regression default config
from snorkelflow.models.model_configs import SKLEARN_LOGISTIC_REGRESSION_CONFIG

# Import default model options
from snorkelflow.models.utils import (
    MODEL_CONFIG_OPTION_FIELDS,
    MODEL_CONFIG_OPTION_N_CLASSES,
)

# Print the default config to see what options are available
print(SKLEARN_LOGISTIC_REGRESSION_CONFIG)

# Example value; set this to the number of classes in your task
N_CLASSES = 2

# Make a deep copy of the SKLearn logistic regression config
# so the default constant is left unmodified
model_config = copy.deepcopy(SKLEARN_LOGISTIC_REGRESSION_CONFIG)

# Modify the model config to specify the fields used and number of classes
model_config.options[MODEL_CONFIG_OPTION_FIELDS] = ["text"]
model_config.options[MODEL_CONFIG_OPTION_N_CLASSES] = N_CLASSES
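To see why the deep copy matters, here is a minimal sketch that does not use snorkelflow at all. `StandInConfig` and `DEFAULT_CONFIG` are hypothetical stand-ins for a model zoo config with a nested `options` dictionary; the real config class may be structured differently. The sketch compares a shallow copy, which shares the nested dictionary with the default, against a deep copy, which does not.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Dict


# Hypothetical stand-in for a model zoo config object; the real class
# in snorkelflow may look different.
@dataclass
class StandInConfig:
    name: str
    options: Dict[str, Any] = field(default_factory=dict)


DEFAULT_CONFIG = StandInConfig(name="logreg", options={"fields": [], "n_classes": 2})

# A shallow copy creates a new outer object but shares the nested `options` dict,
# so editing the copy's options also edits the default.
shallow = copy.copy(DEFAULT_CONFIG)
shallow.options["n_classes"] = 5
shallow_leaked = DEFAULT_CONFIG.options["n_classes"] == 5

# Reset the default, then repeat the same edits on a deep copy.
DEFAULT_CONFIG.options["n_classes"] = 2
deep = copy.deepcopy(DEFAULT_CONFIG)
deep.options["n_classes"] = 5
deep.options["fields"].append("text")
deep_leaked = DEFAULT_CONFIG.options != {"fields": [], "n_classes": 2}

print("shallow copy modified the default:", shallow_leaked)
print("deep copy modified the default:", deep_leaked)
```

With the shallow copy the edit leaks into the default; with the deep copy the default stays as it was, which is the behavior you want when several models start from the same default config.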

Use the model registry to instantiate a model from a model config.

# Import model registry
from snorkelflow.models.model_registry import get_model_from_config

model = get_model_from_config(model_config)

Next, load the Pandas DataFrame and Snorkel labels from Snorkel Flow using the appropriate training set. For more details on training data loading from Snorkel Flow, see Python SDK.

import snorkelflow.client as sf

# APP_NAME is the name of your application
node = sf.get_model_node(APP_NAME)
train_df = sf.get_node_data(
    node=node,
    split="train",
    training_set_uid=1,
    training_set_labels=True,
    training_set_sampler_config={"strategy": "auto"},
)

Finally, you can train the instantiated model and make predictions on Pandas DataFrames using the model's train and predict methods.

from sklearn.metrics import accuracy_score

# To train a model, call `train` with the train dataframe (`df`)
# and the training set labels (`Y`).
model.train(df=train_df, Y=train_df["training_set_labels"])

# Predict on the valid set.
# `valid_df` and `valid_Y` (the valid split and its ground-truth labels)
# are assumed to be loaded already.
preds, probs = model.predict(df=valid_df)

# Score on the valid set
print("Accuracy Score on Valid Set: ", accuracy_score(y_true=valid_Y["ground_truth"], y_pred=preds))
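As a sanity check on the score above, accuracy is simply the fraction of predictions that match the ground truth. The following pure-Python sketch computes it by hand; the `accuracy` helper and the label lists are illustrative only and are not part of snorkelflow or real model output.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Return the fraction of positions where the prediction equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot score an empty set")
    matches = sum(t == p for t, p in zip(y_true, y_pred))
    return matches / len(y_true)


# Illustrative values only, not real model output.
example_truth = [0, 1, 1, 0]
example_preds = [0, 1, 0, 0]
print(accuracy(example_truth, example_preds))  # 3 of 4 match -> 0.75
```

If a hand-computed value like this disagrees with the reported score, a common cause is that the predictions and the ground-truth labels are not aligned row by row.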

For details on registering models and predictions with Snorkel Flow, see the Python SDK reference.