Version: 0.95

Using dataset views for generative AI

Dataset views power the annotation data viewers for generative AI. Snorkel Flow offers two dataset view types:

  • Single response review: To view and annotate LLM responses one by one.
  • Response ranking: To rank multiple LLM responses.

To create a dataset view

You need to create the dataset view once per dataset. You can create the dataset view using snorkelflow.client.dataset_views in the SDK.

Single response view

import snorkelflow.client as sf  # assumed alias; the calls below use sf

dataset_name = "your-dataset"
view_name = "Single Response View"
view_type = "single_llm_response_view"
# Each key is a view field; each value is the dataset column that supplies it.
column_mapping = {
    "instruction": "questions",
    "response": "responses",
    "context": "rc_title_text",
    "prompt_prefix": "prompt_prefix",
}

sf.create_dataset_view(
    dataset=dataset_name,
    name=view_name,
    view_type=view_type,
    column_mapping=column_mapping,
)
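Before creating a view, it can help to confirm that every column named in column_mapping actually exists in the dataset. The following is a minimal sketch in plain Python, not part of the Snorkel Flow SDK: the helper missing_mapped_columns and the dataset_columns list are hypothetical names introduced for illustration, and you would supply the real column names from your own data.

```python
def missing_mapped_columns(column_mapping, dataset_columns):
    """Return the view fields whose mapped dataset column is absent."""
    available = set(dataset_columns)
    return {
        field: column
        for field, column in column_mapping.items()
        if column not in available
    }


column_mapping = {
    "instruction": "questions",
    "response": "responses",
    "context": "rc_title_text",
    "prompt_prefix": "prompt_prefix",
}

# Hypothetical column list; in practice, read it from your own data source.
dataset_columns = ["questions", "responses", "rc_title_text"]

missing = missing_mapped_columns(column_mapping, dataset_columns)
print(missing)
```

An empty result means every mapped column was found. Any entries that remain point to mappings to fix before you create the view.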

Ranking view

The ranking view can scale to as many labels as desired. Once Snorkel Flow assigns a rank to a response, the response changes from white to gray. When Snorkel Flow assigns a rank, it rearranges the responses in the user interface in ascending order of rank. The responses are grouped by the prompt and retrieved context.
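The grouping and ordering described above can be modeled roughly in plain Python. This is only an illustrative sketch, not Snorkel Flow code: the record fields (prompt, context, response, rank) and the function order_for_display are hypothetical, and placing unranked responses last is an assumption.

```python
from itertools import groupby


def order_for_display(responses):
    """Group by (prompt, context); sort each group by ascending rank, unranked last."""

    def group_key(record):
        return (record["prompt"], record["context"])

    def rank_key(record):
        rank = record.get("rank")
        # Unranked records (rank is None) sort after all ranked ones.
        return (rank is None, rank if rank is not None else 0)

    ordered = sorted(responses, key=lambda r: (group_key(r), rank_key(r)))
    return {key: list(items) for key, items in groupby(ordered, key=group_key)}


responses = [
    {"prompt": "q1", "context": "c1", "response": "A", "rank": 2},
    {"prompt": "q1", "context": "c1", "response": "B", "rank": None},
    {"prompt": "q1", "context": "c1", "response": "C", "rank": 1},
    {"prompt": "q2", "context": "c2", "response": "D", "rank": None},
]

groups = order_for_display(responses)
first_group = [r["response"] for r in groups[("q1", "c1")]]
print(first_group)
```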

import snorkelflow.client as sf  # assumed alias; the calls below use sf
from snorkelflow.sdk import Dataset  # assumed import path for Dataset

dataset_name = "your-dataset"
view_name = "Ranking View"
view_type = "ranking_llm_responses_view"
column_mapping = {
    "instruction": "questions",
    "response": "responses",
    "context": "rc_title_text",
    "prompt_prefix": "prompt_prefix",
}

# Get the ranking schema UID for create_dataset_view
dataset = Dataset.get(dataset_name)
for label_schema in dataset.label_schemas:
    print(label_schema.name, label_schema.uid)

sf.create_dataset_view(
    dataset=dataset_name,
    name=view_name,
    view_type=view_type,
    column_mapping=column_mapping,
    label_schema_uids=[<ranking schema id>],  # replace with the UID printed above
)

To view a dataset

From within a batch, select the dataset view from the dropdown:

Single response view

Ranking view

note

To annotate retrieved context in the view, the column that holds it must contain valid JSON. For example:

[
  {
    "TITLE": "Raise Credit Score Quickly",
    "RELEVANT_CONTEXT": "Once you've paid …"
  }
]

The keys (TITLE, RELEVANT_CONTEXT) are correctly enclosed in double quotes.
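One way to catch formatting problems early is to check each context value with Python's standard json module before uploading the data. This is a minimal sketch, not a Snorkel Flow feature: the function name is_valid_context and the sample text are hypothetical. It also assumes that the expected shape is a JSON list of objects, as in the example above.

```python
import json


def is_valid_context(value):
    """Return True if value parses as a JSON list of objects."""
    try:
        parsed = json.loads(value)
    except (TypeError, ValueError):
        # json.JSONDecodeError is a subclass of ValueError.
        return False
    return isinstance(parsed, list) and all(isinstance(item, dict) for item in parsed)


# Hypothetical records; json.dumps produces correctly quoted keys and values.
records = [{"TITLE": "Raise Credit Score Quickly", "RELEVANT_CONTEXT": "Example passage text."}]
good = json.dumps(records)

# Unquoted keys and single-quoted strings are not valid JSON.
bad = "[{TITLE: 'Raise Credit Score Quickly'}]"

print(is_valid_context(good), is_valid_context(bad))
```

Serializing with json.dumps rather than building the string by hand avoids the most common failures, such as single quotes or unquoted keys.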